GeoPandas objects backed with Dask
Parallel GeoPandas with Dask
Status
EXPERIMENTAL: This project is at an early stage. The basic element-wise spatial methods are implemented, but little beyond that.
If you would like to see this project in a more stable state, then you might consider pitching in with developer time (contributions are very welcome!) or with financial support from you or your company.
This is a new project that builds off the exploration done in https://github.com/mrocklin/dask-geopandas
Example
Given a GeoPandas dataframe:
import geopandas
df = geopandas.read_file('...')
We can repartition it into a Dask-GeoPandas dataframe:
import dask_geopandas
ddf = dask_geopandas.from_geopandas(df, npartitions=4)
Currently, this repartitions the data naively by rows, so rows that land in the same partition are not necessarily close together in space. Spatial partitioning, which would take advantage of the spatial structure of the GeoDataFrame, is planned for the future. The current version still provides basic multi-core parallelism.
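To make "naively by rows" concrete, here is a minimal standard-library sketch of splitting rows into contiguous chunks. The helper name partition_bounds is hypothetical, and the exact chunk boundaries Dask picks may differ. The point is only that the split looks at row position, not at geometry.

```python
def partition_bounds(n_rows, npartitions):
    """Return (start, stop) row ranges that split n_rows into contiguous chunks.

    Hypothetical helper for illustration; not part of dask-geopandas.
    """
    if npartitions < 1:
        raise ValueError("npartitions must be at least 1")
    base, extra = divmod(n_rows, npartitions)
    bounds = []
    start = 0
    for i in range(npartitions):
        # The first `extra` partitions take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


rows = list(range(10))  # stand-in for the rows of a GeoDataFrame
chunks = [rows[a:b] for a, b in partition_bounds(len(rows), 4)]
print(chunks)
```

Because the geometry column plays no role in the split, two neighbouring features can end up in different partitions, which is what spatial partitioning would avoid.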
The familiar spatial attributes and methods of GeoPandas are also available and will be computed in parallel:
ddf.geometry.area.compute()
ddf.within(polygon)
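Element-wise methods parallelise well because each partition can be processed independently and the results concatenated afterwards. The following is a conceptual sketch of that pattern using only the standard library. It is an assumption-laden illustration, not how dask-geopandas is implemented. Polygons are plain lists of (x, y) vertices, and shoelace_area and partition_areas are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor


def shoelace_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def partition_areas(partition):
    """Apply the element-wise operation to every geometry in one partition."""
    return [shoelace_area(ring) for ring in partition]


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (4, 0), (0, 3)]
rectangle = [(0, 0), (2, 0), (2, 5), (0, 5)]

# Two partitions, as a row-wise split of three geometries might produce.
partitions = [[unit_square, triangle], [rectangle]]

# Each partition is handled by its own task; pool.map preserves order.
with ThreadPoolExecutor(max_workers=2) as pool:
    per_partition = list(pool.map(partition_areas, partitions))

# Concatenate the per-partition results back into one sequence.
areas = [a for part in per_partition for a in part]
print(areas)
```

Threads running pure Python code do not give a real speed-up here. The sketch only shows the shape of the computation: map over partitions, then concatenate.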
Additionally, if you have a distributed dask.dataframe, you can pass columns of x-y points to the set_geometry method. Currently, this only supports point data.
import dask.dataframe as dd
import dask_geopandas
ddf = dd.read_csv('...')
ddf = dask_geopandas.from_dask_dataframe(ddf)
# points_from_xy takes the x column (longitude) first, then the y column (latitude)
ddf = ddf.set_geometry(
    dask_geopandas.points_from_xy(ddf, 'longitude', 'latitude')
)
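Axis order is an easy thing to get wrong when building points from tabular data: x corresponds to longitude and y to latitude. The sketch below shows the mapping with the standard library only. The CSV content and the column names 'latitude' and 'longitude' are assumptions made for illustration, and the points are plain (x, y) tuples rather than geometry objects.

```python
import csv
import io

# Hypothetical CSV content with one row per location.
csv_text = """name,latitude,longitude
Amsterdam,52.37,4.9
Lisbon,38.72,-9.14
"""

reader = csv.DictReader(io.StringIO(csv_text))

# Build (x, y) pairs: x is longitude, y is latitude.
points = [(float(row["longitude"]), float(row["latitude"])) for row in reader]
print(points)
```

Swapping the two columns would still produce valid points, just in the wrong place, so the mistake tends to go unnoticed until the data is plotted or joined against something else.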
Writing files (and reading back) is currently supported for the Parquet file format:
ddf.to_parquet("path/to/dir/")
ddf = dask_geopandas.read_parquet("path/to/dir/")
Installation
This package depends on GeoPandas and Dask. In addition, installing PyGEOS is recommended: it speeds up spatial operations and enables multithreading. See https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for details.
One way is to use the conda package manager to create a new environment:
conda create -n geo_env
conda activate geo_env
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
conda install python=3 geopandas dask pygeos
pip install git+https://github.com/geopandas/dask-geopandas.git