GeoPandas objects backed with Dask

Parallel GeoPandas with Dask

Status

EXPERIMENTAL: This project is at an early stage. The basic element-wise spatial methods are implemented, but little beyond that.

If you would like to see this project in a more stable state, consider contributing developer time (contributions are very welcome!) or financial support, either personally or through your company.

This is a new project that builds off the exploration done in https://github.com/mrocklin/dask-geopandas

Example

Given a GeoPandas dataframe:

import geopandas
df = geopandas.read_file('...')

We can repartition it into a Dask-GeoPandas dataframe:

import dask_geopandas
ddf = dask_geopandas.from_geopandas(df, npartitions=4)

Currently, this partitions the data naively by row order, without regard to where the geometries lie. Spatial partitioning that takes advantage of the spatial structure of the GeoDataFrame is planned for the future; in the meantime, the current version still provides basic multi-core parallelism.
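To make "naively by rows" concrete, here is a minimal pure-Python sketch of splitting a row range into contiguous chunks. The helper naive_row_partitions is hypothetical and only illustrates the idea; the exact chunk boundaries that Dask chooses may differ.

```python
def naive_row_partitions(n_rows, npartitions):
    """Split range(n_rows) into contiguous chunks of near-equal size.

    Hypothetical helper for illustration only; it is not part of
    dask-geopandas and ignores the geometries entirely.
    """
    if npartitions < 1:
        raise ValueError("npartitions must be at least 1")
    base, extra = divmod(n_rows, npartitions)
    bounds = []
    start = 0
    for i in range(npartitions):
        # The first `extra` partitions take one additional row each.
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# Ten rows in four partitions: (start, stop) pairs in row order.
partitions = naive_row_partitions(10, 4)
print(partitions)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Because rows are grouped only by their position in the table, two geometries that are close together in space can land in different partitions, which is what spatial partitioning would avoid.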

The familiar spatial attributes and methods of GeoPandas are also available and will be computed in parallel:

ddf.geometry.area.compute()
ddf.within(polygon)

Additionally, if you have a distributed dask.dataframe, you can build a geometry column from its x and y coordinate columns with points_from_xy and pass the result to the set_geometry method. Currently, this only supports point data.

import dask.dataframe as dd
import dask_geopandas

ddf = dd.read_csv('...')

ddf = dask_geopandas.from_dask_dataframe(ddf)
ddf = ddf.set_geometry(
    # x is the longitude and y is the latitude
    dask_geopandas.points_from_xy(ddf, 'longitude', 'latitude')
)

Writing files (and reading back) is currently supported for the Parquet file format:

ddf.to_parquet("path/to/dir/")
ddf = dask_geopandas.read_parquet("path/to/dir/")

Installation

This package depends on GeoPandas and Dask. Installing PyGEOS is also recommended, as it speeds up spatial operations and enables multithreading. See https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for details.

One way to install everything is to create a new environment with the conda package manager:

conda create -n geo_env
conda activate geo_env
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
conda install python=3 geopandas dask pygeos
pip install git+https://github.com/geopandas/dask-geopandas.git
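After installing, a quick check like the following can confirm which versions ended up in the environment. This is a minimal sketch using only the standard library; the helper installed_version is hypothetical, and the distribution names are assumed to match the package names used above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report each dependency; missing ones are flagged rather than raising.
for name in ("geopandas", "dask", "dask-geopandas"):
    print(name, installed_version(name) or "not installed")
```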

Download files

Source distribution: dask-geopandas-0.1.0a2.tar.gz (30.8 kB)
Built distribution: dask_geopandas-0.1.0a2-py3-none-any.whl (15.1 kB, Python 3)
