Fast file-based format for geometries with Geopandas

geofeather

A faster file-based format for geometries with geopandas.

This project capitalizes on the very fast feather file format to store geometry (points, lines, polygons) data for interoperability with geopandas.

Introductory post.

Why does this exist?

This project exists because reading and writing standard spatial formats (e.g., shapefile) in geopandas is slow. I was working with millions of geometries in multiple processing steps, and needed a fast way to read and write intermediate files.

In our benchmarks, we see about 5-6x faster file writes than writing from geopandas to shapefile via .to_file() on a GeoDataFrame.

We see about 2x faster reads compared to the geopandas read_file() function.

How does it work?

The feather format works brilliantly for standard pandas data frames. To leverage it, we convert the geometry data from shapely objects into Well-Known Binary (WKB) format and store that column as raw bytes.
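To make "WKB stored as raw bytes" concrete, here is a minimal sketch that encodes a single 2D point by hand using only the standard library. The helper names point_to_wkb and wkb_to_point are hypothetical and exist only for illustration; geofeather itself relies on shapely for the conversion, not on code like this.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hand-rolled, illustration only)."""
    # Layout: 1 byte byte-order flag (1 = little-endian),
    #         4 bytes geometry type (1 = Point),
    #         8 + 8 bytes for x and y as IEEE 754 doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data: bytes) -> tuple:
    """Decode the bytes produced by point_to_wkb back into (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian 2D points are handled in this sketch")
    return (x, y)


wkb = point_to_wkb(-122.5, 45.25)
point = wkb_to_point(wkb)
print(len(wkb))  # 21 bytes: 1 + 4 + 8 + 8
print(point)
```

Because each geometry becomes an opaque bytes value, a column of geometries can be stored like any other binary column in a data frame.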

We store the coordinate reference system (CRS) as JSON in a sidecar file with the .crs extension.
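The sidecar idea can be sketched with the standard library alone. Everything here is an assumption made for illustration: the helper names write_crs_sidecar and read_crs_sidecar are hypothetical, the sidecar is assumed to sit next to the feather file with ".crs" appended to its path, and the CRS payload shown is just an example dictionary. The exact naming and contents geofeather uses may differ.

```python
import json
import os
import tempfile


def write_crs_sidecar(feather_path, crs):
    """Write a CRS description as JSON next to the data file (assumed naming)."""
    sidecar = feather_path + ".crs"  # assumption: sidecar path is data path + ".crs"
    with open(sidecar, "w") as f:
        json.dump(crs, f)
    return sidecar


def read_crs_sidecar(feather_path):
    """Return the CRS stored in the sidecar, or None if no sidecar exists."""
    sidecar = feather_path + ".crs"
    if not os.path.exists(sidecar):
        return None
    with open(sidecar) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.feather")
    write_crs_sidecar(path, {"init": "epsg:4326"})  # example payload only
    loaded = read_crs_sidecar(path)
    missing = read_crs_sidecar(os.path.join(tmp, "other.feather"))
```

Keeping the CRS outside the feather file means the data file stays a plain feather file that other tools can read, at the cost of having to move two files together.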

Installation

Available on PyPI at: https://pypi-hypernode.com/project/geofeather/

pip install geofeather

Usage

Write

Given an existing GeoDataFrame my_gdf, pass this into to_geofeather:

from geofeather import to_geofeather
to_geofeather(my_gdf, 'test.feather')

Read

from geofeather import from_geofeather
my_gdf = from_geofeather('test.feather')

Temporary pygeos support

pygeos provides much faster geospatial operations over arrays of geometries.

geopandas is in the process of migrating to using pygeos geometries as its internal data storage instead of shapely objects.

Until pygeos is fully integrated, geofeather includes shims to support interoperability with pandas DataFrames containing pygeos geometries. If you already use pygeos on data that you read from geofeather, these shims read and write data 3-7x faster than using geofeather with GeoDataFrames.

Internally, the feather file is identical to the one created above.

pygeos is required in order to use this functionality.

WARNING: these shims will be deprecated as soon as pygeos is integrated into geopandas.

from geofeather.pygeos import to_geofeather, from_geofeather

# given a DataFrame df containing pygeos geometries in 'geometry' column
# and a crs object

to_geofeather(df, 'test.feather', crs=crs)

df = from_geofeather('test.feather')

Note: no CRS information is returned when reading from geofeather into a DataFrame. This keeps the function signature the same as the from_geofeather function shown under Usage.

Indexes

Right now, indexes are not supported in feather files. To work around this, reset your index before calling to_geofeather.
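A minimal sketch of that workaround, using a plain pandas DataFrame so it runs without geometry data. The column and index names are made up for illustration, and the to_geofeather / from_geofeather calls are left as comments; the same reset_index and set_index calls apply to a GeoDataFrame.

```python
import pandas as pd

# Example frame with a meaningful, named index (names are illustrative only)
df = pd.DataFrame(
    {"name": ["a", "b"], "value": [1, 2]},
    index=pd.Index([10, 20], name="site_id"),
)

# Move the index into a regular column so it is written with the data
flat = df.reset_index()
# to_geofeather(flat, 'test.feather')   # write step, not executed in this sketch

# After reading the file back, restore the index from the saved column
# flat = from_geofeather('test.feather')
restored = flat.set_index("site_id")

print(list(flat.columns))
print(restored.index.tolist())
```

If the index carries no information (for example a default integer range), reset_index(drop=True) discards it instead of saving it as a column.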

Changes

0.3.0

  • allow serializing to / from pandas DataFrames containing pygeos geometries (see notes above).
  • use new CRS object in geopandas data frames (#4)
  • dropped to_shp; use geopandas to_file() instead.

0.2.0

  • allow reading a subset of columns from a feather file
  • store geometry in 'geometry' column instead of 'wkb' column (simplification to avoid renaming columns)

0.1.0

  • Initial release

Credits

Everything that makes this fast is due to the hard work of contributors to pyarrow, geopandas, and shapely.
