Fast file-based format for geometries with Geopandas

geofeather

A faster file-based format for geometries with geopandas.

This project uses the very fast feather file format to store geometry data (points, lines, polygons) in a form that geopandas can read and write.

Introductory post.

Why does this exist?

This project exists because reading and writing standard spatial formats (e.g., shapefile) in geopandas is slow. I was working with millions of geometries in multiple processing steps, and needed a fast way to read and write intermediate files.

In our benchmarks, writing with geofeather is about 5-6x faster than writing a GeoDataFrame to a shapefile with .to_file().

Reads are about 2x faster than the geopandas read_file() function.

How does it work?

The feather format works brilliantly for standard pandas data frames. To use it for spatial data, we convert the geometry column from shapely objects into Well-Known Binary (WKB) and store that column as raw bytes.
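To make "raw bytes" concrete, here is a minimal sketch of what the WKB for a single 2D point looks like. It hand-encodes the bytes with the standard library only; geofeather itself relies on shapely for this conversion, and the helper names point_to_wkb and wkb_to_point are hypothetical, not part of any library.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag defined by the WKB spec
WKB_POINT = 1          # geometry type code for a 2D point


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (hand-rolled for illustration)."""
    # "<" = little-endian with no padding, B = 1-byte order flag,
    # I = uint32 geometry type, dd = two float64 coordinates
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode little-endian WKB for a 2D point back into (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != WKB_LITTLE_ENDIAN or geom_type != WKB_POINT:
        raise ValueError("this sketch only handles little-endian 2D points")
    return (x, y)


wkb = point_to_wkb(-122.5, 45.25)
```

Each point becomes a fixed 21-byte value (1 + 4 + 8 + 8), which is the kind of opaque binary column that feather can store without knowing anything about geometry.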

The coordinate reference system is stored as JSON in a .crs sidecar file alongside the feather file.
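The sidecar idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not geofeather's actual code: the helper names are hypothetical, the sidecar is assumed to be named by appending .crs to the feather path, and the CRS is assumed to be a plain JSON-serializable mapping.

```python
import json
import os
import tempfile


def write_crs_sidecar(feather_path, crs):
    """Write a CRS mapping as JSON next to the feather file (hypothetical helper)."""
    crs_path = feather_path + ".crs"  # assumed naming: <file>.feather.crs
    with open(crs_path, "w") as f:
        json.dump(crs, f)
    return crs_path


def read_crs_sidecar(feather_path):
    """Return the stored CRS mapping, or None if no sidecar exists."""
    crs_path = feather_path + ".crs"
    if not os.path.exists(crs_path):
        return None
    with open(crs_path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.feather")
    write_crs_sidecar(path, {"init": "epsg:4326"})  # example CRS, assumed format
    restored = read_crs_sidecar(path)
    missing = read_crs_sidecar(os.path.join(tmp, "other.feather"))
```

Keeping the CRS outside the feather file means the data file stays a plain table of columns, at the cost of having two files to move around together.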

Installation

Available on PyPI at: https://pypi-hypernode.com/project/geofeather/

pip install geofeather

Usage

Write

Given an existing GeoDataFrame my_gdf, pass it to to_geofeather:

from geofeather import to_geofeather
to_geofeather(my_gdf, 'test.feather')

Read

from geofeather import from_geofeather
my_gdf = from_geofeather('test.feather')

Indexes

Feather files do not currently support indexes. To work around this, reset the index (for example, with reset_index()) before calling to_geofeather.
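A small sketch of the workaround, using a plain pandas DataFrame with made-up column names; a GeoDataFrame inherits the same reset_index and set_index methods. The calls to geofeather are left out so the example runs with pandas alone.

```python
import pandas as pd

# Example frame with a meaningful (non-default) index
df = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]}).set_index("name")

# Move the index into an ordinary column before writing
flat = df.reset_index()

# After reading the file back, the index can be rebuilt from that column
rebuilt = flat.set_index("name")
```

Because the index travels as a regular column, nothing is lost; it just has to be restored explicitly after reading.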

Changes

0.2.0

  • allow reading a subset of columns from a feather file
  • store geometry in 'geometry' column instead of 'wkb' column (simplification to avoid renaming columns)

0.1.0

  • Initial release

Credits

Everything that makes this fast is due to the hard work of contributors to pyarrow, geopandas, and shapely.


