Fast file-based format for geometries with Geopandas
Project description
geofeather
A faster file-based format for geometries with geopandas.
This project capitalizes on the very fast feather file format to store geometry (points, lines, polygons) data for interoperability with geopandas.
Why does this exist?
This project exists because reading and writing standard spatial formats (e.g., shapefile) in geopandas is slow. I was working with millions of geometries in multiple processing steps, and needed a fast way to read and write intermediate files.
In our benchmarks, we see about 5-6x faster file writes than writing from geopandas to shapefile via .to_file() on a GeoDataFrame.
We see about 2x faster reads compared to the geopandas read_file() function.
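To check these numbers against your own data, a small timing helper along the following lines may be enough. This is only a sketch: best_time and speedup are hypothetical names that are not part of geofeather, and the commented usage assumes geopandas and geofeather are installed and that my_gdf is an existing GeoDataFrame.

```python
import time


def best_time(func, repeat=3):
    """Return the fastest wall-clock time, in seconds, over `repeat` calls of func()."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Hypothetical usage (not run here):
#
#   shp = best_time(lambda: my_gdf.to_file("test.shp"))
#   fea = best_time(lambda: to_geofeather(my_gdf, "test.feather"))
#   print(f"write speedup: {speedup(shp, fea):.1f}x")
```

Taking the minimum over a few repeats reduces noise from disk caching and other processes; results will still vary with geometry type and file size.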
How does it work?
The feather format works brilliantly for standard pandas data frames. In order to leverage the feather format, we simply convert the geometry data from shapely objects into Well Known Binary (WKB) format, and then store that column as raw bytes.
We store the coordinate reference system in JSON format in a sidecar .crs file.
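To make the WKB step more concrete, here is a standard-library-only sketch of how a 2D point is laid out in WKB and how a CRS could be written to a JSON sidecar. It is an illustration, not geofeather's implementation: geofeather relies on shapely for the conversion, the helper names are hypothetical, and both the sidecar naming (feather path plus ".crs") and the CRS dictionary contents are assumptions.

```python
import json
import struct
import tempfile
from pathlib import Path


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # 1 byte  : byte order flag (1 = little endian)
    # 4 bytes : geometry type (1 = Point)
    # 16 bytes: x and y as 64-bit floats
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data):
    """Decode a little-endian WKB point back into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("not a little-endian WKB point")
    return (x, y)


def write_crs_sidecar(feather_path, crs):
    """Write the CRS as JSON next to the data file (assumed naming: path + '.crs')."""
    sidecar = Path(str(feather_path) + ".crs")
    sidecar.write_text(json.dumps(crs))
    return sidecar


wkb = point_to_wkb(1.0, 2.0)
print(len(wkb), wkb.hex())  # 21 0101000000000000000000f03f0000000000000040

with tempfile.TemporaryDirectory() as tmp:
    # The CRS dictionary here is only a placeholder value.
    sidecar = write_crs_sidecar(Path(tmp) / "test.feather", {"init": "epsg:4326"})
    print(sidecar.name, json.loads(sidecar.read_text()))
```

Because each geometry becomes a plain bytes value, the geometry column can be stored like any other binary column, which is what lets the feather format handle it.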
Installation
Available on PyPI at: https://pypi-hypernode.com/project/geofeather/
pip install geofeather
Usage
Write
Given an existing GeoDataFrame my_gdf, pass this into to_geofeather:
from geofeather import to_geofeather, from_geofeather
to_geofeather(my_gdf, 'test.feather')
Read
my_gdf = from_geofeather('test.feather')
TEMPORARY
pygeos provides much faster geospatial operations over arrays of geometries. geopandas is in the process of migrating to pygeos geometries as its internal data storage instead of shapely objects.
Until pygeos is fully integrated, there are shims in geofeather to support interoperability with pandas DataFrames containing pygeos geometries. If you are already using pygeos against data you read from geofeather, using the following shims will generate 3-7x speedups reading and writing data compared to geofeather reading into GeoDataFrames.
Internally, the feather file is identical to the one created above.
pygeos is required in order to use this functionality.
WARNING: this will be deprecated as soon as pygeos is integrated into geopandas.
from geofeather.pygeos import to_geofeather, from_geofeather
# given a DataFrame df containing pygeos geometries in 'geometry' column
# and a crs object
to_geofeather(df, 'test.feather', crs=crs)
df = from_geofeather('test.feather')
Note: no CRS information is returned when reading from geofeather into a DataFrame, so that the function signature stays the same as from_geofeather above.
Indexes
Right now, indexes are not supported in feather files. To get around this, simply reset your index before calling to_geofeather.
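The index workaround can be sketched with pandas alone. A plain DataFrame stands in for a GeoDataFrame here (reset_index() and set_index() are inherited from pandas), and the column and index names are made up for illustration.

```python
import pandas as pd

# Example frame with a named, non-default index.
df = pd.DataFrame(
    {"name": ["a", "b", "c"]},
    index=pd.Index([10, 20, 30], name="site_id"),
)

# Move the index into a regular column so it is saved with the data.
flat = df.reset_index()

# to_geofeather(flat, 'test.feather') and from_geofeather('test.feather')
# would go here when working with a real GeoDataFrame.

# After reading, the index can be rebuilt from the saved column.
restored = flat.set_index("site_id")
```

If the index carries no information, reset_index(drop=True) discards it instead of adding a column.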
Changes
0.3.0
- allow serializing to / from pandas DataFrames containing pygeos geometries (see notes above).
- use new CRS object in geopandas data frames (#4)
- dropped to_shp; use geopandas to_file() instead.
0.2.0
- allow reading a subset of columns from a feather file
- store geometry in 'geometry' column instead of 'wkb' column (simplification to avoid renaming columns)
0.1.0
- Initial release
Credits
Everything that makes this fast is due to the hard work of contributors to pyarrow, geopandas, and shapely.
File details
Details for the file geofeather-0.3.0.tar.gz.
File metadata
- Download URL: geofeather-0.3.0.tar.gz
- Upload date:
- Size: 4.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5889ebc31c02dd38215884badb3fa20029628088dbe3d5936894d0488ec01fa4
MD5 | b42e2a04a440b4f77e3c927733ce0e20
BLAKE2b-256 | b624c9c9b285d79e18c098bc8c8140eb3b4b000753aecfc26a6daa82b1bb6dab
File details
Details for the file geofeather-0.3.0-py3-none-any.whl.
File metadata
- Download URL: geofeather-0.3.0-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 132a79ef3b31d53fe13287eba31b057d33ced233bf5ed01af25856ec6c60e5ff
MD5 | 6d4c5f61bdf4068842c224c2a63f3a20
BLAKE2b-256 | 6c1e8a0a3b25b2fff01ab834bdc73794e83de1353f73f8ec481a60bcb5b71b00