CLI DGGS indexer for vector geospatial data

vector2dggs

Python-based CLI tool to index vector files to DGGS in parallel, writing out to Parquet.

This is the vector equivalent of raster2dggs.

It currently supports only the H3 DGGS and probably has other limitations, since it was developed for a specific internal use case, though it is intended as a general-purpose abstraction. Contributions, suggestions, bug reports and strongly worded letters are all welcome.

Currently only supports polygons.

Example use case for vector2dggs, showing parcels indexed to a high H3 resolution

Usage

vector2dggs h3 --help
Usage: vector2dggs h3 [OPTIONS] VECTOR_INPUT OUTPUT_DIRECTORY

  Ingest a vector dataset and index it to the H3 DGGS.

  VECTOR_INPUT is the path to input vector geospatial data.
  OUTPUT_DIRECTORY should be a directorty, not a file, as it will be the
  write location for an Apache Parquet data store.

Options:
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO
                                  or DEBUG  [default: INFO]
  -r, --resolution [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 resolution to index  [required]
  -id, --id_field TEXT            Field to use as an ID; defaults to a
                                  constructed single 0...n index on the
                                  original feature order.
  -a, --all_attributes            Retain attributes in output. The
                                  default is to create an output that
                                  only includes H3 cell ID and the ID(s)
                                  given by the -id field (or the default
                                  index ID).
  -p, --partitions INTEGER        Geo-partitioning, currently only
                                  available in Hilbert method  [default:
                                  50; required]
  -s, --spatial-sorting [hilbert|morton|geohash]
                                  Spatial sorting method  [default:
                                  hilbert]
  -crs, --cut_crs INTEGER         Set crs(epsg) to input layer (used for
                                  cutting), defaults to input crs
  -c, --cut_threshold INTEGER     Cutting up large polygons into target
                                  length (meters)  [default: 5000;
                                  required]
  -t, --threads INTEGER           Amount of threads used for operation
                                  [default: 7]
  -o, --overwrite
  --help                          Show this message and exit.
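
When the tool is driven from a Python script rather than a shell, the argument list can be assembled from the options listed above. The following is a minimal sketch, assuming vector2dggs is installed and on the PATH. The helper name build_h3_command and the file names are hypothetical and not part of the package; only flags that appear in the help text are used.

```python
from typing import List, Optional


def build_h3_command(
    vector_input: str,
    output_directory: str,
    resolution: int,
    id_field: Optional[str] = None,
    all_attributes: bool = False,
    overwrite: bool = False,
) -> List[str]:
    """Assemble a `vector2dggs h3` argument list (hypothetical helper)."""
    # The help text lists resolutions 0 to 15 as the only valid choices.
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolution must be between 0 and 15")
    cmd = ["vector2dggs", "h3", "-r", str(resolution)]
    if id_field is not None:
        cmd += ["-id", id_field]
    if all_attributes:
        cmd.append("-a")
    if overwrite:
        cmd.append("-o")
    # Positional arguments come last: VECTOR_INPUT, then OUTPUT_DIRECTORY.
    cmd += [vector_input, output_directory]
    return cmd


# Hypothetical input and output paths, for illustration only.
cmd = build_h3_command(
    "parcels.gpkg", "parcels.12.parquet", 12, id_field="title_no", overwrite=True
)
print(" ".join(cmd))
# prints: vector2dggs h3 -r 12 -id title_no -o parcels.gpkg parcels.12.parquet

# To actually run it (requires vector2dggs to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.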

Visualising output

Output is in the Apache Parquet format, a directory with one file per partition.
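
Because the output is a directory of partition files, large results can be processed one partition at a time instead of being loaded all at once. This is a rough sketch only: the function names are hypothetical, and the "*.parquet" file pattern is an assumption about how the partition files are named, so adjust it to match your output directory.

```python
from pathlib import Path
from typing import Iterator, List

import pandas as pd


def partition_files(output_directory: str, pattern: str = "*.parquet") -> List[Path]:
    """List partition files in the output directory, sorted by name.

    The default pattern is an assumption about the partition file names.
    """
    return sorted(p for p in Path(output_directory).glob(pattern) if p.is_file())


def iter_partitions(output_directory: str) -> Iterator[pd.DataFrame]:
    """Yield one DataFrame per partition file.

    Reading requires a Parquet engine for pandas, such as pyarrow.
    """
    for path in partition_files(output_directory):
        yield pd.read_parquet(path)
```

For small outputs, reading the whole directory in one call with pd.read_parquet is simpler.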

For a quick view of your output, you can read Apache Parquet with pandas, and then use h3-pandas and geopandas to convert this into a GeoPackage or GeoParquet for visualisation in a desktop GIS, such as QGIS. The Apache Parquet output is indexed by an ID column (which you can specify), so it should be ready for two intended use-cases:

  • Joining attribute data from the original feature-level data onto computed DGGS cells.
  • Joining other data to this output on the H3 cell ID. (The output has a column like h3_\d{2}, e.g. h3_09 or h3_12 according to the target resolution.)
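
Both joins can be sketched with plain pandas. This example assumes the layout seen in this README's sample output, with the H3 cell ID as the index and the ID field (title_no) as a column. The small tables titles and rainfall, and their values, are hypothetical stand-ins; the helper h3_column_name is also hypothetical.

```python
import pandas as pd


def h3_column_name(resolution: int) -> str:
    """Return the H3 column name for a resolution, zero-padded to two digits."""
    return f"h3_{resolution:02d}"


col = h3_column_name(12)  # "h3_12"

# Stand-in for vector2dggs output (assumed layout: H3 cell ID index, ID column).
cells = pd.DataFrame(
    {"title_no": ["NA94D/635", "NA62D/324", "NA57C/785"]},
    index=pd.Index(
        ["8cbb53a734553ff", "8cbb53a548b2dff", "8cbb53a548b11ff"], name=col
    ),
)

# Use case 1: bring feature-level attributes onto the cells via the ID column.
# Hypothetical attribute table keyed by title_no.
titles = pd.DataFrame(
    {"title_no": ["NA94D/635", "NA62D/324"], "land_use": ["residential", "pastoral"]}
)
cells_with_attrs = (
    cells.reset_index().merge(titles, on="title_no", how="left").set_index(col)
)

# Use case 2: join other data that is keyed by the H3 cell ID.
# Hypothetical per-cell values.
rainfall = pd.DataFrame(
    {"rain_mm": [1210.0, 1185.0]},
    index=pd.Index(["8cbb53a734553ff", "8cbb53a548b11ff"], name=col),
)
cells_with_rain = cells.join(rainfall, how="left")

print(cells_with_attrs)
print(cells_with_rain)
```

A left join keeps every cell, so cells without a matching attribute or value end up with missing entries rather than being dropped.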

GeoParquet output (hexagon boundaries):

>>> import pandas as pd
>>> import h3pandas
>>> g = pd.read_parquet('./output-data/nz-property-titles.12.parquet').h3.h3_to_geo_boundary()
>>> g
                  title_no                                           geometry
h3_12                                                                        
8cbb53a734553ff  NA94D/635  POLYGON ((174.28483 -35.69315, 174.28482 -35.6...
8cbb53a734467ff  NA94D/635  POLYGON ((174.28454 -35.69333, 174.28453 -35.6...
8cbb53a734445ff  NA94D/635  POLYGON ((174.28416 -35.69368, 174.28415 -35.6...
8cbb53a734551ff  NA94D/635  POLYGON ((174.28496 -35.69329, 174.28494 -35.6...
8cbb53a734463ff  NA94D/635  POLYGON ((174.28433 -35.69335, 174.28432 -35.6...
...                    ...                                                ...
8cbb53a548b2dff  NA62D/324  POLYGON ((174.30249 -35.69369, 174.30248 -35.6...
8cbb53a548b61ff  NA62D/324  POLYGON ((174.30232 -35.69402, 174.30231 -35.6...
8cbb53a548b11ff  NA57C/785  POLYGON ((174.30140 -35.69348, 174.30139 -35.6...
8cbb53a548b15ff  NA57C/785  POLYGON ((174.30161 -35.69346, 174.30160 -35.6...
8cbb53a548b17ff  NA57C/785  POLYGON ((174.30149 -35.69332, 174.30147 -35.6...

[52736 rows x 2 columns]
>>> g.to_parquet('./output-data/parcels.12.geo.parquet')

For development

In brief, to get started:

  • Install Poetry
  • Install GDAL
    • If you're on Windows, pip install gdal may be necessary before running the subsequent commands.
    • On Linux, install GDAL 3.6+ according to your platform-specific instructions, including development headers, i.e. libgdal-dev.
  • Create the virtual environment and install the necessary dependencies with poetry install.
  • Subsequently, the virtual environment can be re-activated with poetry shell.

If you run poetry install, the CLI tool will be aliased, so you can simply use vector2dggs. Otherwise, invoke it with poetry run vector2dggs.

Code formatting

Code style: black

Please run black . before committing.

Example commands

vector2dggs h3 -id title_no -r 12 -o ~/Downloads/nz-property-titles.gpkg ~/Downloads/nz-property-titles.parquet

Citation

@software{vector2dggs,
  title={{vector2dggs}},
  author={Ardo, James and Law, Richard},
  url={https://github.com/manaakiwhenua/vector2dggs},
  version={0.1.0},
  date={2023-04-20}
}

APA/Harvard

Ardo, J., & Law, R. (2023). vector2dggs (0.1.0) [Computer software]. https://github.com/manaakiwhenua/vector2dggs
