Skip to main content

Summarize geospatial raster datasets based on vector geometries

Project description

BuildStatus CoverageStatus

The rasterstats python module provides a fast and flexible tool to summarize geospatial raster datasets based on vector geometries (i.e. zonal statistics).

  • Raster data support

    • Any raster data source supported by rasterio and GDAL

    • Support for continuous and categorical

    • Respects null/no-data metadata or takes argument

  • Vector data support

    • Points, Lines, Polygon and Multi-* geometries

    • Flexible input formats

      • Any vector data source supported by fiona

      • Python objects that are GeoJSON-like mappings or support the geo_interface

      • Well-Known Text/Binary (WKT/WKB) geometries

  • Depends on libgdal, rasterio, fiona, shapely and numpy

Install

Using Ubuntu 14.04:

sudo apt-get install python-numpy libgdal1h gdal-bin libgdal-dev
pip install rasterstats

Or homebrew on OS X:

brew install gdal
pip install rasterstats

For Windows, follow the rasterio installation and then run:

pip install rasterstats

Example Usage - Python

Given a polygon vector layer and a digitial elevation model (DEM) raster, calculate the mean elevation of each polygon:

zones elevation
>>> from rasterstats import zonal_stats
>>> stats = zonal_stats("tests/data/polygons.shp", "tests/data/elevation.tif")

>>> stats[1].keys()
    ['count', 'min', 'max', 'mean']

>>> [f['mean'] for f in stats]
    [756.6057470703125, 114.660084635416666]

Example Usage - Command Line

rasterstats includes a rasterio plugin for performing zonal statistics at the command line. Given a raster and geojson input, the rio zonalstats command will summarize the raster cell values for all features and output to geojson. In the resulting GeoJSON FeatureCollection, each Feature will have additional properties containing summary statistics of the overlapping raster cells.

For example, you could summarize the elevation (srtm5k.tif) by country (countries.json) using the following command:

$ rio zonalstats -r srtm5k.tif countries.json countries_with_elevation.geojson

Or use stdin/stdout to pipe geojson data between processes

$ ogr2ogr -f GeoJSON /vsistdout/ countries.shp | rio zonalstats -r srtm5k.tif | geojsonio

For more comprehensive documentation of the command line interface, see the docs.

Statistics

By default, the zonal_stats function will return the following statistics

  • min

  • max

  • mean

  • count

Optionally, these statistics are also available

  • sum

  • std

  • median

  • majority

  • minority

  • unique

  • range

  • nodata

  • percentile (see note below for details)

You can specify the statistics to calculate using the stats argument:

>>> stats = zonal_stats("tests/data/polygons.shp",
                         "tests/data/elevation.tif",
                         stats=['min', 'max', 'median', 'majority', 'sum'])

>>> # also takes space-delimited string
>>> stats = zonal_stats("tests/data/polygons.shp",
                         "tests/data/elevation.tif",
                         stats="min max median majority sum")

Note that certain statistics (majority, minority, and unique) require significantly more processing due to expensive counting of unique occurences for each pixel value.

You can also use a percentile statistic by specifying percentile_<q> where <q> can be a floating point number between 0 and 100.

User-defined Statistics

You can define your own aggregate functions using the add_stats argument. This is a dictionary with the name(s) of your statistic as keys and the function(s) as values. For example, to reimplement the mean statistic:

from __future__ import division
import numpy as np

def mymean(x):
    return np.ma.mean(x)

then use it in your zonal_stats call like so:

stats = zonal_stats(vector, raster, add_stats={'mymean':mymean})

Specifying Geometries

In addition to the basic usage above, rasterstats supports other mechanisms of specifying vector geometries.

It integrates with other python objects that support the geo_interface (e.g. Fiona, Shapely, ArcPy, PyShp, GeoDjango):

>>> import fiona

>>> # an iterable of objects with geo_interface
>>> lyr = fiona.open('/path/to/vector.shp')
>>> features = (x for x in lyr if x['properties']['state'] == 'CT')
>>> zonal_stats(features, '/path/to/elevation.tif')
...

>>> # a single object with a geo_interface
>>> lyr = fiona.open('/path/to/vector.shp')
>>> zonal_stats(lyr.next(), '/path/to/elevation.tif')
...

Or by using with geometries in “Well-Known” formats:

>>> zonal_stats('POINT(-124 42)', '/path/to/elevation.tif')
...

Retaining Feature info with GeoJSON output

By default, zonal_stats returns a list of dictionaries containing the statistics. If you want to retain all of the feature (geometry, id, properties, etc) and just append the stats as additional properties, you can set geojson_out to True:

>>> stats = zonal_stats("tests/data/polygons.shp",
                         "tests/data/elevation.tif"
                         geojson_out=True, prefix="elevation", stats="mean")

>>> stats[0]['properties']['elevation_mean']
756.6057470703125

Rasterization Strategy

There is no right or wrong way to rasterize a vector. The default strategy is to include all pixels along the line render path (for lines), or cells where the center point is within the polygon (for polygons). Alternatively, you can opt for the all_touched strategy which rasterizes the geometry by including all pixels that it touches. You can enable this specifying:

>>> zonal_stats(..., all_touched=True)
rasterization

The figure above illustrates the difference; the default all_touched=False is on the left while the all_touched=True option is on the right. Both approaches are valid and there are tradeoffs to consider. Using the default rasterizer may miss polygons that are smaller than your cell size resulting in None stats for those geometries. Using the all_touched strategy includes many cells along the edges that may not be representative of the geometry and may give severly biased results in some cases.

Working with categorical rasters

You can treat rasters as categorical (i.e. raster values represent discrete classes) if you’re only interested in the counts of unique pixel values.

For example, you may have a raster vegetation dataset and want to summarize vegetation by polygon. Statistics such as mean, median, sum, etc. don’t make much sense in this context (What’s the sum of oak + grassland?).

The polygon below is comprised of 12 pixels of oak (raster value 32) and 78 pixels of grassland (raster value 33):

>>> zonal_stats(lyr.next(), '/path/to/vegetation.tif', categorical=True)
[{32: 12, 33: 78}]

rasterstats will report using the pixel values as keys. To associate the pixel values with their appropriate meaning (for example oak instead of 32), you can use a category_map:

>>> cmap = {32: 'oak', 33: 'grassland'}
>>> zonal_stats(lyr.next(), '/path/to/vegetation.tif',
                categorical=True, category_map=cmap)
[{'oak': 12, 'grassland': 78}]

“Mini-Rasters”

Internally, we create a masked raster dataset for each feature in order to calculate statistics. Optionally, we can include these data in the output of zonal_stats using the raster_out argument:

stats = zonal_stats(vector, raster, raster_out=True)

Which gives us three additional keys for each feature:

mini_raster_array: The clipped and masked numpy array
mini_raster_affine: Affine transform (not a GDAL-style geotransform)
mini_raster_nodata: nodata Value

Keep in mind that having ndarrays in your stats dictionary means it is more difficult to serialize to json and other text formats.

Issues

Find a bug? Report it via github issues by providing

  • a link to download the smallest possible raster and vector dataset necessary to reproduce the error

  • python code or command to reproduce the error

  • information on your environment: versions of python, gdal and numpy and system memory

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

rasterstats-0.9.2.zip (27.5 kB view details)

Uploaded Source

rasterstats-0.9.2.tar.gz (19.5 kB view details)

Uploaded Source

File details

Details for the file rasterstats-0.9.2.zip.

File metadata

  • Download URL: rasterstats-0.9.2.zip
  • Upload date:
  • Size: 27.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for rasterstats-0.9.2.zip
Algorithm Hash digest
SHA256 ceb9303403a83366d730592ba76adf3582fd657cc353607ae33ab2368350c471
MD5 0eab604d1a867f64242a904965fac932
BLAKE2b-256 b4af14f830d4abc2844d9261e99d011a90ea069a0aea5c16daadef057910d020

See more details on using hashes here.

File details

Details for the file rasterstats-0.9.2.tar.gz.

File metadata

  • Download URL: rasterstats-0.9.2.tar.gz
  • Upload date:
  • Size: 19.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for rasterstats-0.9.2.tar.gz
Algorithm Hash digest
SHA256 de41a7fac74375b94b8bbd0a30f0dfa33b9a925d0a4e706d6c432e5906a65d81
MD5 2eb1ff0e0f6b23ff3ff4daf7d794226f
BLAKE2b-256 5f8721137bafbaead3ebb101736b5e754f379476df46b45ecc9f5450b0a9f4ce

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page