phenodata is a data acquisition and manipulation toolkit for open access phenology data
phenodata - phenology data acquisition for humans
About
phenodata is a data acquisition and manipulation toolkit for open access phenology data. It is written in Python.
Currently, it implements data wrappers for acquiring phenology observation data published on the DWD Climate Data Center (CDC) FTP server operated by »Deutscher Wetterdienst« (DWD).
Under the hood, it uses the fine Pandas data analysis library for data mangling, amongst others.
Acknowledgements
Thanks to the many observers, »Deutscher Wetterdienst«, the »Global Phenological Monitoring programme« and everyone working behind the scenes for their commitment to recording the observations and for making the excellent datasets available to the community. You know who you are.
Getting started
Introduction
For most acquisition tasks, you must choose one of two datasets: annual-reporters (--dataset=annual) or immediate-reporters (--dataset=immediate).
To improve data acquisition performance, also consider applying the --filename= parameter for file name filtering.
Example: When using --filename=Hasel,Schneegloeckchen, only files whose names contain Hasel or Schneegloeckchen will be retrieved, which reduces the number of files that have to be downloaded.
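The filtering idea described above can be sketched in a few lines of plain Python 3. This is only an illustration, not the code phenodata uses internally: the function name filter_filenames and the file names are made up, and case-sensitive substring matching is an assumption.

```python
def filter_filenames(filenames, patterns):
    """Keep only the file names that contain at least one of the patterns.

    Assumption: plain, case-sensitive substring matching.
    """
    return [name for name in filenames if any(p in name for p in patterns)]


# Hypothetical file names, made up for illustration only.
available = [
    "observations_Hasel.txt",
    "observations_Schneegloeckchen.txt",
    "observations_Apfel.txt",
]

# Equivalent of passing --filename=Hasel,Schneegloeckchen
patterns = "Hasel,Schneegloeckchen".split(",")
selected = filter_filenames(available, patterns)

for name in selected:
    print(name)
```

Only the two matching files would then need to be downloaded; the third one is skipped before any transfer happens.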
Install
If you know your way around Python, installing this software is really easy:
pip install phenodata --upgrade
Please refer to the virtualenv page for further recommendations on how to install and use this software.
Usage
$ phenodata --help
Usage:
  phenodata info
  phenodata list-species --source=dwd [--format=csv]
  phenodata list-phases --source=dwd [--format=csv]
  phenodata list-stations --source=dwd --dataset=immediate [--all] [--format=csv]
  phenodata nearest-station --source=dwd --dataset=immediate --latitude=52.520007 --longitude=13.404954 [--format=csv]
  phenodata nearest-stations --source=dwd --dataset=immediate [--all] --latitude=52.520007 --longitude=13.404954 [--limit=10] [--format=csv]
  phenodata list-quality-levels --source=dwd [--format=csv]
  phenodata list-quality-bytes --source=dwd [--format=csv]
  phenodata list-filenames --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--year=2017]
  phenodata list-urls --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--year=2017]
  phenodata (observations|forecast) --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--station-id=164,717] [--species-id=113,127] [--phase-id=5] [--quality-level=10] [--quality-byte=1,2,3] [--year=2017] [--humanize] [--language=german] [--format=csv]
  phenodata (observations|forecast) --source=dwd --dataset=immediate --partition=recent [--filename=Hasel,Schneegloeckchen] [--station=berlin,brandenburg] [--species=hazel,snowdrop] [--phase=flowering] [--year=2017] [--humanize] [--language=german] [--format=csv]
  phenodata --version
  phenodata (-h | --help)

Data acquisition options:
  --source=<source>         Data source. Currently "dwd" only.
  --dataset=<dataset>       Data set. Use "immediate" or "annual" for --source=dwd.
  --partition=<dataset>     Partition. Use "recent" or "historical" for --source=dwd.
  --filename=<file>         Filter by file names (comma-separated list)

Direct filtering options:
  --years=<years>           Filter by years (comma-separated list)
  --station-id=<station-id> Filter by station ids (comma-separated list)
  --species-id=<species-id> Filter by species ids (comma-separated list)
  --phase-id=<phase-id>     Filter by phase ids (comma-separated list)

Humanized filtering options:
  --station=<station>       Filter by strings from "stations" data (comma-separated list)
  --species=<species>       Filter by strings from "species" data (comma-separated list)
  --phase=<phase>           Filter by strings from "phases" data (comma-separated list)

Data output options:
  --format=<format>         Output data in designated format. Choose one of "tabular", "json", "csv" or "string".
                            With "tabular", it is also possible to specify the table format,
                            see https://bitbucket.org/astanin/python-tabulate. e.g. "tabular:presto".
                            [default: tabular:psql]
  --humanize                Resolve ID-based columns to real names with "observations" and "forecast" output.
  --language=<language>     Use labels in designated language when using ``--humanize`` [default: english].
  --limit=<limit>           Limit output of "nearest-stations" to designated number of entries.
                            [default: 10]
Examples
Metadata
List of species:
phenodata list-species --source=dwd
List of phases:
phenodata list-phases --source=dwd
List of stations:
phenodata list-stations --source=dwd --dataset=immediate
List of file names of recent observations by the annual reporters:
phenodata list-filenames --source=dwd --dataset=annual --partition=recent
List of full URLs to observations using filename-based filtering:
phenodata list-urls --source=dwd --dataset=annual --partition=recent --filename=Hasel,Schneegloeckchen
Display nearest station for given position:
phenodata nearest-station --source=dwd --dataset=immediate --latitude=52.520007 --longitude=13.404954
Display 20 nearest stations for given position:
phenodata nearest-stations \
    --source=dwd --dataset=immediate \
    --latitude=52.520007 --longitude=13.404954 --limit=20
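For readers curious how a nearest-station lookup can work in principle, here is a minimal sketch in plain Python 3. It ranks stations by great-circle (haversine) distance. Whether phenodata uses this exact distance measure is not stated here, so treat it as an assumption; the station names and coordinates are invented placeholders, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_stations(stations, latitude, longitude, limit=10):
    """Return up to `limit` stations, sorted by distance to the given position."""
    ranked = sorted(
        stations,
        key=lambda s: haversine_km(latitude, longitude, s["latitude"], s["longitude"]),
    )
    return ranked[:limit]


# Placeholder stations with invented coordinates, for illustration only.
stations = [
    {"name": "station-a", "latitude": 52.45, "longitude": 13.30},
    {"name": "station-b", "latitude": 48.14, "longitude": 11.46},
    {"name": "station-c", "latitude": 53.55, "longitude": 10.00},
]

# Same position as in the command line examples above.
closest = nearest_stations(stations, 52.520007, 13.404954, limit=2)
print([s["name"] for s in closest])
```

The limit parameter mirrors the --limit option; with limit=1 the result corresponds to what nearest-station would display.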
Observations
Observations of hazel and snowdrop, using filename-based filtering at data acquisition time:
phenodata observations --source=dwd --dataset=annual --partition=recent --filename=Hasel,Schneegloeckchen
Observations of hazel and snowdrop (as above), but for station ids 164 and 717 only:
phenodata observations \
    --source=dwd --dataset=annual --partition=recent \
    --filename=Hasel,Schneegloeckchen --station-id=164,717
All observations for station ids 164 and 717 in years 2016 and 2017:
phenodata observations \
    --source=dwd --dataset=annual --partition=recent \
    --station-id=164,717 --year=2016,2017
All observations for station ids 164 and 717 and species ids 113 and 127:
phenodata observations \
    --source=dwd --dataset=annual --partition=recent \
    --station-id=164,717 --species-id=113,127
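The ID-based filters accept comma-separated lists, and several filters can be combined. The following plain Python 3 sketch shows one way such filtering could behave, assuming that the values within one option are alternatives and that different options must all match. The helper names, field names and records are hypothetical and not taken from the phenodata source code.

```python
def parse_id_list(value):
    """Turn a comma-separated option value such as "164,717" into a list of ints."""
    return [int(item) for item in value.split(",") if item.strip()]


def filter_records(records, **criteria):
    """Keep records that match every criterion.

    Each criterion maps a field name to a list of allowed values
    (assumption: OR within one list, AND across fields).
    """
    return [
        record for record in records
        if all(record[field] in allowed for field, allowed in criteria.items())
    ]


# Made-up records; only the ID values are borrowed from the examples above.
records = [
    {"station_id": 164, "species_id": 113, "year": 2017},
    {"station_id": 164, "species_id": 200, "year": 2017},
    {"station_id": 999, "species_id": 127, "year": 2016},
    {"station_id": 717, "species_id": 127, "year": 2016},
]

# Equivalent of --station-id=164,717 --species-id=113,127
selected = filter_records(
    records,
    station_id=parse_id_list("164,717"),
    species_id=parse_id_list("113,127"),
)
print(selected)
```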
All invalid observations:
phenodata list-quality-bytes --source=dwd
phenodata observations --source=dwd --dataset=annual --partition=recent --quality-byte=5,6,7,8
Forecasting
Acquire observation data for Berlin-Dahlem and München-Pasing and forecast the events for the current year by grouping the observations and computing the “mean” value of the “Jultag” (day of year) column:
phenodata forecast \
    --source=dwd --dataset=annual --partition=recent \
    --filename=Hasel,Schneegloeckchen,Apfel,Birne \
    --station-id=12132,10961 --format=string
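The forecasting approach described above (group, then take the mean of “Jultag”) can be illustrated with a small sketch. It is written in plain Python 3 instead of Pandas to stay dependency-free, and it is not the actual implementation. The grouping key (station, species, phase), the field names other than “Jultag”, the rounding to whole days and the sample values are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def forecast(observations, year):
    """Predict one event date per (station, species, phase) group.

    The prediction is the mean day of year ("Jultag") of all past
    observations in the group, projected onto the target year.
    """
    groups = defaultdict(list)
    for obs in observations:
        key = (obs["station_id"], obs["species_id"], obs["phase_id"])
        groups[key].append(obs["Jultag"])

    predictions = {}
    for key, days in groups.items():
        mean_day = int(round(mean(days)))  # round to a whole day of year
        # Day 1 is January 1st, hence the offset of mean_day - 1.
        predictions[key] = date(year, 1, 1) + timedelta(days=mean_day - 1)
    return predictions


# Invented "Jultag" values; the IDs are reused from the examples above.
observations = [
    {"station_id": 12132, "species_id": 113, "phase_id": 5, "Jultag": 40},
    {"station_id": 12132, "species_id": 113, "phase_id": 5, "Jultag": 50},
    {"station_id": 12132, "species_id": 127, "phase_id": 5, "Jultag": 60},
]

predicted = forecast(observations, year=2018)
for key, day in sorted(predicted.items()):
    print(key, day.isoformat())
```

A fixed year is used here to keep the result reproducible; passing date.today().year instead would project the events onto the current year.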
Humanized output examples
The option --humanize will improve textual output by resolving ID columns in the observation data to their appropriate text representations from metadata files.
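Conceptually, resolving ID columns is a lookup against metadata tables. The sketch below shows the idea in plain Python 3. It is an illustration under assumptions, not the tool's own code: the function name, the column names and all labels are placeholders, and falling back to the raw ID when no label exists is a design choice made for this example.

```python
def humanize(records, lookups):
    """Replace ID values by text labels for every column that has a lookup table."""
    humanized = []
    for record in records:
        row = dict(record)  # copy, so the input records stay untouched
        for column, table in lookups.items():
            if column in row:
                # Fall back to the raw ID when no label is known.
                row[column] = table.get(row[column], row[column])
        humanized.append(row)
    return humanized


# Placeholder metadata; real labels would come from the metadata files.
lookups = {
    "species_id": {1: "species-one", 2: "species-two"},
    "phase_id": {5: "phase-five"},
}

# Placeholder observation records.
records = [
    {"species_id": 1, "phase_id": 5, "Jultag": 45},
    {"species_id": 3, "phase_id": 5, "Jultag": 60},
]

readable = humanize(records, lookups)
for row in readable:
    print(row)
```

Supporting several languages, as --language does, could then amount to selecting a different set of lookup tables.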
Observations
Observations for species “hazel”, “snowdrop”, “apple” and “pear” at station “Berlin-Dahlem”, output texts in the German language if possible:
phenodata observations \
    --source=dwd --dataset=annual --partition=recent \
    --filename=Hasel,Schneegloeckchen,Apfel,Birne \
    --station-id=12132 \
    --humanize --language=german
Forecasting
Specific events
Forecast of “beginning of flowering” events at station “Berlin-Dahlem”. Use all species of the “primary group”: “hazel”, “snowdrop”, “goat willow”, “dandelion”, “cherry”, “apple”, “winter oilseed rape”, “black locust” and “common heather”. Sort by date, ascending.
phenodata forecast \
    --source=dwd --dataset=annual --partition=recent \
    --filename=Hasel,Schneegloeckchen,Sal-Weide,Loewenzahn,Suesskirsche,Apfel,Winterraps,Robinie,Winter-Linde,Heidekraut \
    --station-id=12132 --phase-id=5 \
    --humanize \
    --sort=Datum
Event sequence for each species
Forecast of all events at station “Berlin-Dahlem”. Use all species of the “primary group” (as above). Sort by species and date, ascending.
phenodata forecast \
    --source=dwd --dataset=annual --partition=recent \
    --filename=Hasel,Schneegloeckchen,Sal-Weide,Loewenzahn,Suesskirsche,Apfel,Winterraps,Robinie,Winter-Linde,Heidekraut \
    --station-id=12132 \
    --humanize --lang=german \
    --sort=Spezies,Datum
Humanized search examples
Todo
Display regular flowering events for hazel and snowdrop around Berlin and Brandenburg (Germany) in 2017:
phenodata calendar --source=dwd --dataset=immediate --partition=recent --regions=berlin,brandenburg --species=hazel,snowdrop --phases=flowering --years=2017
phenodata calendar --source=dwd --dataset=immediate --partition=historical --regions=berlin,brandenburg --species=hazel,snowdrop --phases=flowering --years=1958
Display forecast for “beginning of flowering” events for canola and sweet cherry around Thüringen and Bayern (Germany), deduced from annual/recent data:
phenodata calendar --source=dwd --dataset=annual --partition=recent --regions=thüringen,bayern --species=raps,süßkirsche --phases-bbch=60 --forecast
Project information
About
The “phenodata” program is released under the AGPL license. The code lives on GitHub and the Python package is published to PyPI. You might also want to have a look at the documentation.
The software has been tested on Python 2.7.
If you’d like to contribute you’re most welcome! Spend some time taking a look around, locate a bug, design issue or spelling mistake and then send us a pull request or create an issue.
Thanks in advance for your efforts, we really appreciate any help or feedback.
Code license
Licensed under the AGPL license. See LICENSE file for details.
Data license
The DWD has information about their re-use policy in German and English. Please refer to the respective Disclaimer (de, en) and Copyright (de, en) information.
Disclaimer
The project and its authors are not affiliated with DWD, USA-NPN or any other data provider in any way. It is purely a community project for making data more accessible in the spirit of open data.