vega_datasets

A Python package for offline access to vega datasets.

This package has several goals:

  • Provide straightforward access in Python to the datasets made available at vega-datasets.
  • Return the results in the form of a pandas DataFrame.
  • Wherever dataset size and/or license constraints make it possible, bundle the dataset with the package so that datasets can be loaded in the absence of a web connection.

Currently the package bundles a subset of the datasets (see Available Datasets for the list) and falls back to HTTP requests for the others.
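
The bundle-first, network-second behaviour described above can be pictured with a small sketch. This is not the package's actual implementation: `resolve_source`, `local_dir`, and `BASE_URL` are hypothetical names used only for illustration, and the base URL is simply the prefix of the iris URL shown in the Usage section.

```python
from pathlib import Path

# Assumption: prefix taken from the iris URL shown in the Usage section.
BASE_URL = "https://vega.github.io/vega-datasets/data/"


def resolve_source(filename, local_dir, base_url=BASE_URL):
    """Return a local path if the file is bundled, otherwise a remote URL.

    Hypothetical helper: it illustrates the local-first idea only.
    """
    candidate = Path(local_dir) / filename
    if candidate.is_file():
        return str(candidate)   # bundled copy: no network needed
    return base_url + filename  # not bundled: caller must fetch over HTTP
```

A caller would then open the result directly when it is a path and issue an HTTP request when it is a URL.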

Installation

$ pip install vega_datasets

Usage

The main object in this library is data:

>>> from vega_datasets import data

It has one attribute per available dataset; each loads the bundled copy when one exists and otherwise fetches the data over HTTP. For example, here is the well-known iris dataset:

>>> df = data.iris()
>>> df.head()
   petalLength  petalWidth  sepalLength  sepalWidth species
0          1.4         0.2          5.1         3.5  setosa
1          1.4         0.2          4.9         3.0  setosa
2          1.3         0.2          4.7         3.2  setosa
3          1.5         0.2          4.6         3.1  setosa
4          1.4         0.2          5.0         3.6  setosa

If you're curious about the source data, you can access the URL for any of the available datasets:

>>> data.iris.url
'https://vega.github.io/vega-datasets/data/iris.json'

For datasets bundled with the package, you can also find their location on disk:

>>> data.iris.filepath
'/lib/python3.6/site-packages/vega_datasets/data/iris.json'
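
Because a bundled dataset is an ordinary file, it can also be read without pandas. The sketch below assumes the file is a JSON array of row objects, which matches the tabular shape of the iris output above but may not hold for every dataset. `load_records` is a hypothetical helper, not part of the package.

```python
import json


def load_records(path):
    """Read a JSON file that holds a list of row dictionaries.

    Hypothetical helper; assumes a top-level JSON array of objects.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(f"expected a JSON array in {path!r}")
    return records
```

With the package installed, `load_records(data.iris.filepath)` should return the rows as plain dictionaries, assuming the bundled file has that layout.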

Available Datasets

To list all the available datasets, use list_datasets:

>>> data.list_datasets()
['7zip', 'airports', 'anscombe', 'barley', 'birdstrikes', 'budget', 'budgets', 'burtin', 'cars', 'climate', 'co2-concentration', 'countries', 'crimea', 'disasters', 'driving', 'earthquakes', 'ffox', 'flare', 'flare-dependencies', 'flights-10k', 'flights-200k', 'flights-20k', 'flights-2k', 'flights-3m', 'flights-5k', 'flights-airport', 'gapminder', 'gapminder-health-income', 'gimp', 'github', 'graticule', 'income', 'iris', 'jobs', 'londonBoroughs', 'londonCentroids', 'londonTubeLines', 'lookup_groups', 'lookup_people', 'miserables', 'monarchs', 'movies', 'normal-2d', 'obesity', 'points', 'population', 'population_engineers_hurricanes', 'seattle-temps', 'seattle-weather', 'sf-temps', 'sp500', 'stocks', 'udistrict', 'unemployment', 'unemployment-across-industries', 'us-10m', 'us-employment', 'us-state-capitals', 'weather', 'weball26', 'wheat', 'world-110m', 'zipcodes']
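
Several of these names contain hyphens, which are not valid in Python attribute names. Assuming the loader exposes such datasets with underscores in place of hyphens (for example `data.seattle_weather` for 'seattle-weather'; verify this against your installed version), the conversion can be sketched as below. `attribute_name` is a hypothetical helper that only performs the string conversion.

```python
def attribute_name(dataset_name):
    """Convert a dataset name such as 'flights-2k' to 'flights_2k'."""
    # Assumption: hyphens map to underscores on the `data` object.
    return dataset_name.replace("-", "_")


names = ["iris", "seattle-weather", "flights-2k"]
print([attribute_name(n) for n in names])
# ['iris', 'seattle_weather', 'flights_2k']
```

Under the same assumption, `getattr(data, attribute_name("seattle-weather"))()` would load a dataset by its listed name.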

To list local datasets (i.e. those that are bundled with the package and can be used without a web connection), use the local_data object instead:

>>> from vega_datasets import local_data
>>> local_data.list_datasets()
['airports', 'anscombe', 'barley', 'burtin', 'cars', 'crimea', 'driving', 'iowa-electricity', 'iris', 'seattle-temps', 'seattle-weather', 'sf-temps', 'stocks', 'us-employment', 'wheat']
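
Comparing the two listings shows which datasets need a network connection. The sketch below uses a hypothetical helper, `split_by_availability`, and short excerpts of the lists above so that it runs without the package installed.

```python
def split_by_availability(all_names, local_names):
    """Partition dataset names into bundled and network-only groups."""
    local = set(local_names)
    bundled = sorted(n for n in all_names if n in local)
    remote_only = sorted(n for n in all_names if n not in local)
    return bundled, remote_only


# Short excerpts from the two listings above.
all_names = ["airports", "budget", "cars", "flights-3m", "iris"]
local_names = ["airports", "cars", "iris"]

bundled, remote_only = split_by_availability(all_names, local_names)
print(remote_only)  # ['budget', 'flights-3m']
```

With the package installed, passing `data.list_datasets()` and `local_data.list_datasets()` gives the full partition.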

We plan to add more local datasets in the future, subject to size and licensing constraints. See the local datasets issue if you would like to help with this.

Dataset Information

If you want more information about any dataset, you can use the description property:

>>> data.iris.description
'This classic dataset contains lengths and widths of petals and sepals for 150 iris flowers, drawn from three species. It was introduced by R.A. Fisher in 1936 [1]_.'

This information is also part of the data.iris docstring. Descriptions are not yet included for all the datasets in the package; we hope to add more in the future.
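
The `[1]_` at the end of the description above is a reStructuredText footnote reference, which reads oddly outside a rendered docstring. A minimal sketch for removing such markers before display follows; `strip_footnote_refs` is a hypothetical helper, not part of the package.

```python
import re

# Matches reStructuredText footnote references such as " [1]_".
_FOOTNOTE_REF = re.compile(r"\s*\[\d+\]_")


def strip_footnote_refs(text):
    """Remove reST footnote references like '[1]_' from a description."""
    return _FOOTNOTE_REF.sub("", text)


description = (
    "This classic dataset contains lengths and widths of petals and sepals "
    "for 150 iris flowers, drawn from three species. It was introduced by "
    "R.A. Fisher in 1936 [1]_."
)
print(strip_footnote_refs(description))
```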
