Generate and load Pandas data frames based on JSON Table Schema descriptors.

Version v0.2 contains breaking changes:

  • removed Storage(prefix=) argument (was a stub)

  • renamed Storage(tables=) to Storage(dataframes=)

  • renamed Storage.tables to Storage.buckets

  • changed Storage.read to read into memory

  • added Storage.iter to yield row by row
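The difference between the new Storage.read and Storage.iter can be illustrated with a plain-Python analogy (a sketch of the semantics only, not the library's implementation):

```python
# Sketch of the read-vs-iter semantics using plain Python.
rows = [(1, 'a'), (2, 'b'), (3, 'c')]

def read_rows(source):
    # Storage.read-style: materialize every row in memory at once
    return list(source)

def iter_rows(source):
    # Storage.iter-style: lazily yield one row at a time
    for row in source:
        yield row

all_rows = read_rows(rows)   # a full list
stream = iter_rows(rows)     # a generator
first_row = next(stream)     # rows arrive one by one
```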

Getting Started

Installation

$ pip install datapackage
$ pip install jsontableschema-pandas

Example

You can load resources from a data package as Pandas data frames using the datapackage.push_datapackage function:

>>> import datapackage

>>> data_url = 'http://data.okfn.org/data/core/country-list/datapackage.json'
>>> storage = datapackage.push_datapackage(data_url, 'pandas')

>>> storage.buckets
['data___data']

>>> type(storage['data___data'])
<class 'pandas.core.frame.DataFrame'>

>>> storage['data___data'].head()
             Name Code
0     Afghanistan   AF
1   Åland Islands   AX
2         Albania   AL
3         Algeria   DZ
4  American Samoa   AS

It is also possible to pull an existing data frame into a data package:

>>> datapackage.pull_datapackage('/tmp/datapackage.json', 'country_list', 'pandas', tables={
...     'data': storage['data___data'],
... })

Storage

The package implements the Tabular Storage interface.

We can get a storage instance this way:

>>> from jsontableschema_pandas import Storage

>>> storage = Storage()

Storage works as a container for Pandas data frames. You can define a new data frame inside the storage using the storage.create method:

>>> storage.create('data', {
...     'primaryKey': 'id',
...     'fields': [
...         {'name': 'id', 'type': 'integer'},
...         {'name': 'comment', 'type': 'string'},
...     ]
... })

>>> storage.buckets
['data']

>>> storage['data'].shape
(0, 0)
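Conceptually, create builds an empty data frame whose index and columns come from the descriptor. Here is a simplified sketch of that mapping, using plain pandas and a hypothetical frame_from_schema helper (the library's real type handling is richer than this):

```python
import pandas as pd

# Simplified JSON Table Schema -> pandas dtype mapping (an assumption;
# the real library supports many more field types)
TYPE_MAP = {'integer': 'int64', 'string': 'object'}

def frame_from_schema(descriptor):
    """Build an empty DataFrame from a JSON Table Schema descriptor."""
    index_col = descriptor.get('primaryKey')
    columns = {
        field['name']: pd.Series(dtype=TYPE_MAP[field['type']])
        for field in descriptor['fields']
        if field['name'] != index_col  # primaryKey becomes the index
    }
    frame = pd.DataFrame(columns)
    if index_col:
        frame.index.name = index_col
    return frame

frame = frame_from_schema({
    'primaryKey': 'id',
    'fields': [
        {'name': 'id', 'type': 'integer'},
        {'name': 'comment', 'type': 'string'},
    ],
})
```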

Use storage.write to populate the data frame with data:

>>> storage.write('data', [(1, 'a'), (2, 'b')])

>>> storage['data']
id comment
1        a
2        b

You can also use tabulator to populate the data frame from an external data file:

>>> import tabulator

>>> with tabulator.Stream('data/comments.csv', headers=1) as stream:
...     storage.write('data', stream)

>>> storage['data']
id comment
1        a
2        b
1     good

As you can see, subsequent writes append new rows to the existing data.
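That append behavior is essentially what pandas.concat does; a minimal sketch with plain pandas (an analogy for the observed behavior, not the library's internals):

```python
import pandas as pd

# Existing bucket contents
existing = pd.DataFrame({'id': [1, 2], 'comment': ['a', 'b']})
# New rows arriving from a subsequent write
incoming = pd.DataFrame({'id': [1], 'comment': ['good']})

# Appending keeps all rows, including rows with a duplicate id
combined = pd.concat([existing, incoming], ignore_index=True)
```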

API Reference

Snapshot

https://github.com/frictionlessdata/jsontableschema-py#snapshot

Detailed

Contributing

Please read the contribution guideline:

How to Contribute

Thanks!

