Generate and load Pandas data frames based on JSON Table Schema descriptors.
Version v0.2 contains breaking changes (see the sketch after this list):
- removed the Storage(prefix=) argument (it was a stub)
- renamed Storage(tables=) to Storage(dataframes=)
- renamed Storage.tables to Storage.buckets
- changed Storage.read to read into memory
- added Storage.iter to yield row by row
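For orientation, here is a minimal sketch of the renamed pieces after the upgrade; the dict form of the dataframes= argument and the resulting buckets listing are assumptions based on the changelog above, not verified output:

>>> import pandas as pd
>>> from jsontableschema_pandas import Storage
>>> # A data frame that already exists in memory
>>> df = pd.DataFrame({'id': [1, 2], 'comment': ['a', 'b']})
>>> # v0.2+: pass existing frames via dataframes= (previously tables=)
>>> storage = Storage(dataframes={'data': df})
>>> # v0.2+: list buckets via storage.buckets (previously storage.tables)
>>> storage.buckets
['data']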
Getting Started
Installation
$ pip install datapackage
$ pip install jsontableschema-pandas
Example
You can load resources from a data package as Pandas data frames using the datapackage.push_datapackage function:
>>> import datapackage
>>> data_url = 'http://data.okfn.org/data/core/country-list/datapackage.json'
>>> storage = datapackage.push_datapackage(data_url, 'pandas')
>>> storage.buckets
['data___data']
>>> type(storage['data___data'])
<class 'pandas.core.frame.DataFrame'>
>>> storage['data___data'].head()
Name Code
0 Afghanistan AF
1 Åland Islands AX
2 Albania AL
3 Algeria DZ
4 American Samoa AS
It is also possible to pull an existing data frame into a data package:
>>> datapackage.pull_datapackage('/tmp/datapackage.json', 'country_list', 'pandas', tables={
... 'data': storage['data___data'],
... })
Storage
The package implements the Tabular Storage interface. We can get a storage instance this way:
>>> from jsontableschema_pandas import Storage
>>> storage = Storage()
Storage works as a container for Pandas data frames. You can define a new data frame inside the storage using the storage.create method:
>>> storage.create('data', {
... 'primaryKey': 'id',
... 'fields': [
... {'name': 'id', 'type': 'integer'},
... {'name': 'comment', 'type': 'string'},
... ]
... })
>>> storage.buckets
['data']
>>> storage['data'].shape
(0, 0)
Use storage.write to populate the data frame with data:
>>> storage.write('data', [(1, 'a'), (2, 'b')])
>>> storage['data']
id comment
1 a
2 b
You can also use tabulator to populate the data frame from an external data file:
>>> import tabulator
>>> with tabulator.Stream('data/comments.csv', headers=1) as stream:
... storage.write('data', stream)
>>> storage['data']
id comment
1 a
2 b
1 good
As you can see, subsequent writes simply append new rows to the existing data.
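To get the accumulated rows back out, you can use the Storage.read and Storage.iter methods mentioned in the changelog above. A minimal sketch, assuming rows come back as plain tuples:

>>> # Read the whole bucket into memory at once
>>> storage.read('data')
[(1, 'a'), (2, 'b'), (1, 'good')]
>>> # Or iterate row by row without loading everything
>>> for row in storage.iter('data'):
...     print(row)
(1, 'a')
(2, 'b')
(1, 'good')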
API Reference
Snapshot
https://github.com/frictionlessdata/jsontableschema-py#snapshot
Detailed
Contributing
Please read the contribution guideline:
Thanks!
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
jsontableschema-pandas-0.5.0.tar.gz (9.6 kB)
Built Distribution
jsontableschema_pandas-0.5.0-py2.py3-none-any.whl (9.1 kB)
File details
Details for the file jsontableschema-pandas-0.5.0.tar.gz.
File metadata
- Download URL: jsontableschema-pandas-0.5.0.tar.gz
- Upload date:
- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | cf5833ebe4ddcab29f3c652304aff6a4316fa9b92d52d9ee047b9cffbb18cebf
MD5 | 0f820382e791c942ace2866298d8e41f
BLAKE2b-256 | fae7f73e2c77418819cb704e0d5ec08d7507df37a29eb5e06f5b403a39d669c5
File details
Details for the file jsontableschema_pandas-0.5.0-py2.py3-none-any.whl.
File metadata
- Download URL: jsontableschema_pandas-0.5.0-py2.py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 32895c32d83d644ca017b5d376911a9ff018b00f550fdd70d967e34da4c29a25
MD5 | 66a29c27f4a7134b11c63b03a4925fca
BLAKE2b-256 | 9204808b627c8d314bee0e45a296e8810328d5f7bc1449a77c63f6e79812ed59