A utility library for working with JSON Table Schema in Python

Version 0.7 introduces a renewed API in a backward-compatible manner. Documentation for the deprecated API can be found here. The deprecated API will be removed in the v1 release.

Features

  • Table to work with data tables described by JSON Table Schema

  • Schema representing JSON Table Schema

  • Field representing JSON Table Schema field

  • Storage to connect your tables to different storage backends like SQL Database

  • validate to validate JSON Table Schema (also in CLI)

  • infer to infer JSON Table Schema from data (also in CLI)

Getting Started

Installation

pip install jsontableschema

Example

from jsontableschema import Table

# Create table
table = Table('path.csv', schema='schema.json')

# Print schema descriptor
print(table.schema.descriptor)

# Print cast rows in a dict form
for keyed_row in table.iter(keyed=True):
    print(keyed_row)

Documentation

Let’s look at each of the components in more detail.

Table

Table represents data described by JSON Table Schema:

# pip install sqlalchemy jsontableschema-sql
import sqlalchemy as sa
from pprint import pprint
from jsontableschema import Table

# Data source
SOURCE = 'https://raw.githubusercontent.com/okfn/jsontableschema-py/master/data/data_infer.csv'

# Create SQL database
db = sa.create_engine('sqlite://')

# Data processor
def skip_under_30(erows):
    for number, headers, row in erows:
        krow = dict(zip(headers, row))
        if krow['age'] >= 30:
            yield (number, headers, row)

# Work with table
table = Table(SOURCE, post_cast=[skip_under_30])
table.schema.save('tmp/persons.json') # Save INFERRED schema
table.save('persons', backend='sql', engine=db) # Save data to SQL
table.save('tmp/persons.csv')  # Save data to DRIVE

# Check the result
pprint(Table('persons', backend='sql', engine=db).read(keyed=True))
pprint(Table('tmp/persons.csv').read(keyed=True))
# Will print (twice)
# [{'age': 39, 'id': 1, 'name': 'Paul'},
#  {'age': 36, 'id': 3, 'name': 'Jane'}]
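
The post_cast processor above can be tried on its own, without a data source or a database. The following is a minimal sketch: sample_erows is a hand-written stand-in for already cast extended rows, and the row numbers in it are illustrative assumptions rather than values produced by the library.

```python
# A processor receives an iterable of extended rows: (number, headers, row).
def skip_under_30(erows):
    for number, headers, row in erows:
        krow = dict(zip(headers, row))
        if krow['age'] >= 30:
            yield (number, headers, row)

headers = ['id', 'age', 'name']

# Hypothetical sample of rows that have already been cast to Python types.
sample_erows = [
    (1, headers, [1, 39, 'Paul']),
    (2, headers, [2, 23, 'Jimmy']),
    (3, headers, [3, 36, 'Jane']),
    (4, headers, [4, 28, 'Judy']),
]

# Convert the surviving extended rows to keyed rows for inspection.
kept = [dict(zip(h, r)) for _, h, r in skip_under_30(sample_erows)]
print(kept)
```

Because a processor is a plain generator over extended rows, it can be unit tested in isolation before being passed to Table.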

Schema

A model of a schema with helpful methods for working with the schema and supported data. Schema instances can be initialized with a schema source given as a file path or URL to a JSON file, or as a Python dict. The schema is validated on initialization (see validate below), and an exception is raised if it is not a valid JSON Table Schema.

from jsontableschema import Schema

# Init schema
schema = Schema('path.json')

# Cast a row
schema.cast_row(['12345', 'a string', 'another field'])

Methods available to Schema instances:

  • descriptor - return schema descriptor

  • fields - an array of the schema’s Field instances

  • headers - an array of the schema headers

  • primary_key - the primary key field for the schema as an array

  • foreignKey - the foreign key property for the schema as an array

  • get_field(name) - return the field object for given name

  • has_field(name) - return a bool if the field exists in the schema

  • cast_row(row, no_fail_fast=False) - return row cast against schema

  • save(target) - save schema to filesystem

When the no_fail_fast option is set, cast_row collects all errors it encounters and raises a single exceptions.MultipleInvalid at the end (if there are any errors).
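
The difference between failing fast and collecting errors can be illustrated with plain Python. This is only a sketch of the pattern, not the library's code: MultipleInvalidSketch, cast_row_sketch and the casters list are hypothetical names introduced for the illustration.

```python
class MultipleInvalidSketch(Exception):
    """Hypothetical stand-in for an exception that carries several errors."""

    def __init__(self, errors):
        super().__init__('{} error(s) found'.format(len(errors)))
        self.errors = errors


def cast_row_sketch(row, casters, no_fail_fast=False):
    """Cast each value with its caster; optionally collect all failures."""
    result, errors = [], []
    for value, caster in zip(row, casters):
        try:
            result.append(caster(value))
        except ValueError as error:
            if not no_fail_fast:
                raise  # fail fast: stop at the first bad value
            errors.append(error)  # otherwise remember it and keep going
    if errors:
        raise MultipleInvalidSketch(errors)
    return result


casters = [int, str, int]

try:
    cast_row_sketch(['x', 'a string', 'y'], casters, no_fail_fast=True)
except MultipleInvalidSketch as exception:
    collected = exception.errors
    for error in collected:
        print(error)
```

With no_fail_fast left at False, the same call stops at the first value that cannot be cast.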

Field

from jsontableschema import Field

# Init field
field = Field({'type': 'number'})

# Cast a value
field.cast_value('12345') # -> 12345

Data values can be cast to native Python objects with a Field instance. Type instances can be initialized with field descriptors. This allows formats and constraints to be defined.

Casting a value will check the value is of the expected type, is in the correct format, and complies with any constraints imposed by a schema. E.g. a date value (in ISO 8601 format) can be cast with a DateType instance. Values that can’t be cast will raise an InvalidCastError exception.

Casting a value that doesn’t meet the constraints will raise a ConstraintError exception.
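
The two failure modes described above (a value that cannot be cast, and a value that casts but violates a constraint) can be sketched without the library. Everything here is a hypothetical illustration: InvalidCastSketch, ConstraintSketch and cast_integer_sketch are stand-ins, and the minimum argument is just one example of a constraint.

```python
class InvalidCastSketch(Exception):
    """Hypothetical stand-in: the value is not of the expected type."""


class ConstraintSketch(Exception):
    """Hypothetical stand-in: the value has the right type but breaks a rule."""


def cast_integer_sketch(value, minimum=None):
    # Step 1: check that the value can be cast to the expected type.
    try:
        number = int(value)
    except (TypeError, ValueError):
        raise InvalidCastSketch('{!r} is not an integer'.format(value))
    # Step 2: check the cast value against the constraint, if one is given.
    if minimum is not None and number < minimum:
        raise ConstraintSketch('{} is below minimum {}'.format(number, minimum))
    return number


print(cast_integer_sketch('12345'))
```

Keeping the two exception types separate lets a caller distinguish malformed data from well-formed data that is merely out of range.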

Storage

At the level between the high-level interface and the low-level drivers, the package uses the Tabular Storage concept:

Tabular Storage

To write your own storage driver, implement the jsontableschema.Storage interface.

validate

Given a schema as a JSON file, a URL to a JSON file, or a Python dict, validate returns True for a valid JSON Table Schema or raises a SchemaValidationError exception. It validates only the schema itself, not data against the schema.

import io
import json

import jsontableschema

with io.open('schema_to_validate.json') as stream:
    descriptor = json.load(stream)

try:
    jsontableschema.validate(descriptor)
except jsontableschema.exceptions.SchemaValidationError as exception:
    # Handle the error, for example by reporting it
    print(exception)

It may be useful to report multiple errors when validating a schema. This can be done with no_fail_fast flag set to True.

try:
    jsontableschema.validate(descriptor, no_fail_fast=True)
except jsontableschema.exceptions.MultipleInvalid as exception:
    for error in exception.errors:
        # Handle each error, for example by reporting it
        print(error)

infer

Given headers and data, infer will return a JSON Table Schema as a Python dict based on the data values. Given the data file, data_to_infer.csv:

id,age,name
1,39,Paul
2,23,Jimmy
3,36,Jane
4,28,Judy

Call infer with headers and values from the datafile:

import io
import csv

from jsontableschema import infer

filepath = 'data_to_infer.csv'
with io.open(filepath) as stream:
    headers = stream.readline().rstrip('\n').split(',')
    # Read all rows now: csv.reader is lazy and the file closes after this block
    values = list(csv.reader(stream))

schema = infer(headers, values)

schema is now a schema dict:

{u'fields': [
    {
        u'description': u'',
        u'format': u'default',
        u'name': u'id',
        u'title': u'',
        u'type': u'integer'
    },
    {
        u'description': u'',
        u'format': u'default',
        u'name': u'age',
        u'title': u'',
        u'type': u'integer'
    },
    {
        u'description': u'',
        u'format': u'default',
        u'name': u'name',
        u'title': u'',
        u'type': u'string'
    }]
}

The number of rows used by infer can be limited with the row_limit argument.
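
To make the idea of inferring types from values and limiting the sampled rows more concrete, here is a deliberately simplified sketch. It is not the library's algorithm: infer_sketch and guess_type are hypothetical helpers that only distinguish integer from string, and CSV_TEXT is an in-memory copy of the sample data.

```python
import csv
import io
from itertools import islice

CSV_TEXT = 'id,age,name\n1,39,Paul\n2,23,Jimmy\n3,36,Jane\n4,28,Judy\n'


def guess_type(values):
    """Return 'integer' if every value parses as an int, else 'string'."""
    try:
        for value in values:
            int(value)
    except ValueError:
        return 'string'
    return 'integer'


def infer_sketch(headers, rows, row_limit=None):
    """Build a minimal descriptor from at most row_limit rows."""
    sample = list(islice(rows, row_limit))  # None means "use every row"
    columns = zip(*sample)  # turn the sampled rows into columns
    return {
        'fields': [
            {'name': name, 'type': guess_type(column)}
            for name, column in zip(headers, columns)
        ]
    }


reader = csv.reader(io.StringIO(CSV_TEXT))
headers = next(reader)
schema_sketch = infer_sketch(headers, reader, row_limit=2)
print(schema_sketch)
```

Sampling fewer rows is faster on large files, at the cost of possibly missing a later value that would change the guessed type.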

exceptions

The library provides a number of exceptions. Please consult the docstrings for details.

plugins

JSON Table Schema has a plugin system. Any package named jsontableschema_<name> can be imported as:

from jsontableschema.plugins import <name>

If a plugin is not installed ImportError will be raised with a message describing how to install the plugin.
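
A guarded import is one way to handle an optional plugin. The sketch below assumes the SQL plugin (installed as jsontableschema-sql in the Table example) is exposed under the name sql; that name is an assumption, and the fallback to None is just one possible policy.

```python
try:
    # Assumed to resolve to the separately installed SQL plugin package.
    from jsontableschema.plugins import sql
except ImportError as error:
    # The plugin (or the library itself) is missing: fall back gracefully.
    sql = None
    print('SQL plugin unavailable: {}'.format(error))

if sql is not None:
    print('SQL plugin loaded from {}'.format(sql.__name__))
```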

A list of officially supported plugins:

CLI

The CLI is not covered by SemVer versioning. If you use it programmatically, please pin a concrete jsontableschema version in your requirements file.

JSON Table Schema features a CLI called jsontableschema. This CLI exposes the infer and validate functions for command line use.

Example of validate usage:

$ jsontableschema validate path/to-schema.json

Example of infer usage:

$ jsontableschema infer path/to/data.csv

The response is a schema as JSON. The optional argument --encoding allows a character encoding to be specified for the data file. The default is utf-8.

Read more

Thanks!
