
Consistent interface for stream reading and writing tabular data (csv/xls/json/etc)

Project description


Release v0.10 introduces backward-incompatible changes to the exceptions module.

Features

  • supports various formats: csv/tsv/xls/xlsx/json/ndjson/ods/native/etc

  • reads data from variables, the filesystem, or the Internet

  • streams data instead of loading it all into memory

  • processes data via simple user-defined processors

  • saves data using the same interface

Getting Started

Installation

To get started:

$ pip install tabulator

Example

Open a tabular stream from a CSV source:

from tabulator import Stream

with Stream('path.csv', headers=1) as stream:
    print(stream.headers)  # prints the headers taken from row 1
    for row in stream:
        print(row)  # prints each row as a list of values

Stream

Stream takes a source argument of the form:

<scheme>://path/to/file.<format>

and uses the corresponding Loader and Parser to open the source and iterate over the tabular stream. The scheme and format can also be passed explicitly as constructor arguments, and the encoding argument forces Tabulator to open the table with an encoding of your choice.
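As an illustration of the `<scheme>://path/to/file.<format>` convention, the following Python 3 sketch splits a source into a scheme and a format. `detect_scheme_and_format` is a hypothetical helper, not tabulator's own detection logic; it assumes that a source without `://` is a local file and that the format is the lower-cased file extension.

```python
import os
from urllib.parse import urlparse


def detect_scheme_and_format(source):
    """Split '<scheme>://path/to/file.<format>' into (scheme, format).

    Hypothetical helper for illustration; tabulator's real detection may differ.
    """
    if '://' in source:
        parsed = urlparse(source)
        scheme, path = parsed.scheme, parsed.path  # query strings are ignored
    else:
        # Assumption: a plain path with no scheme refers to the local filesystem.
        scheme, path = 'file', source
    # The format is the file extension without the dot, lower-cased (None if absent).
    extension = os.path.splitext(path)[1]
    return scheme, extension[1:].lower() or None


print(detect_scheme_and_format('http://example.com/source.xls'))
print(detect_scheme_and_format('path.csv'))
```

Passing scheme or format explicitly to the constructor covers the cases this kind of guessing cannot, such as a URL without a file extension.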

In the example above, a context manager calls stream.open() on entry and stream.close() on exit. In general:

  • a stream can be iterated like a file-like object, returning one row at a time

  • rows can be iterated manually with the iter(keyed/extended) method

  • rows can be read into memory with the read(keyed/extended) method, optionally limited to a given row count

  • headers are available via the headers property

  • a sample of rows is available via the sample property

  • the stream pointer can be moved back to the start with the reset method

  • the stream can be saved to the filesystem with the save method
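The behaviour listed above can be mimicked without tabulator itself. The sketch below is a hypothetical, in-memory `MiniStream` class (not part of tabulator, Python 3, CSV text only) that models headers, sample, iter, read and reset. Two details are assumptions: row numbers count the header row as row 1, and extended rows are `(number, headers, row)` tuples as in the expanded example below.

```python
import csv
import io
from itertools import islice


class MiniStream:
    """Toy stand-in for the Stream interface (hypothetical; CSV text only)."""

    def __init__(self, text, headers=None, sample_size=100):
        self._text = text
        self._headers_row = headers  # 1-based number of the header row, or None
        self.headers = None
        self.reset()
        # The sample is the first `sample_size` data rows; rewind afterwards.
        self.sample = [row for _, row in islice(self._rows, sample_size)]
        self.reset()

    def reset(self):
        """Move the stream pointer back to the first data row."""
        self._rows = enumerate(csv.reader(io.StringIO(self._text)), start=1)
        if self._headers_row:
            # Consume rows up to and including the header row.
            for number, row in self._rows:
                if number == self._headers_row:
                    self.headers = row
                    break

    def iter(self, keyed=False, extended=False):
        """Yield lists, dicts (keyed) or (number, headers, row) tuples (extended)."""
        for number, row in self._rows:
            if extended:
                yield (number, self.headers, row)
            elif keyed:
                yield dict(zip(self.headers, row))
            else:
                yield row

    def read(self, keyed=False, extended=False, limit=None):
        """Collect rows into a list, stopping after `limit` rows if given."""
        return list(islice(self.iter(keyed=keyed, extended=extended), limit))

    def __iter__(self):
        return self.iter()


stream = MiniStream("id,name\n1,english\n2,中国人\n", headers=1)
print(stream.headers, stream.sample)
print(stream.read(keyed=True))
```

Note that iteration consumes the stream, so reset() is needed before reading the same rows a second time.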

A more extended example:

from tabulator import Stream

# Processor: receives (number, headers, row) tuples and yields odd-numbered rows only
def skip_even_rows(extended_rows):
    for number, headers, row in extended_rows:
        if number % 2:
            yield (number, headers, row)

stream = Stream('http://example.com/source.xls',
    headers=1, encoding='utf-8', sample_size=1000,
    post_parse=[skip_even_rows], sheet=1)
stream.open()
print(stream.sample)  # prints the row sample
print(stream.headers)  # prints the headers list
print(stream.read(limit=10))  # prints the first 10 rows
stream.reset()
for keyed_row in stream.iter(keyed=True):
    print(keyed_row)  # prints a row dict
stream.reset()  # rewind before iterating again
for extended_row in stream.iter(extended=True):
    print(extended_row)  # prints (number, headers, row)
stream.reset()
stream.save('target.csv')
stream.close()

For the full list of options, see https://github.com/frictionlessdata/tabulator-py/blob/master/tabulator/stream.py#L17
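Because post_parse processors such as skip_even_rows are generators over `(number, headers, row)` tuples, several of them can be chained lazily. The sketch below assumes processors run in list order, each consuming the previous one's output; `upper_case_values` and `apply_post_parse` are hypothetical names, and this is an illustration of the idea rather than tabulator's internal code.

```python
def skip_even_rows(extended_rows):
    # Same processor as in the example above: keep odd-numbered rows only.
    for number, headers, row in extended_rows:
        if number % 2:
            yield (number, headers, row)


def upper_case_values(extended_rows):
    # Hypothetical second processor: upper-case every string cell.
    for number, headers, row in extended_rows:
        yield (number, headers, [v.upper() if isinstance(v, str) else v for v in row])


def apply_post_parse(extended_rows, processors):
    """Wrap each processor around the previous one's generator (assumed order)."""
    for processor in processors:
        extended_rows = processor(extended_rows)
    return extended_rows


rows = [(1, ['name'], ['a']), (2, ['name'], ['b']), (3, ['name'], ['c'])]
for extended_row in apply_post_parse(iter(rows), [skip_even_rows, upper_case_values]):
    print(extended_row)
```

Since every stage is a generator, rows still flow through one at a time, which keeps memory use flat regardless of table size.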

CLI

The CLI is a provisional API excluded from SemVer. If you use it as part of another program, pin a specific tabulator version in your requirements file.

The library ships with a simple CLI to read tabular data:

$ tabulator
Usage: cli.py [OPTIONS] SOURCE

Options:
  --headers INTEGER
  --scheme TEXT
  --format TEXT
  --encoding TEXT
  --limit INTEGER
  --help             Show this message and exit.

Shell usage example:

$ tabulator data/table.csv
id, name
1, english
2, 中国人
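For a local CSV file, the output above can be approximated with the standard library. `render_table` is a hypothetical helper, not the CLI's implementation; it assumes the file contents shown in the comment and that a limit would count every row read, including the header row.

```python
import csv
import io


def render_table(text_file, limit=None):
    """Return one output line per row, with values joined by ', '.

    Hypothetical approximation of the CLI output; not tabulator's code.
    """
    lines = []
    for count, row in enumerate(csv.reader(text_file)):
        if limit is not None and count >= limit:
            break
        lines.append(', '.join(row))
    return lines


# Assumed contents of data/table.csv, held in memory for the demo.
sample = io.StringIO("id,name\n1,english\n2,中国人\n")
for line in render_table(sample):
    print(line)
```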

API Reference

Snapshot

Stream(source,
       headers=None,
       scheme=None,
       format=None,
       encoding=None,
       sample_size=None,
       post_parse=None,
       **options)
    closed/open/close/reset
    headers -> list
    sample -> rows
    iter(keyed/extended=False) -> (generator) (keyed/extended)row[]
    read(keyed/extended=False, limit=None) -> (keyed/extended)row[]
    save(target, format=None, encoding=None, **options)
exceptions
~cli
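The snapshot lists save(target, format=None, encoding=None, **options). For a CSV target the effect is roughly that of the following standard-library sketch; `save_csv` is a hypothetical helper (not tabulator's implementation) that writes the headers first and then streams the rows, accepting either a file path or an open text file.

```python
import csv
import io


def save_csv(headers, rows, target, encoding='utf-8'):
    """Write headers (if any) and rows to a CSV path or text file object.

    Hypothetical sketch; tabulator's save() supports more formats and options.
    """
    def write(f):
        writer = csv.writer(f)
        if headers:
            writer.writerow(headers)
        for row in rows:  # rows may be any iterable, so they are written lazily
            writer.writerow(row)

    if hasattr(target, 'write'):
        write(target)
    else:
        with open(target, 'w', newline='', encoding=encoding) as f:
            write(f)


# Usage: write to an in-memory buffer instead of a file.
buffer = io.StringIO()
save_csv(['id', 'name'], [['1', 'english'], ['2', '中国人']], buffer)
print(buffer.getvalue())
```

The csv module's default line terminator is `\r\n`, so the saved file uses CRLF line endings.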

Detailed

Contributing

Please read the contribution guideline:

How to Contribute

Thanks!


Download files

Source Distribution

tabulator-0.11.1.tar.gz (17.2 kB)

Built Distribution

tabulator-0.11.1-py2.py3-none-any.whl (33.7 kB; Python 2 and Python 3)

File details

Details for the file tabulator-0.11.1.tar.gz.

File metadata

  • Download URL: tabulator-0.11.1.tar.gz
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for tabulator-0.11.1.tar.gz
Algorithm Hash digest
SHA256 5aa5015eb4203f6285f8d602170ddf9bee36a2320afc8cf2f9fb29fe7c9859bb
MD5 cf77daa0849e3e5b5d8be9e83a1d63f2
BLAKE2b-256 44e02ff3e361959f0cf0cc36936e02dc321d6e3f99486cc28d8bf323b3d820e2


File details

Details for the file tabulator-0.11.1-py2.py3-none-any.whl.

File hashes

Hashes for tabulator-0.11.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 fbdb002e8d1ba51566d932dc1b9100d47468f9dd5b2563b2fc8bee5854af4fe2
MD5 b17e2069ea1b0a944039a61cdd387a30
BLAKE2b-256 9c68bf9125566bc7e01a66b4dda5e1ff8d530496e393a945be6adc1fa3ba6990
