Fast read/write of AVRO files

fastavro
========

The current Python `avro` package is packed with features but dog slow.

On a test case of about 10K records, it takes about 14sec to iterate over all of
them. In comparison, the Java `avro` SDK does it in about 1.9sec.

`fastavro` is less feature complete than `avro`, but it's much faster. It
iterates over the same 10K records in 2.9sec, and if you use it with PyPy it'll
do it in 1.5sec (to be fair, the Java benchmark is doing some extra JSON
encoding/decoding).

If the optional C extension (generated by [Cython][cython]) is available, then
`fastavro` will be even faster. For the same 10K records it'll run in about
1.7sec.
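
Timings like these depend heavily on the machine and the data, so it's worth
measuring on your own files. Below is a minimal sketch of a timing helper.
`time_iteration` is a hypothetical name, not part of `fastavro`, and it accepts
any iterable of records; the generator at the bottom is stand-in data, not the
benchmark file used above.

```python
from timeit import default_timer


def time_iteration(records):
    """Iterate over all records, return (count, elapsed seconds)."""
    start = default_timer()
    count = 0
    for _ in records:
        count += 1
    elapsed = default_timer() - start
    return count, elapsed


# Stand-in data; with a real file you would pass a fastavro reader instead.
count, elapsed = time_iteration({'n': i} for i in range(10000))
print('%d records in %.3fsec' % (count, elapsed))
```

To time a real file, open it in binary mode and pass the reader object in place
of the generator.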

`fastavro` supports the following Python versions:

* Python 2.7
* Python 3.4
* Python 3.5b2
* pypy
* pypy3

[Cython]: http://cython.org/

Usage
=====

Reading
-------


```python
import fastavro as avro

with open('weather.avro', 'rb') as fo:
    reader = avro.reader(fo)
    # Schema the file was written with
    schema = reader.schema

    # process_record is a placeholder for your own function
    for record in reader:
        process_record(record)
```

Writing
-------

```python
from fastavro import writer

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

# Each record is a dict keyed by the field names in the schema
records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

# Avro is a binary format, so open the output in 'wb' mode
with open('weather.avro', 'wb') as out:
    writer(out, schema, records)
```
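
Records have to match the schema: every declared field present, with a value of
the declared type. Below is a rough sketch of a pre-write check for the
primitive types used above. `record_problems` and `CHECKS` are hypothetical
names, not part of `fastavro`; the sketch assumes Python 3 and ignores complex
types such as unions, arrays and nested records. The integer bounds come from
the Avro specification (`int` is 32-bit signed, `long` is 64-bit signed).

```python
# Value ranges the Avro specification defines for its integer types.
INT_RANGE = (-2**31, 2**31 - 1)
LONG_RANGE = (-2**63, 2**63 - 1)


def _is_integer(value, bounds):
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    low, high = bounds
    return low <= value <= high


CHECKS = {
    'string': lambda v: isinstance(v, str),
    'int': lambda v: _is_integer(v, INT_RANGE),
    'long': lambda v: _is_integer(v, LONG_RANGE),
}


def record_problems(schema, record):
    """Return a list of problems found in record; empty if none."""
    problems = []
    for field in schema['fields']:
        name, ftype = field['name'], field['type']
        if name not in record:
            problems.append('missing field: %s' % name)
        # Only primitive type names are checked; anything else is skipped.
        elif isinstance(ftype, str) and ftype in CHECKS:
            if not CHECKS[ftype](record[name]):
                problems.append(
                    'bad value for %s (%s): %r' % (name, ftype, record[name]))
    return problems


schema = {
    'name': 'Weather',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

good = {'station': '011990-99999', 'time': 1433269388, 'temp': 0}
bad = {'station': '011990-99999', 'temp': 2**31}  # no time, temp too large

print(record_problems(schema, good))
print(record_problems(schema, bad))
```

A check like this only catches the obvious mistakes early; the writer remains
the final authority on what it accepts.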

You can also use the `fastavro` script from the command line to dump `avro`
files.

    fastavro weather.avro

By default `fastavro` prints one JSON object per line. Use the `--pretty` flag
to pretty print the JSON instead.
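
The difference between the two output styles can be sketched with the standard
`json` module. This only illustrates the shape of the output; how the script
serializes records internally, and the indent width used here, are assumptions.

```python
import json

record = {'station': '011990-99999', 'temp': 0, 'time': 1433269388}

# Compact: the whole record stays on a single line.
compact = json.dumps(record)

# Pretty: the same record spread over several indented lines.
# The indent width of 4 is an assumption made for this sketch.
pretty = json.dumps(record, indent=4)

print(compact)
print(pretty)
```

One object per line is the easier form to feed into line-oriented tools, while
the pretty form is easier to read by eye.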

You can also dump the avro schema:

    fastavro --schema weather.avro


Here's the full command line help:

    usage: fastavro [-h] [--schema] [--codecs] [--version] [-p] [file [file ...]]

    iter over avro file, emit records as JSON

    positional arguments:
      file          file(s) to parse

    optional arguments:
      -h, --help    show this help message and exit
      --schema      dump schema instead of records
      --codecs      print supported codecs
      --version     show program's version number and exit
      -p, --pretty  pretty print json


Limitations
===========

* No reader schema
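
Because there is no reader schema, records come back shaped by the schema they
were written with. If you need a different shape, one option is to reshape each
record after reading it. Below is a minimal sketch; `project` is a hypothetical
helper, not part of `fastavro`, and it covers only field selection and defaults,
not the full Avro schema resolution rules. The `source` field and its default
are made up for illustration.

```python
def project(record, fields, defaults=None):
    """Return a new dict with only the wanted fields, filling in defaults."""
    defaults = defaults or {}
    out = {}
    for name in fields:
        if name in record:
            out[name] = record[name]
        elif name in defaults:
            out[name] = defaults[name]
        else:
            raise KeyError('no value or default for field: %s' % name)
    return out


# A record as it might come back from the reader.
written = {'station': '011990-99999', 'temp': 22, 'time': 1433270389}

# Drop 'time', keep the rest, and add a field the file never had.
wanted = project(written, ['station', 'temp', 'source'],
                 defaults={'source': 'unknown'})
print(wanted)
```

Raising on a field with no value and no default keeps silent data loss out of
the picture, at the cost of having to list defaults up front.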

Hacking
=======

As recommended by Cython, the generated C files are distributed with the
package. This has the advantage that the end user does not need to have Cython
installed. However, it means that every time you change
`fastavro/pyfastavro.py` you need to run `make`.

For `make` to succeed you need both Python 2 and Python 3 installed, with Cython
available for each. For `./test-install.sh` you'll need [virtualenv][venv].

[venv]: http://pypi.python.org/pypi/virtualenv

Builds
======

We're currently using [Travis CI](http://travis-ci.org/#!/tebeka/fastavro).

[![Build Status](https://travis-ci.org/tebeka/fastavro.svg?branch=master)](https://travis-ci.org/tebeka/fastavro)


Changes
=======

See the [ChangeLog]

[ChangeLog]: https://github.com/tebeka/fastavro/blob/master/ChangeLog

Contact
=======

[Project Home](https://github.com/tebeka/fastavro)

