Fast iteration of AVRO files

fastavro
========

The current Python `avro` package is packed with features but dog slow.

On a test case of about 10K records, it takes about 14 sec to iterate over all
of them. In comparison, the Java `avro` SDK does it in about 1.9 sec.

`fastavro` is less feature complete than `avro`, but it's much faster. It
iterates over the same 10K records in 2.9 sec, and if you use it with PyPy it'll
do it in 1.5 sec (to be fair, the Java benchmark is doing some extra JSON
encoding/decoding).

If the optional C extension (generated by [Cython][cython]) is available, then
`fastavro` will be even faster. For the same 10K records it'll run in about
1.7sec.

`fastavro` supports the following Python versions:

* Python 2.7
* Python 3.4
* Python 3.5b2
* pypy
* pypy3

[Cython]: http://cython.org/

Usage
=====

Reading
-------


```python
import fastavro as avro

with open('weather.avro', 'rb') as fo:
    reader = avro.reader(fo)
    schema = reader.schema  # the schema stored in the file

    for record in reader:
        # process_record is a placeholder for your own handling code
        process_record(record)
```

Writing
-------

```python
from fastavro import writer

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

with open('weather.avro', 'wb') as out:
    writer(out, schema, records)
```
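
Before handing records to `writer`, it can help to check that each one has the fields the schema names. The sketch below is one way to do that, assuming Python 3 and a flat record schema like the one above. `check_records` and `PRIMITIVES` are hypothetical names for illustration; they are not part of `fastavro`, and the check covers only a few primitive types.

```python
# Hypothetical pre-flight check; not part of fastavro.
# Only a handful of Avro primitive types are mapped to Python types here.
PRIMITIVES = {
    'string': str,
    'int': int,
    'long': int,
    'boolean': bool,
}


def check_records(schema, records):
    """Return a list of (index, problem) tuples; empty means nothing was found."""
    problems = []
    for index, record in enumerate(records):
        for field in schema['fields']:
            name, avro_type = field['name'], field['type']
            if name not in record:
                problems.append((index, 'missing field %r' % name))
                continue
            # Complex types (unions, nested records) are skipped, not validated.
            expected = PRIMITIVES.get(avro_type) if isinstance(avro_type, str) else None
            if expected is not None and not isinstance(record[name], expected):
                problems.append((index, 'field %r is not %s' % (name, avro_type)))
    return problems


example_schema = {
    'name': 'Weather',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

good = [{'station': '011990-99999', 'time': 1433269388, 'temp': 0}]
bad = [{'station': '011990-99999', 'time': 1433269388}]

good_problems = check_records(example_schema, good)
bad_problems = check_records(example_schema, bad)
```

Here `good_problems` is empty, while `bad_problems` reports the missing `temp` field for record 0.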

You can also use the `fastavro` script from the command line to dump `avro`
files:

    fastavro weather.avro

By default `fastavro` prints one JSON object per line; use the `--pretty` flag
to pretty-print the output instead.
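
Because the default output is one JSON object per line, another Python script can consume it with the standard `json` module. This is a minimal sketch: `iter_json_lines` is a hypothetical helper, and `sample_output` stands in for what the command would print, using values from the records in the Writing example.

```python
import json


def iter_json_lines(lines):
    """Yield one decoded object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for the text printed by `fastavro weather.avro` (illustrative values).
sample_output = [
    '{"station": "011990-99999", "time": 1433269388, "temp": 0}\n',
    '{"station": "011990-99999", "time": 1433270389, "temp": 22}\n',
]

temps = [record['temp'] for record in iter_json_lines(sample_output)]
```

In a pipeline you would pass `sys.stdin` instead of `sample_output`. This applies to the default output only; pretty-printed output would presumably no longer be one object per line.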

You can also dump the avro schema:

    fastavro --schema weather.avro


Here's the full command line help:

    usage: fastavro [-h] [--schema] [--codecs] [--version] [-p] [file [file ...]]

    iter over avro file, emit records as JSON

    positional arguments:
      file          file(s) to parse

    optional arguments:
      -h, --help    show this help message and exit
      --schema      dump schema instead of records
      --codecs      print supported codecs
      --version     show program's version number and exit
      -p, --pretty  pretty print json


Limitations
===========

* No reader schema
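
Without a reader schema, records come back with the fields of the schema the file was written with. If you only need some fields, or want a default for a field the file lacks, one rough workaround is to reshape each record after reading. The sketch below is not Avro schema resolution; `project` is a hypothetical helper, and `humidity` is a made-up field used to show a default.

```python
def project(record, fields):
    """Return a new dict with only the names in `fields`.

    `fields` maps each wanted field name to the default used when the
    record does not contain it.
    """
    return {name: record.get(name, default) for name, default in fields.items()}


# Wanted fields and their defaults; `humidity` is not in the weather schema.
wanted = {'station': None, 'temp': None, 'humidity': -1}

record = {'station': '011990-99999', 'time': 1433269388, 'temp': 0}
projected = project(record, wanted)
```

The result drops `time`, keeps `station` and `temp`, and fills `humidity` with its default.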

Hacking
=======

As recommended by Cython, the generated C files are distributed with the
package. This has the advantage that the end user does not need to have Cython
installed. However, it means that every time you change
`fastavro/pyfastavro.py` you need to run `make`.

For `make` to succeed you need both Python 2 and Python 3 installed, with Cython
available for each. For `./test-install.sh` you'll need [virtualenv][venv].

[venv]: http://pypi.python.org/pypi/virtualenv

Builds
======

We're currently using [Travis CI](http://travis-ci.org/#!/tebeka/fastavro).

[![Build Status](https://travis-ci.org/tebeka/fastavro.svg?branch=master)](https://travis-ci.org/tebeka/fastavro)


Changes
=======

See the [ChangeLog].

[ChangeLog]: https://github.com/tebeka/fastavro/blob/master/ChangeLog

Contact
=======

[Project Home](https://github.com/tebeka/fastavro)

This version

0.8.5
