Fast iteration of AVRO files

fastavro
========

The current Python `avro` package is packed with features but dog slow.

On a test case of about 10K records, it takes about 14 sec to iterate over all of
them. In comparison, the Java `avro` SDK does it in about 1.9 sec.

`fastavro` is less feature complete than `avro`, but it's much faster. It
iterates over the same 10K records in 2.9 sec, and under PyPy it does so in
1.5 sec (to be fair, the Java benchmark is doing some extra JSON
encoding/decoding).

If the optional C extension (generated by [Cython][cython]) is available, then
`fastavro` will be even faster. For the same 10K records it'll run in about
1.7 sec.
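
The timings above come from the author's own test case, which is not included here. If you want to take comparable measurements on your own data, a minimal sketch of a timing helper could look like the following. The `time_iteration` name is hypothetical and not part of `fastavro`; it accepts any iterable, so you could pass it a reader object, and it uses `time.perf_counter`, which needs Python 3.3 or later.

```python
import time


def time_iteration(records):
    """Iterate over `records` once and return (count, elapsed_seconds).

    Hypothetical helper for illustration only; not part of fastavro.
    """
    start = time.perf_counter()
    count = 0
    for _ in records:
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


# Stand-in data: replace range(10000) with the iterable you want to measure.
count, elapsed = time_iteration(range(10000))
print('%d records in %.3f sec' % (count, elapsed))
```

Timing a single pass is noisy, so it is usually worth repeating the measurement a few times and comparing the results.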

`fastavro` supports the following Python versions:

* Python 2.7
* Python 3.4
* Python 3.5b2
* PyPy
* PyPy3

[Cython]: http://cython.org/

Usage
=====

Reading
-------


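```python
import fastavro as avro

with open('weather.avro', 'rb') as fo:
    reader = avro.reader(fo)
    schema = reader.schema

    # Iterate while the file is still open; each record is a dict.
    for record in reader:
        process_record(record)  # process_record is your own function
```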

Writing
-------

```python
from fastavro import writer

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

with open('weather.avro', 'wb') as out:
    writer(out, schema, records)
```

You can also use the `fastavro` script from the command line to dump `avro`
files.

    fastavro weather.avro

By default `fastavro` prints one JSON object per line. Use the `--pretty` flag
to pretty-print the output instead.
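
Because the default output is one JSON object per line, it is easy to consume from another program. The following is a minimal sketch of parsing such output with the standard `json` module. The `parse_json_lines` helper is hypothetical, and the sample text is hand-written from the records in the Writing example rather than captured from a real run.

```python
import json


def parse_json_lines(text):
    """Parse text holding one JSON object per line into a list of dicts.

    Hypothetical helper for illustration only; blank lines are skipped.
    """
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hand-written sample in the one-object-per-line layout (not real output).
sample_output = (
    '{"station": "011990-99999", "temp": 0, "time": 1433269388}\n'
    '{"station": "011990-99999", "temp": 22, "time": 1433270389}\n'
)

records = parse_json_lines(sample_output)
for record in records:
    print(record['station'], record['temp'])
```

In practice the text would come from the command's standard output, for example through a shell pipe or the `subprocess` module.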

You can also dump the Avro schema:

    fastavro --schema weather.avro
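
A dumped schema can be inspected with the standard `json` module. The sketch below assumes the schema is printed as a single JSON document. The `field_names` helper is hypothetical, and the sample text is hand-written from the schema in the Writing example rather than captured from a real run.

```python
import json


def field_names(schema_text):
    """Return the field names of a record schema given as JSON text.

    Hypothetical helper for illustration only; assumes a record schema
    with a top-level 'fields' list.
    """
    schema = json.loads(schema_text)
    return [field['name'] for field in schema['fields']]


# Hand-written sample based on the Weather schema (not real output).
sample_schema = '''
{
  "type": "record",
  "name": "Weather",
  "namespace": "test",
  "doc": "A weather reading.",
  "fields": [
    {"name": "station", "type": "string"},
    {"name": "time", "type": "long"},
    {"name": "temp", "type": "int"}
  ]
}
'''

print(field_names(sample_schema))
```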


Here's the full command line help:

    usage: fastavro [-h] [--schema] [--codecs] [--version] [-p] [file [file ...]]

    iter over avro file, emit records as JSON

    positional arguments:
      file          file(s) to parse

    optional arguments:
      -h, --help    show this help message and exit
      --schema      dump schema instead of records
      --codecs      print supported codecs
      --version     show program's version number and exit
      -p, --pretty  pretty print json


Limitations
===========

* No reader schema

Hacking
=======

As recommended by Cython, the generated C files are distributed with the
package. This has the advantage that the end user does not need to have Cython
installed. However, it means that every time you change
`fastavro/pyfastavro.py` you need to run `make`.

For `make` to succeed you need both Python 2 and Python 3 installed, with Cython
available for each of them. For `./test-install.sh` you'll need
[virtualenv][venv].

[venv]: http://pypi.python.org/pypi/virtualenv

Builds
======

We're currently using [Travis CI](http://travis-ci.org/#!/tebeka/fastavro).

[![Build Status](https://travis-ci.org/tebeka/fastavro.svg?branch=master)](https://travis-ci.org/tebeka/fastavro)


Changes
=======

See the [ChangeLog].

[ChangeLog]: https://github.com/tebeka/fastavro/blob/master/ChangeLog

Contact
=======

[Project Home](https://github.com/tebeka/fastavro)


