
Fast read/write of AVRO files


fastavro
========

The current Python `avro` package is packed with features but dog slow.

On a test case of about 10K records, it takes about 14 seconds to iterate over
all of them. In comparison, the Java `avro` SDK does it in about 1.9 seconds.

`fastavro` is less feature-complete than `avro`, but it's much faster. It
iterates over the same 10K records in 2.9 seconds, and if you use it with PyPy
it'll do it in 1.5 seconds (to be fair, the Java benchmark is doing some extra
JSON encoding/decoding).

If the optional C extension (generated by [Cython][cython]) is available, then
`fastavro` will be even faster. For the same 10K records it'll run in about
1.7 seconds.
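
If you want to take a similar measurement on your own data, a rough sketch could
look like the one below. The helper names `time_iteration` and `time_avro_file`
are made up for this example and are not part of `fastavro`; the only
`fastavro` call used is `reader`, as shown in the Usage section. This is not
the benchmark behind the numbers above.

```python
import time


def time_iteration(records):
    """Iterate over `records` and return (record_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for _ in records:
        count += 1
    return count, time.perf_counter() - start


def time_avro_file(path):
    """Time one full pass over an Avro file (hypothetical helper)."""
    import fastavro  # imported lazily so time_iteration works without it

    with open(path, 'rb') as fo:
        return time_iteration(fastavro.reader(fo))
```

`time_iteration` accepts any iterable, so you can try it on a plain list before
pointing `time_avro_file` at a real file.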

`fastavro` supports the following Python versions:

* Python 2.6
* Python 2.7
* Python 3.4
* Python 3.5b2
* pypy
* pypy3

[Cython]: http://cython.org/

Usage
=====

Reading
-------


```python
import fastavro as avro

with open('weather.avro', 'rb') as fo:
    reader = avro.reader(fo)
    schema = reader.schema  # schema stored in the file

    for record in reader:
        # process_record is a placeholder for your own handling code
        process_record(record)
```
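
`process_record` is not provided by `fastavro`. As a minimal sketch, assuming
records shaped like the `Weather` schema in the Writing section (dicts with
`station` and `temp` keys), one hypothetical version tracks the highest
temperature per station:

```python
# Hypothetical accumulator: highest temperature seen so far per station.
max_temp = {}


def process_record(record):
    """Update max_temp with one decoded record (a plain dict)."""
    station = record['station']
    temp = record['temp']
    if station not in max_temp or temp > max_temp[station]:
        max_temp[station] = temp


# Stand-in records, shaped like the ones the reader would yield.
for rec in [
    {'station': '011990-99999', 'temp': 0, 'time': 1433269388},
    {'station': '011990-99999', 'temp': 22, 'time': 1433270389},
    {'station': '012650-99999', 'temp': 111, 'time': 1433275478},
]:
    process_record(rec)
```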

Writing
-------

```python
from fastavro import writer

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

# 'records' can be any iterable (including a generator)
records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

with open('weather.avro', 'wb') as out:
    writer(out, schema, records)
```
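
Since `records` can be a generator, you don't have to build the whole list in
memory first. The sketch below assumes your raw data arrives as
`(station, time, temp)` tuples; `weather_records` and `rows` are names invented
for this example.

```python
def weather_records(rows):
    """Yield one record dict per (station, timestamp, temp) tuple."""
    for station, timestamp, temp in rows:
        yield {'station': station, 'time': timestamp, 'temp': temp}


# Illustrative input; in practice this could be any lazy source of tuples.
rows = [
    ('011990-99999', 1433269388, 0),
    ('011990-99999', 1433270389, 22),
]

# With fastavro installed, pass the generator straight to writer():
#     with open('weather.avro', 'wb') as out:
#         writer(out, schema, weather_records(rows))
```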

You can also use the `fastavro` script from the command line to dump `avro`
files:

    fastavro weather.avro

By default `fastavro` prints one JSON object per line; use the `--pretty` flag
to pretty-print the output instead.
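
The default one-object-per-line output is easy to consume from another
program. Here is a minimal sketch using only the standard library;
`parse_json_lines` is a name invented for this example and the sample lines are
made up to match the `Weather` records.

```python
import json


def parse_json_lines(lines):
    """Yield one decoded object per non-empty line of JSON text."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Sample text in the one-JSON-object-per-line shape described above.
sample = [
    '{"station": "011990-99999", "temp": 0, "time": 1433269388}\n',
    '\n',
    '{"station": "012650-99999", "temp": 111, "time": 1433275478}\n',
]
records = list(parse_json_lines(sample))

# To consume real output, iterate over sys.stdin instead of `sample`:
#     fastavro weather.avro | python your_script.py
```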

You can also dump the avro schema:

    fastavro --schema weather.avro
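
Once you have a schema, whether from `reader.schema` or from the dump above, it
can be inspected like any other data. The sketch below assumes the dump is JSON
with the same shape as the `Weather` schema in the Writing section;
`field_types` is a name invented for this example.

```python
import json


def field_types(schema):
    """Return (name, type) pairs for the fields of a record schema."""
    return [(field['name'], field['type']) for field in schema['fields']]


# Stand-in for a dumped schema, copied from the Writing example.
schema_json = '''
{"type": "record", "name": "Weather", "namespace": "test",
 "fields": [{"name": "station", "type": "string"},
            {"name": "time", "type": "long"},
            {"name": "temp", "type": "int"}]}
'''

fields = field_types(json.loads(schema_json))
```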


Here's the full command line help:

    usage: fastavro [-h] [--schema] [--codecs] [--version] [-p] [file [file ...]]

    iter over avro file, emit records as JSON

    positional arguments:
      file          file(s) to parse

    optional arguments:
      -h, --help    show this help message and exit
      --schema      dump schema instead of records
      --codecs      print supported codecs
      --version     show program's version number and exit
      -p, --pretty  pretty print json


Limitations
===========

* No reader schema
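
Without a reader schema, records come back exactly as they were written. If you
only need some fields, or want a default for a field that may be absent, you
can post-process the dicts yourself. The sketch below is a rough stand-in and
not Avro schema resolution; `project` is a name invented for this example.

```python
def project(record, fields):
    """Return a new dict holding only the keys listed in `fields`.

    `fields` maps each wanted field name to the default value used when
    the record does not contain that field.
    """
    return {name: record.get(name, default) for name, default in fields.items()}


record = {'station': '011990-99999', 'time': 1433269388, 'temp': 0}
# 'humidity' is a made-up field that the record does not carry.
slim = project(record, {'station': None, 'humidity': -1})
```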

Hacking
=======

As recommended by Cython, the generated C files are distributed with the
package. This has the advantage that the end user does not need to have Cython
installed. However, it means that every time you change
`fastavro/pyfastavro.py` you need to run `make`.

For `make` to succeed you need both Python 2 and Python 3 installed, with
Cython on both of them. For `./test-install.sh` you'll need
[virtualenv][venv].

[venv]: http://pypi.python.org/pypi/virtualenv

Builds
======

We're currently using [Travis CI](http://travis-ci.org/#!/tebeka/fastavro).

[![Build Status](https://travis-ci.org/tebeka/fastavro.svg?branch=master)](https://travis-ci.org/tebeka/fastavro)


Changes
=======

See the [ChangeLog]

[ChangeLog]: https://github.com/tebeka/fastavro/blob/master/ChangeLog

Contact
=======

[Project Home](https://github.com/tebeka/fastavro)


