
Fast iteration of AVRO files


# fastavro

The current Python `avro` package is packed with features but dog slow.

On a test case of about 10K records, it takes about 14 seconds to iterate over
all of them. In comparison, the Java `avro` SDK does it in about 1.9 seconds.

`fastavro` is less feature-complete than `avro`; however, it's much faster. It
iterates over the same 10K records in 2.9 seconds, and if you use it with PyPy
it'll do it in 1.5 seconds (to be fair, the Java benchmark is doing some extra
JSON encoding/decoding).

If the optional C extension (generated by [Cython][cython]) is available, then
`fastavro` will be even faster. For the same 10K records it'll run in about
1.7 seconds.

[Cython]: http://cython.org/
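The timings above are wall-clock measurements of iterating over every record in a file. A minimal sketch of how one might take a comparable measurement on your own data follows; `time_iteration` is a hypothetical helper (not part of `fastavro`), and it does not reproduce the original benchmark setup or its extra JSON encoding step.

```python
import time
from typing import Iterable, Tuple


def time_iteration(records: Iterable) -> Tuple[int, float]:
    """Consume every item in `records` and return (count, elapsed seconds).

    Hypothetical helper: with fastavro you would pass a reader object,
    e.g. time_iteration(fastavro.reader(fo)), so decoding time is included.
    """
    start = time.perf_counter()
    count = 0
    for _ in records:
        count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in data shaped like the weather records, so the sketch runs
    # without an Avro file on disk.
    fake = ({"station": "011990-99999", "temp": i, "time": i} for i in range(10000))
    count, elapsed = time_iteration(fake)
    print("%d records in %.3f seconds" % (count, elapsed))
```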

# Usage

## Reading

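```python
import fastavro as avro


def process_record(record):
    # Placeholder: replace with your own handling of each record (a dict).
    print(record)


with open('weather.avro', 'rb') as fo:
    reader = avro.reader(fo)
    schema = reader.schema  # the writer schema stored in the file header

    for record in reader:
        process_record(record)
```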
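Records come back as plain Python dicts keyed by field name, so `process_record` can be ordinary dictionary code. A minimal sketch, assuming records shaped like the weather schema in the Writing example; `max_temp_by_station` is a hypothetical helper that consumes the records in a single pass:

```python
from typing import Dict, Iterable


def max_temp_by_station(records: Iterable[dict]) -> Dict[str, int]:
    """Return the highest `temp` seen for each `station`.

    Hypothetical helper; assumes each record has 'station' and 'temp' keys.
    """
    result: Dict[str, int] = {}
    for record in records:
        station, temp = record["station"], record["temp"]
        if station not in result or temp > result[station]:
            result[station] = temp
    return result
```

With the reader above, this could be called as `max_temp_by_station(reader)` in place of the `for` loop.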

## Writing

```python
from fastavro import writer

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

with open('weather.avro', 'wb') as out:
    writer(out, schema, records)
```
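If you want to catch obviously malformed records before handing them to `writer`, a small pre-check against the schema can help. This is a sketch only, assuming flat records whose field types are the primitive names used above; `check_record` and `PRIMITIVES` are hypothetical and are not `fastavro`'s own validation. The ranges follow the Avro specification, where `int` is a 32-bit and `long` a 64-bit signed integer.

```python
from typing import Any, Dict, List

# Hypothetical table: Avro primitive name -> (Python type, (min, max) or None).
# Only the primitives used in the weather schema are covered.
PRIMITIVES = {
    "string": (str, None),
    "int": (int, (-2**31, 2**31 - 1)),
    "long": (int, (-2**63, 2**63 - 1)),
}


def check_record(schema: Dict[str, Any], record: Dict[str, Any]) -> List[str]:
    """Return a list of problems with `record`; an empty list means none found."""
    problems = []
    for field in schema["fields"]:
        name, avro_type = field["name"], field["type"]
        if name not in record:
            problems.append("missing field %r" % name)
            continue
        # Unions (lists), nested records (dicts) and other types are skipped.
        if not isinstance(avro_type, str) or avro_type not in PRIMITIVES:
            continue
        py_type, bounds = PRIMITIVES[avro_type]
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, py_type) or isinstance(value, bool):
            problems.append("field %r: expected %s, got %r" % (name, avro_type, value))
        elif bounds and not bounds[0] <= value <= bounds[1]:
            problems.append("field %r: %r out of range for %s" % (name, value, avro_type))
    return problems
```

For example, `all(not check_record(schema, r) for r in records)` should be `True` for the four weather records above.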

You can also use the `fastavro` script from the command line to dump `avro`
files.

```
fastavro weather.avro
```

By default, `fastavro` prints one JSON object per line; use the `--pretty` flag
to pretty-print the output instead.
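Because the default output is one JSON object per line, it is easy to consume from other programs. A minimal sketch, assuming the `fastavro` script is on your `PATH` and Python 3.7+ (for `capture_output`); `parse_json_lines` and `dump_records` are hypothetical helper names:

```python
import json
import subprocess
from typing import Any, Dict, List


def parse_json_lines(text: str) -> List[Dict[str, Any]]:
    """Parse one JSON object per non-empty line (the script's default output)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def dump_records(path: str) -> List[Dict[str, Any]]:
    """Run the `fastavro` script on `path` (without --pretty) and parse its output.

    Hypothetical wrapper; raises CalledProcessError if the script fails.
    """
    result = subprocess.run(
        ["fastavro", path], capture_output=True, text=True, check=True
    )
    return parse_json_lines(result.stdout)
```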

You can also dump the Avro schema:

```
fastavro --schema weather.avro
```


Here's the full command-line help:

```
usage: fastavro [-h] [--schema] [--codecs] [--version] [-p] [file [file ...]]

iter over avro file, emit records as JSON

positional arguments:
  file          file(s) to parse

optional arguments:
  -h, --help    show this help message and exit
  --schema      dump schema instead of records
  --codecs      print supported codecs
  --version     show program's version number and exit
  -p, --pretty  pretty print json
```


# Limitations
* No reader schema
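Without reader-schema support, records come back exactly as they were written. One possible workaround, sketched here under that assumption, is to project and default fields in Python after reading; the hypothetical `project` helper below is far weaker than Avro schema resolution (no type promotion, no aliases, no nested records).

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Sequence


def project(
    records: Iterable[Dict[str, Any]],
    fields: Sequence[str],
    defaults: Optional[Dict[str, Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield each record restricted to `fields`, filling gaps from `defaults`.

    Hypothetical helper; raises KeyError when a field has neither a value
    in the record nor a default.
    """
    defaults = defaults or {}
    for record in records:
        out = {}
        for name in fields:
            if name in record:
                out[name] = record[name]
            elif name in defaults:
                out[name] = defaults[name]
            else:
                raise KeyError("no value or default for field %r" % name)
        yield out
```

For example, `project(reader, ['station', 'temp'])` would keep only those two fields of each weather record.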

# Hacking
As recommended by Cython, the generated C files are distributed. This has the
advantage that the end user does not need to have Cython installed. However, it
means that every time you change `fastavro/pyfastavro.py` you need to run
`make`.

For `make` to succeed you need both Python 2 and Python 3 installed, with Cython
on both of them. For `./test-install.sh` you'll need [virtualenv][venv].

[venv]: http://pypi.python.org/pypi/virtualenv

# Builds
We're currently using [Travis CI](http://travis-ci.org/#!/tebeka/fastavro).


# Changes
See the [ChangeLog].

[ChangeLog]: https://github.com/tebeka/fastavro/blob/master/ChangeLog

# Contact
[Project Home](https://github.com/tebeka/fastavro)




Download files


Source Distribution

fastavro-0.8.3.tar.gz (245.1 kB)

Uploaded Source

Built Distributions

fastavro-0.8.3-py3.4-linux-x86_64.egg (598.3 kB)

Uploaded Source

fastavro-0.8.3-py2.7-linux-x86_64.egg (193.5 kB)

Uploaded Source

Details for the file fastavro-0.8.3.tar.gz:

* Download URL: fastavro-0.8.3.tar.gz
* Size: 245.1 kB
* Tags: Source
* Uploaded using Trusted Publishing? No

Hashes for fastavro-0.8.3.tar.gz:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 81b3c8ae368d1002f4ad5f725c42a6be4c9f76edc7cbeb57c92ad49f9ee1a926 |
| MD5 | 646b94e5d9ac92f0faf4f1f75e288062 |
| BLAKE2b-256 | ae3d0ec3a4dda509c2f5fcbd437bdfe8805b2eb490cdad53e61059fc562618bd |


Hashes for fastavro-0.8.3-py3.4-linux-x86_64.egg:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d3c5ac9c40d9b7541990fd28e4c3076efab16fe49caccb85c3b4fcb4b965d6d4 |
| MD5 | 98f48414c8527d53b237c2a07c7b5176 |
| BLAKE2b-256 | d36d155f4c652bc49e80c75fab30a7d76f3da15f6421e2ddab676ab29721122b |


Hashes for fastavro-0.8.3-py2.7-linux-x86_64.egg:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 98fc9cb07d893d29cf9d2d3ff2116beaccccd174ac36504b79b7b6c4e24d2691 |
| MD5 | 6ef54ff6bf30b3a6ee248068b5087356 |
| BLAKE2b-256 | 3fd5f1759f2bb79008ed056c6bee68f0c034c1a539ff01ba83da314fb6739d5b |

