
Flat-file datastore for timeseries data


PyStore - Fast data store for Pandas timeseries data


PyStore is a simple (yet powerful) datastore for Pandas dataframes, and while it can store any Pandas object, it was designed with storing timeseries data in mind.

It’s built on top of Pandas, Numpy, Dask, and Parquet (via Fastparquet) to provide an easy-to-use datastore for Python developers that can query millions of rows per second per client.

PyStore is hugely inspired by Man AHL’s Arctic, and I highly recommend you check it out.

Quickstart

Install PyStore

Install using pip:

$ pip install PyStore

Or upgrade using:

$ pip install PyStore --upgrade --no-cache-dir

Using PyStore

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pystore
import quandl

# Set storage path (optional, default is `~/.pystore`)
pystore.set_path('/usr/share/pystore')

# List stores
pystore.list_stores()

# Connect to datastore (create it if it doesn't exist)
store = pystore.store('mydatastore')

# List existing collections
store.list_collections()

# Access a collection (create it if it doesn't exist)
collection = store.collection('NASDAQ')

# List items in collection
collection.list_items()

# Load some data from Quandl
aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

# Store the first 100 rows of the data in the collection under "AAPL"
collection.write('AAPL', aapl[:100], metadata={'source': 'Quandl'})

# Read the item's data
item = collection.item('AAPL')
data = item.data  # <-- Dask dataframe (see dask.pydata.org)
metadata = item.metadata
df = item.to_pandas()

# Append the rest of the rows to the "AAPL" item
collection.append('AAPL', aapl[100:])

# Read the item again; it now includes the appended rows
item = collection.item('AAPL')
data = item.data
metadata = item.metadata
df = item.to_pandas()

Concepts

PyStore provides namespaced collections of data. These collections allow bucketing data by source, user or some other metric (for example frequency: End-Of-Day; Minute Bars; etc.). Each collection (or namespace) maps to a directory containing partitioned parquet files for each item (e.g. symbol).
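To make the store/collection/item nesting concrete, here is a minimal sketch that builds the directory an item's files would live in. It assumes the layout is simply `<path>/<store>/<collection>/<item>/`; `item_path` is a hypothetical helper, not part of PyStore's API, and PyStore's actual on-disk naming may differ.

```python
from pathlib import Path


def item_path(root: str, store: str, collection: str, item: str) -> Path:
    """Hypothetical helper: directory that would hold one item's parquet files.

    Assumes a plain root/store/collection/item nesting; this mirrors the
    concept described above and is not PyStore's own code.
    """
    # expanduser() lets the default-style root "~/.pystore" work too
    return Path(root).expanduser() / store / collection / item


if __name__ == "__main__":
    # On *NIX systems this prints /usr/share/pystore/mydatastore/NASDAQ/AAPL
    print(item_path("/usr/share/pystore", "mydatastore", "NASDAQ", "AAPL"))
```

Because each item gets its own directory, adding a new symbol to a collection never touches the files of existing symbols.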

A good practice is to create collections that look something like this:

  • collection.EOD

  • collection.ONEMINUTE
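When data arrives at several frequencies, one way to keep that bucketing consistent is a small lookup from bar size to collection name. This is a sketch only: `COLLECTIONS` and `collection_for` are hypothetical names, and the bar sizes (60 seconds, 86400 seconds) are assumptions chosen to match the EOD / ONEMINUTE examples.

```python
# Hypothetical mapping from bar size in seconds to a collection name,
# following the EOD / ONEMINUTE naming suggested above.
COLLECTIONS = {
    60: "ONEMINUTE",
    86400: "EOD",
}


def collection_for(bar_seconds: int) -> str:
    """Return the collection name for data sampled every `bar_seconds`."""
    try:
        return COLLECTIONS[bar_seconds]
    except KeyError:
        # Fail loudly rather than silently creating an unplanned collection
        raise ValueError(
            "no collection defined for {}s bars".format(bar_seconds)
        ) from None
```

The result can then be passed straight to `store.collection(collection_for(60))`, so every one-minute dataset ends up in the same namespace.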

Requirements

  • Python >= 3.5

  • Pandas

  • Numpy

  • Dask

  • Fastparquet

  • Snappy (Google’s compression/decompression library)
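A quick way to see which of these requirements are already available is to ask `importlib` for each module's spec without actually importing it. This is a minimal sketch: the import names are assumptions (python-snappy, for instance, is imported as `snappy`), and `missing_dependencies` / `python_ok` are hypothetical helpers.

```python
import importlib.util
import sys

# Assumed import names for the requirements listed above
REQUIRED_MODULES = ("pandas", "numpy", "dask", "fastparquet", "snappy")


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found on sys.path."""
    # find_spec() returns None for a top-level module that isn't installed
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 5)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing:", missing_dependencies() or "none")
```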

PyStore has been tested on *NIX-like systems, including macOS.

Dependencies:

PyStore uses Snappy, a fast and efficient compression/decompression library from Google. You can install Snappy on *nix-like systems using your system’s package manager.

See the python-snappy GitHub repo for more information.

TL;DR:

You can install the Snappy C library with the following commands:

  • APT: sudo apt-get install libsnappy-dev

  • RPM: sudo yum install libsnappy-devel

  • Brew: brew install snappy

* Windows users should check out Snappy for Windows and this Stack Overflow post for help installing Snappy and python-snappy.

Known Limitation

PyStore currently only supports the local filesystem. I plan on adding support for Amazon S3 (via s3fs), Google Cloud Storage (via gcsfs) and Hadoop Distributed File System (via hdfs3) in the future.
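Given that limitation, a caller may want to reject remote-looking paths before handing them to `pystore.set_path()`. The sketch below is a hypothetical `is_local_path` helper built on the standard library's `urllib.parse`: it treats any URL scheme such as `s3://`, `gs://` or `hdfs://` as remote, while allowing single-letter schemes because Windows drive letters parse that way.

```python
from urllib.parse import urlparse


def is_local_path(path: str) -> bool:
    """Hypothetical guard: True for plain filesystem paths, False for URLs."""
    scheme = urlparse(path).scheme
    # "" -> ordinary path like "/usr/share/pystore" or "~/.pystore"
    # one letter -> Windows drive such as "C:\\pystore"
    return scheme == "" or len(scheme) == 1
```

For example, `if is_local_path(p): pystore.set_path(p)` keeps an `s3://...` path from being silently treated as a local directory name.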

Acknowledgements

PyStore is hugely inspired by Man AHL’s Arctic, which uses MongoDB for storage and allows for versioning and other features. I highly recommend you check it out.

License

PyStore is licensed under the GNU Lesser General Public License v2.1. A copy of the license is included in LICENSE.txt.


I’m very interested in your experience with PyStore. Please drop me a note with any feedback you have.

Contributions welcome!

- Ran Aroussi



Download files


Source Distribution

PyStore-0.0.6.tar.gz (15.0 kB view details)


File details

Details for the file PyStore-0.0.6.tar.gz.

File metadata

  • Download URL: PyStore-0.0.6.tar.gz
  • Upload date:
  • Size: 15.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for PyStore-0.0.6.tar.gz
Algorithm Hash digest
SHA256 5a6e0a8d8d16000a476adf659a07c91df694c86b8a140d7ba4052c498c758d81
MD5 a1367b850ebac9df297925d28cf3a5a5
BLAKE2b-256 26069709757fc288ce2631e7f0039ec925edf1fa06bfeb445da5abae815b2cfb
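To check a downloaded copy of the source distribution against the SHA256 digest listed above, a short standard-library sketch is enough. `sha256_of` is a hypothetical helper name; it reads the file in chunks so large downloads don't need to fit in memory.

```python
import hashlib

# SHA256 digest published above for PyStore-0.0.6.tar.gz
EXPECTED_SHA256 = "5a6e0a8d8d16000a476adf659a07c91df694c86b8a140d7ba4052c498c758d81"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For an untampered download, `sha256_of("PyStore-0.0.6.tar.gz") == EXPECTED_SHA256` should be `True`. pip can also enforce this itself: list `PyStore==0.0.6 --hash=sha256:` followed by the digest in a requirements file and install with `pip install --require-hashes -r requirements.txt`.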

