
AHL Research Versioned TimeSeries and Tick store


Join the chat at https://gitter.im/manahl/arctic

Arctic is a high performance datastore for numeric data. It supports Pandas, numpy arrays and pickled objects out-of-the-box, with pluggable support for other data types and optional versioning.

Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth, ~10x compression on disk, and scales to hundreds of millions of rows per second per MongoDB instance.

Arctic has been under active development at Man AHL since 2012.

Quickstart

Install Arctic

pip install git+https://github.com/manahl/arctic.git

Run a MongoDB

mongod --dbpath <path/to/db_directory>

Using VersionStore

from arctic import Arctic
import quandl

# Connect to a local MongoDB instance
store = Arctic('localhost')

# Create the library - defaults to VersionStore
store.initialize_library('NASDAQ')

# Access the library
library = store['NASDAQ']

# Load some data - maybe from Quandl
aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

# Store the data in the library
library.write('AAPL', aapl, metadata={'source': 'Quandl'})

# Reading the data
item = library.read('AAPL')
aapl = item.data
metadata = item.metadata

VersionStore supports much more: See the HowTo!
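As one illustration of that "much more", the sketch below walks the stored versions of a symbol. It assumes the `library` object from the quickstart and relies on `list_versions` and `read(..., as_of=...)`; the helper name `version_history` is hypothetical, and the keys of each version record are worth checking against the installed release.

```python
def version_history(library, symbol):
    """Return {version_number: data} for each readable version of `symbol`.

    `library` is assumed to be a VersionStore library, such as
    store['NASDAQ'] from the quickstart above.
    """
    history = {}
    for info in library.list_versions(symbol):
        # Versions flagged as deleted carry no data, so skip them.
        if info.get('deleted'):
            continue
        item = library.read(symbol, as_of=info['version'])
        history[info['version']] = item.data
    return history
```

With a MongoDB running, `version_history(store['NASDAQ'], 'AAPL')` should return one entry per stored version of the symbol.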

Adding your own storage engine

Plugging a custom class in as a library type is straightforward. This example shows how.
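The linked example is not reproduced in this text, so here is a minimal sketch of what such a class might look like. It assumes the `register_library_type` function exported by the arctic package; the class `NoteStore`, its type name, the library name 'username.notes' and the `demo` helper are all hypothetical.

```python
class NoteStore(object):
    """Hypothetical library type that stores short text notes by key."""

    # Name the type is registered under; assumed to be any unique string.
    _LIBRARY_TYPE = 'example.NoteStore'

    def __init__(self, arctic_lib):
        # Arctic hands the class a library binding; keep the top-level
        # MongoDB collection that the binding exposes.
        self._collection = arctic_lib.get_top_level_collection()

    @classmethod
    def initialize_library(cls, arctic_lib, **kwargs):
        # Called once when the library is created; nothing to set up here.
        pass

    def write(self, key, text):
        # Insert the note, or replace an existing note with the same key.
        self._collection.replace_one({'key': key},
                                     {'key': key, 'text': text},
                                     upsert=True)

    def read(self, key):
        doc = self._collection.find_one({'key': key})
        return None if doc is None else doc['text']


def demo(host='localhost'):
    """Register and use the type; needs the arctic package and a MongoDB."""
    from arctic import Arctic, register_library_type

    # Registering the same name twice is expected to raise, so do it once.
    register_library_type(NoteStore._LIBRARY_TYPE, NoteStore)
    store = Arctic(host)
    store.initialize_library('username.notes', NoteStore._LIBRARY_TYPE)
    library = store['username.notes']
    library.write('greeting', 'hello')
    return library.read('greeting')
```

The class itself only depends on the binding it is given, which keeps it easy to exercise with a stand-in object before pointing it at a real database.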

Concepts

Libraries

Arctic provides namespaced libraries of data. These libraries allow bucketing data by source, user or some other metric (for example frequency: End-Of-Day; Minute Bars; etc.).

Arctic supports multiple data libraries per user. A user (or namespace) maps to a MongoDB database (the granularity of mongo authentication). The library itself is composed of a number of collections within the database. Libraries look like:

  • user.EOD

  • user.ONEMINUTE

A library is mapped to a Python class. All library databases in MongoDB are prefixed with ‘arctic_’.
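The naming scheme can be sketched as a small helper. This is an illustration of the convention described above, not Arctic's own code: the function name `split_library_name` is hypothetical, and the handling of names without a namespace (a database called plain 'arctic') is an assumption.

```python
DB_PREFIX = 'arctic'


def split_library_name(library):
    """Split 'user.EOD' into (database_name, library_name).

    Assumption: a name with no namespace lands in a database called
    plain 'arctic'; the text above only states the 'arctic_' prefix.
    """
    namespace, sep, name = library.partition('.')
    if not sep:
        # No '.' in the name, so there is no user/namespace part.
        return DB_PREFIX, library
    return DB_PREFIX + '_' + namespace, name
```

Under these assumptions, 'user.EOD' and 'user.ONEMINUTE' share one database and differ only in the library part.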

Storage Engines

Arctic includes three storage engines:

  • VersionStore: a key-value versioned TimeSeries store. It supports:

    • Pandas data types (other Python types are pickled)

    • Multiple versions of each data item, with easy reads of previous versions

    • Point-in-time snapshots across symbols in a library

    • Soft quota support

    • Hooks for persisting other data types

    • Audited writes: an API for saving metadata and data before and after a write

    • A wide range of TimeSeries data frequencies, from End-Of-Day to Minute bars

    • See the HowTo

  • TickStore: a column-oriented tick database. It supports dynamic fields; chunks aren’t versioned. Designed for large, continuously ticking data.

  • ChunkStore: a storage type that stores data in customizable chunk sizes. Chunks aren’t versioned, and can be appended to and updated in place.

Arctic storage implementations are pluggable. VersionStore is the default.
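To make the idea of customizable chunk sizes concrete, here is a toy in-memory model of date-based chunking. It is not Arctic's implementation: the names `chunk_rows` and `update_chunks`, and the 'D'/'M'/'Y' size codes, are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# strftime patterns for daily, monthly and yearly chunks (toy choice).
CHUNK_FORMATS = {'D': '%Y-%m-%d', 'M': '%Y-%m', 'Y': '%Y'}


def chunk_rows(rows, chunk_size='M'):
    """Group (date, value) rows into chunks keyed by day, month or year."""
    fmt = CHUNK_FORMATS[chunk_size]
    chunks = defaultdict(list)
    for day, value in sorted(rows):
        chunks[day.strftime(fmt)].append((day, value))
    return dict(chunks)


def update_chunks(chunks, rows, chunk_size='M'):
    """Overwrite, in place, every chunk touched by `rows`.

    Chunks that `rows` does not touch are left alone, and no earlier
    copy of an overwritten chunk is kept (chunks are not versioned).
    """
    chunks.update(chunk_rows(rows, chunk_size))
    return chunks
```

For example, chunking three rows from January and February 2016 with `'M'` yields two chunks, and a later `update_chunks` call with a February row replaces only the February chunk.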

Requirements

Arctic currently works with:

  • Python 2.7, 3.4, 3.5

  • pymongo >= 3.0

  • Pandas

  • MongoDB >= 2.4.x

Acknowledgements

Arctic wouldn’t be possible without the work of the AHL Data Engineering Team.

Contributions welcome!

License

Arctic is licensed under the GNU LGPL v2.1, a copy of which is included in LICENSE.

Changelog

1.28 (2016-08-16)

  • Bugfix: #195 Top level tickstore write with list of dicts now works with timezone aware datetimes

1.27 (2016-08-05)

  • Bugfix: #187 Compatibility with latest version of pytest-dbfixtures

  • Feature: #182 Improve ChunkStore read/write performance

  • Feature: #162 Rename API for ChunkStore

  • Feature: #186 chunk_range on update

  • Bugfix: #189 range delete does not update symbol metadata

1.26 (2016-07-20)

  • Bugfix: Faster TickStore querying for multiple symbols simultaneously

  • Bugfix: TickStore.read now respects allow_secondary=True

  • Bugfix: #147 Add get_info method to ChunkStore

  • Bugfix: Periodically re-cache the library.quota to pick up any changes

  • Bugfix: #166 Add index on SHA for ChunkStore

  • Bugfix: #169 Dtype mismatch in chunkstore updates

  • Feature: #171 allow deleting of values within a date range in ChunkStore

  • Bugfix: #172 Fix date range bug when querying dates in the middle of chunks

  • Bugfix: #176 Fix overwrite failures in Chunkstore

  • Bugfix: #178 - Change how start/end dates are populated in the DB, also fix append so it works as expected.

  • Bugfix: #43 - Remove dependency on hardcoded Linux timezone files

1.25 (2016-05-23)

  • Bugfix: Ensure that Tickstore.write doesn’t allow out of order messages

  • Bugfix: VersionStore.write now allows writing ‘None’ as a value

1.24 (2016-05-10)

  • Bugfix: Backwards compatibility reading/writing documents with previous versions of Arctic

1.22 (2016-05-09)

  • Bugfix: #109 Ensure stable sort during Arctic read

  • Feature: New benchmark suite using ASV

  • Bugfix: #129 Fixed an issue where some chunks could get skipped during a multiple-symbol TickStore read

  • Bugfix: #135 Fix issue with different datatype returned from pymongo in python3

  • Feature: #130 New Chunkstore storage type

1.21 (2016-03-08)

  • Bugfix: #106 Fix Pandas Panel storage for panels with different dimensions

1.20 (2016-02-03)

  • Feature: #98 Add initial_image as optional parameter on tickstore write()

  • Bugfix: #100 Write error on end field when writing with pandas dataframes

1.19 (2016-01-29)

  • Feature: Add python 3.3/3.4 support

  • Bugfix: #95 Fix raising NoDataFoundException across multiple low level libraries

1.18 (2016-01-05)

  • Bugfix: #81 Fix broken read of multi-index DataFrame written by old version of Arctic

  • Bugfix: #49 Fix stringifying tickstore

1.17 (2015-12-24)

  • Feature: Add timezone support to store multi-index dataframes

  • Bugfix: Fixed broken sdist releases

1.16 (2015-12-15)

  • Feature: ArcticTransaction now supports non-audited ‘transactions’ via audit=False:

        with ArcticTransaction(Arctic('hostname')['some_library'], 'symbol', audit=False) as at:
            ...

    This is useful for batch jobs which read-modify-write and don’t want to clash with concurrent writers, and which don’t require keeping all versions of a symbol.

1.15 (2015-11-25)

  • Feature: get_info API added to version_store.

1.14 (2015-11-25)

1.12 (2015-11-12)

  • Bugfix: correct version detection for Pandas >= 0.18.

  • Bugfix: retrying connection initialisation in case of an AutoReconnect failure.

1.11 (2015-10-29)

  • Bugfix: Improve performance of saving multi-index Pandas DataFrames by 9x

  • Bugfix: authenticate should propagate non-OperationFailure exceptions (e.g. ConnectionFailure) as this might be indicative of socket failures

  • Bugfix: return ‘deleted’ state in VersionStore.list_versions() so that callers can pick up on the head version being the delete-sentinel.

1.10 (2015-10-28)

  • Bugfix: VersionStore.read(date_range=…) could do the wrong thing with TimeZones (which aren’t yet supported for date_range slicing).

1.9 (2015-10-06)

  • Bugfix: fix authentication race condition when sharing an Arctic instance between multiple threads.

1.8 (2015-09-29)

  • Bugfix: compatibility with both 3.0 and pre-3.0 MongoDB for querying current authentications

1.7 (2015-09-18)

  • Feature: Add support for reading a subset of a pandas DataFrame in VersionStore.read by passing in an arctic.date.DateRange

  • Bugfix: Reauth against admin if not auth’d against a specific library’s DB. Sometimes we appear to miss admin DB auths; this works around that until we work out what the issue is.

1.6 (2015-09-16)

  • Feature: Add support for multi-index Bitemporal DataFrame storage. This allows persisting data and changes within the DataFrame making it easier to see how old data has been revised over time.

  • Bugfix: Ensure we call the error logging hook when exceptions occur

1.5 (2015-09-02)

  • Always use the primary cluster node for ‘has_symbol()’, it’s safer

1.4 (2015-08-19)

  • Bugfixes for timezone handling, now ensures use of non-naive datetimes

  • Bugfix for tickstore read missing images

1.3 (2015-08-11)

  • Improvements to command-line control scripts for users and libraries

  • Bugfix for pickling top-level Arctic object

1.2 (2015-06-29)

  • Allow snapshotting a range of versions in the VersionStore, and snapshot all versions by default.

1.1 (2015-06-16)

  • Bugfix for backwards-compatible unpickling of bson-encoded data

  • Added switch for enabling parallel lz4 compression

1.0 (2015-06-14)

  • Initial public release


