
AHL Research Versioned TimeSeries and Tick store


Join the chat at https://gitter.im/manahl/arctic

Arctic is a high performance datastore for numeric data. It supports Pandas, numpy arrays and pickled objects out-of-the-box, with pluggable support for other data types and optional versioning.

Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth, ~10x compression on disk, and scales to hundreds of millions of rows per second per MongoDB instance.

Arctic has been under active development at Man AHL since 2012.

Quickstart

Install Arctic

pip install git+https://github.com/manahl/arctic.git

Run a MongoDB

mongod --dbpath <path/to/db_directory>

Using VersionStore

from arctic import Arctic
import quandl

# Connect to a local MongoDB instance
store = Arctic('localhost')

# Create the library - defaults to VersionStore
store.initialize_library('NASDAQ')

# Access the library
library = store['NASDAQ']

# Load some data - maybe from Quandl
aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

# Store the data in the library
library.write('AAPL', aapl, metadata={'source': 'Quandl'})

# Reading the data
item = library.read('AAPL')
aapl = item.data
metadata = item.metadata

VersionStore supports much more: See the HowTo!
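As one illustration of that "much more", the sketch below walks the version history of a symbol. It assumes `library` is a VersionStore library such as `store['NASDAQ']` from the Quickstart, and it relies on `list_versions` and on the `as_of` argument of `read`. The helper name `read_all_versions` is hypothetical and not part of Arctic.

```python
def read_all_versions(library, symbol):
    """Return {version_number: data} for every readable version of `symbol`.

    `library` is assumed to be a VersionStore library, e.g. store['NASDAQ'].
    This helper is a hypothetical convenience wrapper, not an Arctic API.
    """
    history = {}
    for info in library.list_versions(symbol):
        # Each entry is a dict describing one version. Entries flagged as
        # deleted are delete-sentinels and carry no data, so skip them.
        if info.get('deleted'):
            continue
        item = library.read(symbol, as_of=info['version'])
        history[info['version']] = item.data
    return history
```

Calling `read_all_versions(library, 'AAPL')` after a couple of writes should give one entry per stored version, keyed by version number.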

Adding your own storage engine

Plugging a custom class in as a library type is straightforward. This example shows how.
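For orientation, a minimal sketch of what such a class might look like follows. It assumes Arctic hands the class a library binding that exposes `get_top_level_collection()` (returning a pymongo collection) and calls an `initialize_library` classmethod when the library is created. The class name, the type string and the `put`/`get` methods are hypothetical.

```python
class KeyValueStore(object):
    """Hypothetical custom library type that stores one document per key."""

    _LIBRARY_TYPE = 'example.KeyValueStore'  # hypothetical type name

    def __init__(self, arctic_lib):
        # Assumption: `arctic_lib` is the library binding supplied by Arctic.
        self._collection = arctic_lib.get_top_level_collection()

    @classmethod
    def initialize_library(cls, arctic_lib, **kwargs):
        # One-off setup when the library is created: index the lookup key.
        arctic_lib.get_top_level_collection().create_index('key', unique=True)

    def put(self, key, value):
        # Insert the document, or overwrite the existing one for this key.
        self._collection.replace_one(
            {'key': key}, {'key': key, 'value': value}, upsert=True)

    def get(self, key):
        doc = self._collection.find_one({'key': key})
        return None if doc is None else doc['value']
```

With Arctic installed, the class would then be registered under its type name with `register_library_type(KeyValueStore._LIBRARY_TYPE, KeyValueStore)` (imported from `arctic`), and that type name passed as the library type to `store.initialize_library`.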

Concepts

Libraries

Arctic provides namespaced libraries of data. These libraries allow bucketing data by source, user or some other metric (for example frequency: End-Of-Day; Minute Bars; etc.).

Arctic supports multiple data libraries per user. A user (or namespace) maps to a MongoDB database (the granularity of mongo authentication). The library itself is composed of a number of collections within the database. Libraries look like:

  • user.EOD

  • user.ONEMINUTE

A library is mapped to a Python class. All library databases in MongoDB are prefixed with ‘arctic_’.
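The naming scheme can be sketched as a small helper that splits a library name into a MongoDB database name and a library name. This is an illustration of the convention described above, not Arctic's own code: the function name is hypothetical, and the handling of names without a namespace (such as 'NASDAQ' in the Quickstart) is an assumption.

```python
DB_PREFIX = 'arctic'


def split_library_name(library):
    """Split 'user.EOD' into ('arctic_user', 'EOD')."""
    if '.' in library:
        # The part before the first dot is the user (namespace).
        namespace, name = library.split('.', 1)
        return DB_PREFIX + '_' + namespace, name
    # Assumption: a library without a namespace lives in a database
    # named just 'arctic'.
    return DB_PREFIX, library


print(split_library_name('user.EOD'))        # ('arctic_user', 'EOD')
print(split_library_name('user.ONEMINUTE'))  # ('arctic_user', 'ONEMINUTE')
```

Both example libraries land in the same database, which is why MongoDB authentication applies per user rather than per library.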

Storage Engines

Arctic includes three storage engines:

  • VersionStore: a key-value versioned TimeSeries store. It supports:

    • Pandas data types (other Python types pickled)

    • Multiple versions of each data item. Can easily read previous versions.

    • Create point-in-time snapshots across symbols in a library

    • Soft quota support

    • Hooks for persisting other data types

    • Audited writes: API for saving metadata and data before and after a write.

    • a wide range of TimeSeries data frequencies: End-Of-Day to Minute bars

    • See the HowTo

  • TickStore: Column-oriented tick database. Supports dynamic fields; chunks aren’t versioned. Designed for large, continuously ticking data.

  • ChunkStore: A storage type that allows data to be stored in customizable chunk sizes. Chunks aren’t versioned, and can be appended to and updated in place.

Arctic storage implementations are pluggable. VersionStore is the default.
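To make the idea of customizable chunk sizes more concrete, here is a conceptual sketch in plain Python. It is not Arctic's implementation: the function `chunk_by` and the 'D'/'M'/'Y' size labels are assumptions chosen for illustration. It groups dated rows into chunks by day, month or year.

```python
from collections import OrderedDict
from datetime import date


def chunk_by(rows, chunk_size='M'):
    """Group (date, value) rows into chunks keyed by day, month or year."""
    formats = {'D': '%Y-%m-%d', 'M': '%Y-%m', 'Y': '%Y'}
    fmt = formats[chunk_size]
    chunks = OrderedDict()
    for day, value in sorted(rows):
        # Rows that share a key end up in the same chunk.
        chunks.setdefault(day.strftime(fmt), []).append((day, value))
    return chunks


rows = [
    (date(2017, 1, 30), 1.0),
    (date(2017, 1, 31), 2.0),
    (date(2017, 2, 1), 3.0),
]
print(list(chunk_by(rows, 'M')))  # ['2017-01', '2017-02']
```

Because each chunk is an independent unit, new rows can be appended to the latest chunk, or a single chunk rewritten, without touching the others. That is the property that makes in-place updates practical.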

Requirements

Arctic currently works with:

  • Python 2.7, 3.4, 3.5, 3.6

  • pymongo >= 3.0

  • Pandas

  • MongoDB >= 2.4.x

Operating Systems:

  • Linux

  • macOS

Acknowledgements

Arctic has been under active development at Man AHL since 2012.

It wouldn’t be possible without the work of the AHL Data Engineering Team.

Contributions welcome!

License

Arctic is licensed under the GNU LGPL v2.1, a copy of which is included in LICENSE.

Changelog

1.49 (2017-08-02)

  • Feature: #392 MetadataStore

  • Bugfix: #384 sentinels missing time data on chunk start/ends in ChunkStore

  • Bugfix: #382 Remove dependency on cython being pre-installed

  • Bugfix: #343 Renaming libraries/collections within a namespace/database

1.48 (2017-06-26)

  • BugFix: Rollback #363, as it breaks multi-index dataframe

  • Bugfix: #372 OSX build improvements

1.47 (2017-06-19)

  • Feature: Re-introduce #363 concat flag, essentially undoing 1.45

  • BugFix: #377 Fix broken replace_one on BSONStore and add bulk_write

1.46 (2017-06-13)

  • Feature: #374 Shard BSONStore on _id rather than symbol

1.45 (2017-06-09)

  • BugFix: Rollback #363, which can cause ordering issues on append

1.44 (2017-06-08)

  • Feature: #364 Expose compressHC from internal arctic LZ4 and remove external LZ4 dependency

  • Feature: #363 Appending older data (compared to what exists in the library) will raise. Use concat=True to append only the new bits

  • Feature: #371 Expose more functionality in BSONStore

1.43 (2017-05-30)

  • Bugfix: #350 remove deprecated pandas calls

  • Bugfix: #360 version incorrect in empty append in VersionStore

  • Feature: #365 add generic BSON store

1.42 (2017-05-12)

  • Bugfix: #346 fixed daterange subsetting error on very large dataframes in version store

  • Bugfix: #351 $size queries can’t use indexes, use alternative queries

1.41 (2017-04-20)

  • Bugfix: #334 Chunk range param with pandas object fails in chunkstore.get_chunk_ranges

  • Bugfix: #339 Depending on lz4<=0.8.2 to fix build errors

  • Bugfix: #342 fixed compilation errors on Mac OSX

  • Bugfix: #344 fixed data corruption problem with concurrent appends

1.40 (2017-03-03)

  • BugFix: #330 Make Arctic._lock reentrant

1.39 (2017-03-03)

  • Feature: #329 Add reset() method to Arctic

1.38 (2017-02-22)

  • Bugfix: #324 Datetime indexes must be sorted in chunkstore

  • Feature: #290 improve performance of tickstore column reads

1.37 (2017-01-31)

  • Bugfix: #300 to_datetime deprecated in pandas, use to_pydatetime instead

  • Bugfix: #309 formatting change for DateRange __str__

  • Feature: #313 set and read user specified metadata in chunkstore

  • Feature: #319 Audit log support in ChunkStore

  • Bugfix: #216 Tickstore write fails with named index column

1.36 (2016-12-13)

  • Feature: Default to hashed based sharding

  • Bugfix: retry socket errors during VersionStore snapshot operations

1.35 (2016-11-29)

  • Bugfix: #296 Cannot compress/decompress empty string

1.34 (2016-11-29)

  • Feature: #294 Move per-chunk metadata for chunkstore to a separate collection

  • Bugfix: #292 Account for metadata size during size chunking in ChunkStore

  • Feature: #283 Support for all pandas frequency strings in ChunkStore DateChunker

  • Feature: #286 Add has_symbol to ChunkStore and support for partial symbol matching in list_symbols

1.33 (2016-11-07)

  • Feature: #275 Tuple range object support in DateChunker

  • Bugfix: #273 Duplicate columns breaking serializer

  • Feature: #267 Tickstore.delete returns deleted data

  • Dependency: #266 Remove pytest-dbfixtures in favor of pytest-server-fixtures

1.32 (2016-10-25)

  • Feature: #260 quota support on Chunkstore

  • Bugfix: #259 prevent write of unnamed columns/indexes

  • Bugfix: #252 pandas 0.19.0 compatibility fixes

  • Bugfix: #249 open ended range reads on data without index fail

  • Bugfix: #262 VersionStore.append must check data is written correctly during repack

  • Bugfix: #263 Quota: Improve the error message when near soft-quota limit

  • Perf: #265 VersionStore.write / append don’t aggressively add indexes on each write

1.31 (2016-09-29)

  • Bugfix: #247 segmentation read fix in chunkstore

  • Feature: #243 add get_library_type method

  • Bugfix: more cython changes to handle LZ4 errors properly

  • Feature: #239 improve chunkstore’s get_info method

1.30 (2016-09-26)

  • Feature: #235 method to return chunk ranges on a symbol in ChunkStore

  • Feature: #234 Iterator access to ChunkStore

  • Bugfix: #236 Cython not handling errors from LZ4 function calls

1.29 (2016-09-20)

  • Bugfix: #228 Mongo fail-over during append can leave a Version in an inconsistent state

  • Feature: #193 Support for different Chunkers and Serializers by symbol in ChunkStore

  • Feature: #220 Raise exception if older version of arctic attempts to read unsupported pickled data

  • Feature: #219 and #220 Support for pickling large data (>2GB)

  • Feature: #204 Add support for library renaming

  • Feature: #209 Upsert capability in ChunkStore’s update method

  • Feature: #207 Support DatetimeIndexes in DateRange chunker

  • Bugfix: #232 Don’t raise during VersionStore #append(…) if the previous append failed

1.28 (2016-08-16)

  • Bugfix: #195 Top level tickstore write with list of dicts now works with timezone aware datetimes

1.27 (2016-08-05)

  • Bugfix: #187 Compatibility with latest version of pytest-dbfixtures

  • Feature: #182 Improve ChunkStore read/write performance

  • Feature: #162 Rename API for ChunkStore

  • Feature: #186 chunk_range on update

  • Bugfix: #189 range delete does not update symbol metadata

1.26 (2016-07-20)

  • Bugfix: Faster TickStore querying for multiple symbols simultaneously

  • Bugfix: TickStore.read now respects allow_secondary=True

  • Bugfix: #147 Add get_info method to ChunkStore

  • Bugfix: Periodically re-cache the library.quota to pick up any changes

  • Bugfix: #166 Add index on SHA for ChunkStore

  • Bugfix: #169 Dtype mismatch in chunkstore updates

  • Feature: #171 allow deleting of values within a date range in ChunkStore

  • Bugfix: #172 Fix date range bug when querying dates in the middle of chunks

  • Bugfix: #176 Fix overwrite failures in Chunkstore

  • Bugfix: #178 - Change how start/end dates are populated in the DB, also fix append so it works as expected.

  • Bugfix: #43 - Remove dependency on hardcoded Linux timezone files

1.25 (2016-05-23)

  • Bugfix: Ensure that Tickstore.write doesn’t allow out of order messages

  • Bugfix: VersionStore.write now allows writing ‘None’ as a value

1.24 (2016-05-10)

  • Bugfix: Backwards compatibility reading/writing documents with previous versions of Arctic

1.22 (2016-05-09)

  • Bugfix: #109 Ensure stable sort during Arctic read

  • Feature: New benchmark suite using ASV

  • Bugfix: #129 Fixed an issue where some chunks could get skipped during a multiple-symbol TickStore read

  • Bugfix: #135 Fix issue with different datatype returned from pymongo in python3

  • Feature: #130 New Chunkstore storage type

1.21 (2016-03-08)

  • Bugfix: #106 Fix Pandas Panel storage for panels with different dimensions

1.20 (2016-02-03)

  • Feature: #98 Add initial_image as optional parameter on tickstore write()

  • Bugfix: #100 Write error on end field when writing with pandas dataframes

1.19 (2016-01-29)

  • Feature: Add python 3.3/3.4 support

  • Bugfix: #95 Fix raising NoDataFoundException across multiple low level libraries

1.18 (2016-01-05)

  • Bugfix: #81 Fix broken read of multi-index DataFrame written by old version of Arctic

  • Bugfix: #49 Fix stringifying tickstore

1.17 (2015-12-24)

  • Feature: Add timezone support to store multi-index dataframes

  • Bugfix: Fixed broken sdist releases

1.16 (2015-12-15)

  • Feature: ArcticTransaction now supports non-audited ‘transactions’ via audit=False, for example: with ArcticTransaction(Arctic('hostname')['some_library'], 'symbol', audit=False) as at: ... This is useful for batch jobs which read-modify-write and don’t want to clash with concurrent writers, and which don’t require keeping all versions of a symbol.

1.15 (2015-11-25)

  • Feature: get_info API added to version_store.

1.14 (2015-11-25)

1.12 (2015-11-12)

  • Bugfix: correct version detection for Pandas >= 0.18.

  • Bugfix: retrying connection initialisation in case of an AutoReconnect failure.

1.11 (2015-10-29)

  • Bugfix: Improve performance of saving multi-index Pandas DataFrames by 9x

  • Bugfix: authenticate should propagate non-OperationFailure exceptions (e.g. ConnectionFailure) as this might be indicative of socket failures

  • Bugfix: return ‘deleted’ state in VersionStore.list_versions() so that callers can pick up on the head version being the delete-sentinel.

1.10 (2015-10-28)

  • Bugfix: VersionStore.read(date_range=…) could do the wrong thing with TimeZones (which aren’t yet supported for date_range slicing).

1.9 (2015-10-06)

  • Bugfix: fix authentication race condition when sharing an Arctic instance between multiple threads.

1.8 (2015-09-29)

  • Bugfix: compatibility with both 3.0 and pre-3.0 MongoDB for querying current authentications

1.7 (2015-09-18)

  • Feature: Add support for reading a subset of a pandas DataFrame in VersionStore.read by passing in an arctic.date.DateRange

  • Bugfix: Reauth against admin if not auth’d against a specific library’s DB. Sometimes we appear to miss admin DB auths. This works around that until we work out what the issue is.

1.6 (2015-09-16)

  • Feature: Add support for multi-index Bitemporal DataFrame storage. This allows persisting data and changes within the DataFrame making it easier to see how old data has been revised over time.

  • Bugfix: Ensure we call the error logging hook when exceptions occur

1.5 (2015-09-02)

  • Always use the primary cluster node for ‘has_symbol()’, it’s safer

1.4 (2015-08-19)

  • Bugfixes for timezone handling, now ensures use of non-naive datetimes

  • Bugfix for tickstore read missing images

1.3 (2015-08-11)

  • Improvements to command-line control scripts for users and libraries

  • Bugfix for pickling top-level Arctic object

1.2 (2015-06-29)

  • Allow snapshotting a range of versions in the VersionStore, and snapshot all versions by default.

1.1 (2015-06-16)

  • Bugfix for backwards-compatible unpickling of bson-encoded data

  • Added switch for enabling parallel lz4 compression

1.0 (2015-06-14)

  • Initial public release
