Skip to main content

Tools for analyzing hierarchically nested, columnar data without deserialization.

Project description

Data analysts are often faced with a choice between speed and flexibility. Tabular data, such as SQL tables, can be processed rapidly enough for a truly interactive analysis session, but hierarchically nested data, such as JSON, is better at representing relationships within complex data models. In some domains (such as particle physics), we want to perform calculations on JSON-like structures at the speed of SQL.

The key to high throughput for large datasets, particularly ones with many more attributes than are typically accessed in a single pass, is laying out the data in “columns.” All values of an attribute should be contiguous in disk or memory because data are paged from one cache to the next in locally contiguous blocks. The ROOT and Parquet file formats represent hierarchically nested data in a columnar form on disk, and Apache Arrow is an emerging standard for sharing hierarchically nested data in memory. However, data from these formats are usually deserialized into conventional data structures before processing, which limits performance (see this talk and this paper).

OAMap is a toolset for computing arbitrary functions on hierarchical, columnar data without deserialization. The name stands for Object Array Mapping, in analogy with the Object Relational Mapping (ORM) interface to some databases. Users (data analysts) write functions on JSON-like objects that OAMap compiles into operations on the underlying arrays, similar to the way that ORM converts object-oriented code into SQL. The difference is that mapping objects to non-relational arrays permits bare-metal performance (giving up some traditional database features).

OAMap is a Python library on top of which high-level analysis software may be built. It focuses on mapping an object-oriented view of data onto columnar arrays. Numpy is OAMap’s only strict dependency, though OAMap objects are also implemented as Numba extensions, so they may be used in Numba’s JIT-compiled functions at speeds that match or exceed hand-written C code. OAMap is unopinionated about the source of its columnar arrays, allowing for a variety of backends. See the walkthrough for more.

Also, a similar object array mapping could be implemented in any language— Python was chosen only for its popularity among data analysts.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

oamap-0.3.2.tar.gz (49.1 kB view details)

Uploaded Source

File details

Details for the file oamap-0.3.2.tar.gz.

File metadata

  • Download URL: oamap-0.3.2.tar.gz
  • Upload date:
  • Size: 49.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for oamap-0.3.2.tar.gz
Algorithm Hash digest
SHA256 cb535d36f3e783d8f867e953b11a0832abb593d744512bfcbd98761f0eb6aa2a
MD5 344a487e1c0a415433a68a2c5bb0caef
BLAKE2b-256 4d43dcee7e302a469db008c21ff4f04d3fe252893c7b80435739ba227d306ba8

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page