Tools for analyzing hierarchically nested, columnar data without deserialization.

Project description

Data analysts are often faced with a choice between speed and flexibility. Tabular data, such as SQL tables, can be processed rapidly enough for a truly interactive analysis session, but hierarchically nested data, such as JSON, is better at representing relationships within complex data models. In some domains (such as particle physics), we want to perform calculations on JSON-like structures at the speed of SQL.

The key to high throughput for large datasets, particularly ones with many more attributes than are typically accessed in a single pass, is laying out the data in “columns.” All values of an attribute should be contiguous on disk or in memory, because data are paged from one cache to the next in locally contiguous blocks. The ROOT and Parquet file formats represent hierarchically nested data in columnar form on disk, and Apache Arrow is an emerging standard for sharing hierarchically nested data in memory. However, data from these formats are usually deserialized into conventional data structures before processing, which limits performance (see this talk and this paper).
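
To make the columnar idea concrete, here is a minimal sketch of how a nested, JSON-like structure can be flattened into contiguous arrays. It is an illustration only and does not use OAMap's API: the `events` data, the field names `pt` and `eta`, and the helper names `to_columns` and `sum_pt_per_event` are all hypothetical, and the standard-library `array` module stands in for Numpy arrays.

```python
from array import array

# Hypothetical JSON-like records: each event holds a variable-length list of muons.
events = [
    {"muons": [{"pt": 10.0, "eta": 0.5}, {"pt": 20.0, "eta": -1.0}]},
    {"muons": []},
    {"muons": [{"pt": 5.0, "eta": 2.0}]},
]


def to_columns(events):
    """Flatten nested records into one contiguous array per attribute.

    `offsets` records where each event's muons begin and end, so the
    nesting can be recovered without storing any per-event objects.
    """
    offsets = array("q", [0])  # 64-bit integers
    pt = array("d")            # 64-bit floats
    eta = array("d")
    for event in events:
        for muon in event["muons"]:
            pt.append(muon["pt"])
            eta.append(muon["eta"])
        offsets.append(len(pt))
    return offsets, pt, eta


def sum_pt_per_event(offsets, pt):
    """Sum muon pt per event, reading only the `offsets` and `pt` columns."""
    return [
        sum(pt[offsets[i]:offsets[i + 1]])
        for i in range(len(offsets) - 1)
    ]


offsets, pt, eta = to_columns(events)
totals = sum_pt_per_event(offsets, pt)
```

Note that the calculation never touches `eta`: in a columnar layout, attributes that a pass does not use are never read.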

OAMap is a toolset for computing arbitrary functions on hierarchical, columnar data without deserialization. The name stands for Object Array Mapping, in analogy with the Object Relational Mapping (ORM) interface to some databases. Users (data analysts) write functions on JSON-like objects that OAMap compiles into operations on the underlying arrays, similar to the way that ORM converts object-oriented code into SQL. The difference is that mapping objects to non-relational arrays permits bare-metal performance, at the cost of some traditional database features.
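
The object-to-array mapping can be sketched with lightweight proxy objects that look like records but fetch each attribute from a column on demand. This is a conceptual illustration under stated assumptions, not OAMap's implementation or API: OAMap is described as compiling user functions into array operations, whereas this sketch swaps in simple lazy proxies. The class names `MuonProxy` and `EventProxy`, the column names, and the function `leading_pt` are hypothetical.

```python
from array import array

# Hypothetical columns for three events with 2, 0, and 1 muons respectively.
columns = {
    "offsets": array("q", [0, 2, 2, 3]),
    "pt": array("d", [10.0, 20.0, 5.0]),
    "eta": array("d", [0.5, -1.0, 2.0]),
}


class MuonProxy:
    """Looks like a muon object but stores only an index into the columns."""

    __slots__ = ("_columns", "_index")

    def __init__(self, columns, index):
        self._columns = columns
        self._index = index

    @property
    def pt(self):
        return self._columns["pt"][self._index]

    @property
    def eta(self):
        return self._columns["eta"][self._index]


class EventProxy:
    """Looks like an event object; its muons are located through `offsets`."""

    __slots__ = ("_columns", "_index")

    def __init__(self, columns, index):
        self._columns = columns
        self._index = index

    @property
    def muons(self):
        start = self._columns["offsets"][self._index]
        stop = self._columns["offsets"][self._index + 1]
        return [MuonProxy(self._columns, i) for i in range(start, stop)]


def leading_pt(event):
    """User code written against objects, unaware of the arrays underneath."""
    return max((muon.pt for muon in event.muons), default=None)


num_events = len(columns["offsets"]) - 1
results = [leading_pt(EventProxy(columns, i)) for i in range(num_events)]
```

The user-facing function reads as ordinary object-oriented code, yet no event or muon record is ever materialized from the arrays; only the values actually requested are read.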

OAMap is a Python library on top of which high-level analysis software may be built. It focuses on mapping an object-oriented view of data onto columnar arrays. NumPy is OAMap’s only strict dependency, though OAMap objects are also implemented as Numba extensions, so they may be used in Numba’s JIT-compiled functions at speeds that match or exceed hand-written C code. OAMap is unopinionated about the source of its columnar arrays, allowing for a variety of backends. See the walkthrough for more.
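
One way to picture being unopinionated about the source of the arrays is to treat a backend as anything that maps a column name to an array. The sketch below is an assumption made for illustration, not OAMap's backend interface: the class `LazyColumns` and its `loaded` property are hypothetical, and the lambdas stand in for reads from a file or a remote store.

```python
from array import array
from collections.abc import Mapping


class LazyColumns(Mapping):
    """A hypothetical backend: loads each column only when first requested."""

    def __init__(self, loaders):
        self._loaders = dict(loaders)  # column name -> zero-argument loader
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._cache:
            # A missing name raises KeyError, as a Mapping should.
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

    def __iter__(self):
        return iter(self._loaders)

    def __len__(self):
        return len(self._loaders)

    @property
    def loaded(self):
        """Names of the columns that have been read so far."""
        return sorted(self._cache)


# Each loader stands in for an expensive read from disk or the network.
columns = LazyColumns({
    "pt": lambda: array("d", [10.0, 20.0, 5.0]),
    "eta": lambda: array("d", [0.5, -1.0, 2.0]),
    "phi": lambda: array("d", [0.1, 0.2, 0.3]),
})

mean_pt = sum(columns["pt"]) / len(columns["pt"])
```

After this calculation, `columns.loaded` contains only `"pt"`: the other two columns were never read, which is where a columnar layout pays off for datasets with many attributes.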

A similar object array mapping could be implemented in any language; Python was chosen only for its popularity among data analysts.

Download files

Source Distribution

oamap-0.10.4.tar.gz (79.6 kB)

Uploaded Source

File details

Details for the file oamap-0.10.4.tar.gz.

File metadata

  • Download URL: oamap-0.10.4.tar.gz
  • Size: 79.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for oamap-0.10.4.tar.gz
Algorithm Hash digest
SHA256 80c0e3678f9cf7fd98978a46f87c09061b8a775dbbb29e86f4025cfebb901078
MD5 fe1fe382ded1d95788ccf2fbb07df4e5
BLAKE2b-256 72f28ad77b242ace774455acf97bd2a58556efe971a58458196fe10146af482f
