
Arrow bindings for casacore

Project description

arcae implements a limited subset of the functionality of the more mature python-casacore package. It bypasses some existing limitations in python-casacore to provide safe, multi-threaded access to CASA formats, thereby enabling export into newer cloud-native formats such as Apache Arrow and Zarr.

Rationale

casacore and the python-casacore Python bindings provide access to the CASA Table Data System (CTDS) and Measurement Sets created within this system. The CTDS, as of casacore 3.5.0, is subject to the following limitations:

Resolving these concerns is potentially a major effort, involving invasive changes across the CTDS system.

In the time since the CTDS was developed, newer, open-source formats such as Apache Arrow and Zarr have been developed that are suitable for representing Radio Astronomy data.

  • The Apache Arrow project defines a language-independent, in-memory columnar storage format.

  • Translating CTDS data to Arrow is relatively simple, with some limitations mentioned below.

  • It is easy to exchange Arrow Tables between many different programming languages.

  • Once in Apache Arrow format, it is easy to store data in modern, cloud-native disk formats such as parquet and Zarr.

  • Converting CASA Tables to Arrow in the C++ layer avoids the GIL.

  • Access to non-thread-safe CASA Tables is constrained to a ThreadPool containing a single thread.

  • It also allows us to write astrometry routines in C++, potentially side-stepping thread-safety and GIL issues with the CASA Measures server.
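The single-thread constraint mentioned above can be illustrated in pure Python. This is only a sketch of the general pattern, not arcae's C++ implementation: `SerialisedResource` and `read_row` are hypothetical names, and a plain list stands in for a CASA Table. Many caller threads submit work, but every access to the wrapped resource runs on the same worker thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerialisedResource:
    """Hypothetical wrapper that funnels all access onto one worker thread."""

    def __init__(self, resource):
        self._resource = resource
        # A pool with exactly one thread serialises every submitted call
        self._pool = ThreadPoolExecutor(max_workers=1)

    def call(self, fn, *args):
        """Schedule fn(resource, *args) on the single worker; returns a Future."""
        return self._pool.submit(fn, self._resource, *args)

    def close(self):
        self._pool.shutdown(wait=True)


def read_row(rows, i):
    # Return the value together with the id of the thread that read it
    return rows[i], threading.get_ident()


wrapper = SerialisedResource(list(range(100)))  # stand-in for a table

# Several caller threads request reads concurrently
callers = ThreadPoolExecutor(max_workers=4)
futures = list(callers.map(lambda i: wrapper.call(read_row, i), range(8)))
results = [f.result() for f in futures]
callers.shutdown()
wrapper.close()

values = [v for v, _ in results]
worker_ids = {tid for _, tid in results}
```

All reads report the same thread id, so the wrapped object never sees concurrent access even though the callers run in parallel.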

Limitations

Arrow supports both 1D arrays and nested structures:

  1. Fixed-shape multi-dimensional data (e.g. visibility data) is currently represented as nested FixedSizeListArrays.

  2. Variably-shaped multi-dimensional data (e.g. subtable data) is currently represented as nested ListArrays.

  3. Complex values are represented as an extra FixedSizeListArray nesting of two floats.

  4. Currently, it is not trivially trivial (repetition intended here) to convert between the above and numpy via to_numpy calls on Arrow Arrays, but it is relatively trivial to reinterpret the underlying data buffers from either API. This is done transparently in getcol and putcol functions (see usage below).

Going forward, FixedShapeTensorArray and VariableShapeTensorArray will provide more ergonomic structures for representing multi-dimensional data. First-class support for complex values in Apache Arrow will require implementing a C++ extension type within Arrow itself.

Some other edge cases have not yet been implemented, but could be with some thought.

  • Columns with unconstrained rank (ndim == -1) whose rows, in practice, have differing dimensions. Unconstrained rank columns whose rows actually have the same rank are catered for.

  • Not yet able to handle TpRecord columns. Probably simplest to convert these rows to json and store as a string.

  • Not yet able to handle TpQuantity columns. Possible to represent as a run-time parametric Arrow DataType.

Installation

Binary wheels are provided for Linux and macOS, for both x86_64 and arm64 architectures:

$ pip install arcae

Usage

Example usage with Arrow Tables:

import json
from pprint import pprint

import arcae
import pyarrow as pa
import pyarrow.parquet as pq

# Obtain (partial) Apache Arrow Table from a CASA Table
casa_table = arcae.table("/path/to/measurementset.ms")
arrow_table = casa_table.to_arrow()        # read entire table
arrow_table = casa_table.to_arrow(index=(slice(10, 20),))  # read rows 10 to 19
assert isinstance(arrow_table, pa.Table)

# Print JSON-encoded Table and Column keywords
pprint(json.loads(arrow_table.schema.metadata[b"__arcae_metadata__"]))
pprint(json.loads(arrow_table.schema.field("DATA").metadata[b"__arcae_metadata__"]))

pq.write_table(arrow_table, "measurementset.parquet")

Some reading and writing functionality from python-casacore is replicated, with added support for some NumPy Advanced Indexing.

casa_table = arcae.table("/path/to/measurementset.ms", readonly=False)
# Get rows 10 and 2, channels 16 to 31, and all correlations
data = casa_table.getcol("DATA", index=([10, 2], slice(16, 32), None))
# Write some modified data back
casa_table.putcol("DATA", data + 1j, index=([10, 2], slice(16, 32), None))
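To see what such an index selects, it can help to apply the equivalent selection to a plain NumPy array. The sketch below assumes, based on the comment in the example above, that a `None` entry means "everything along this dimension"; in NumPy itself `None` inserts a new axis, so the hypothetical helper `to_numpy_index` translates it to a full slice first. The array shape is made up.

```python
import numpy as np


def to_numpy_index(index):
    """Hypothetical helper: treat None entries as full slices."""
    return tuple(slice(None) if i is None else i for i in index)


# Made-up column data: 20 rows, 64 channels, 4 correlations
data = np.arange(20 * 64 * 4).reshape(20, 64, 4).astype(np.complex64)

index = ([10, 2], slice(16, 32), None)
selected = data[to_numpy_index(index)]

# Write modified values back through the same index
data[to_numpy_index(index)] = selected + 1j
```

The selection has shape (2, 16, 4), with row 10 first and row 2 second, following the order given in the row list.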

See the test cases for further use cases.

Exporting Measurement Sets to Arrow Parquet Datasets

Install the applications optional extra.

$ pip install "arcae[applications]"

Then, an export script is available:

$ arcae export /path/to/the.ms --nrow 50000
$ tree output.arrow/
output.arrow/
├── ANTENNA
│   └── data0.parquet
├── DATA_DESCRIPTION
│   └── data0.parquet
├── FEED
│   └── data0.parquet
├── FIELD
│   └── data0.parquet
├── MAIN
│   └── FIELD_ID=0
│       └── PROCESSOR_ID=0
│           ├── DATA_DESC_ID=0
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=1
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=2
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           └── DATA_DESC_ID=3
│               ├── data0.parquet
│               ├── data1.parquet
│               ├── data2.parquet
│               └── data3.parquet
├── OBSERVATION
│   └── data0.parquet

This data can be loaded into an Arrow Dataset:

>>> import pyarrow as pa
>>> import pyarrow.dataset as pad
>>> main_ds = pad.dataset("output.arrow/MAIN")
>>> spw_ds = pad.dataset("output.arrow/SPECTRAL_WINDOW")

Etymology

Noun: arca f (genitive arcae); first declension. A chest, box, coffer, safe (safe place for storing items, or anything of a similar shape).

Pronounced: ar-ki.
