Arrow bindings for casacore

arcae implements a limited subset of the functionality of the more mature python-casacore package. It bypasses some existing limitations in python-casacore to provide safe, multi-threaded access to CASA formats, thereby enabling export into newer cloud-native formats such as Apache Arrow and Zarr.

Rationale

casacore and the python-casacore Python bindings provide access to the CASA Table Data System (CTDS) and Measurement Sets created within this system. The CTDS, as of casacore 3.5.0, is subject to the following limitations:

Resolving these concerns is potentially a major effort, involving invasive changes across the CTDS system.

In the time since the CTDS was developed, newer, open-source formats such as Apache Arrow and Zarr have been developed that are suitable for representing Radio Astronomy data.

  • The Apache Arrow project defines a language-independent, in-memory columnar format.

  • Translating CTDS data to Arrow is relatively simple, with some limitations mentioned below.

  • Arrow Tables are easy to exchange between many different programming languages.

  • Once in Apache Arrow format, it is easy to store data in modern, cloud-native disk formats such as parquet and Zarr.

  • Converting CASA Tables to Arrow in the C++ layer avoids the GIL

  • Access to non-thread-safe CASA Tables is constrained to a ThreadPool containing a single thread.

  • It also allows us to write astrometry routines in C++, potentially side-stepping thread-safety and GIL issues with the CASA Measures server.
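
The single-thread constraint mentioned above can be illustrated in pure Python. This is only a sketch of the general pattern, not arcae's actual C++ implementation: `SerializedTable` and `FakeTable` are hypothetical names, and `FakeTable` stands in for a CASA table that must never be touched by two threads at once.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class SerializedTable:
    """Funnels every call on a non-thread-safe object through one worker thread."""

    def __init__(self, table):
        self._table = table
        # A pool with exactly one worker: submitted tasks run one at a time,
        # always on the same thread.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def run(self, func, *args, **kwargs):
        """Schedule func(table, *args, **kwargs) on the worker; returns a Future."""
        return self._pool.submit(func, self._table, *args, **kwargs)

    def close(self):
        self._pool.shutdown(wait=True)


class FakeTable:
    """Hypothetical stand-in for a CASA table; records which threads touch it."""

    def __init__(self):
        self.thread_ids = set()
        self.rows = list(range(100))

    def read(self, start, stop):
        self.thread_ids.add(threading.get_ident())
        return self.rows[start:stop]


def demo():
    wrapper = SerializedTable(FakeTable())
    # Many caller threads may submit work concurrently...
    with ThreadPoolExecutor(max_workers=8) as callers:
        futures = list(callers.map(
            lambda i: wrapper.run(FakeTable.read, i * 10, (i + 1) * 10),
            range(10),
        ))
    # ...but every read executes on the single worker thread.
    results = [f.result() for f in futures]
    wrapper.close()
    return results, wrapper._table.thread_ids
```

Callers receive futures and can overlap their own work, while the table itself only ever sees one thread.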

Limitations

Arrow supports both 1D arrays and nested structures:

  1. Fixed-shape multi-dimensional data (e.g. visibility data) is currently represented as nested FixedSizeListArrays.

  2. Variably shaped multi-dimensional data (e.g. subtable data) is currently represented as nested ListArrays.

  3. Complex values are represented as an extra FixedSizeListArray nesting of two floats.

  4. Currently, it is not trivially trivial (repetition intended here) to convert between the above and numpy via to_numpy calls on Arrow Arrays, but it is relatively trivial to reinterpret the underlying data buffers from either API. This is done transparently in getcol and putcol functions (see usage below).

Going forward, FixedShapeTensorArray and VariableShapeTensorArray will provide more ergonomic structures for representing multi-dimensional data. First-class support for complex values in Apache Arrow will require implementing a C++ extension type within Arrow itself.

Some other edge cases have not yet been implemented, but could be with some thought.

  • Columns with unconstrained rank (ndim == -1) whose rows, in practice, have differing dimensions. Unconstrained rank columns whose rows actually have the same rank are catered for.

  • TpRecord columns are not yet handled. It is probably simplest to convert these rows to JSON and store them as strings.

  • TpQuantity columns are not yet handled. These could be represented as a run-time parametric Arrow DataType.
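
The JSON approach suggested for TpRecord columns could look roughly like the following. This is a sketch of the idea only, since arcae does not implement it: the helper names and the record contents are hypothetical, and it assumes each row's record has already been converted to a plain Python dict.

```python
import json


def records_to_json(rows):
    """Serialise each per-row record (a nested dict) to a JSON string."""
    # sort_keys gives a deterministic encoding, which helps when comparing rows.
    return [json.dumps(row, sort_keys=True) for row in rows]


def json_to_records(strings):
    """Inverse of records_to_json."""
    return [json.loads(s) for s in strings]


# Hypothetical record values, for illustration only
rows = [
    {"unit": "Hz", "value": 1.4e9, "nested": {"flag": True}},
    {"unit": "m", "value": [1.0, 2.0, 3.0]},
]
encoded = records_to_json(rows)
```

The resulting list of strings maps directly onto an Arrow string column.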

Installation

Binary wheels are provided for Linux and macOS for both x86_64 and arm64 architectures.

$ pip install arcae

Usage

Example usage with Arrow Tables:

import json
from pprint import pprint

import arcae
import pyarrow as pa
import pyarrow.parquet as pq

# Obtain (partial) Apache Arrow Table from a CASA Table
casa_table = arcae.table("/path/to/measurementset.ms")
arrow_table = casa_table.to_arrow()        # read entire table
arrow_table = casa_table.to_arrow(index=(slice(10, 20),))  # read rows 10 to 19 only
assert isinstance(arrow_table, pa.Table)

# Print JSON-encoded Table and Column keywords
pprint(json.loads(arrow_table.schema.metadata[b"__arcae_metadata__"]))
pprint(json.loads(arrow_table.schema.field("DATA").metadata[b"__arcae_metadata__"]))

pq.write_table(arrow_table, "measurementset.parquet")

Some reading and writing functionality from python-casacore is replicated, with added support for some NumPy Advanced Indexing.

casa_table = arcae.table("/path/to/measurementset.ms", readonly=False)
# Get rows 10 and 2, and channels 16 to 32, and all correlations
data = casa_table.getcol("DATA", index=([10, 2], slice(16, 32), None))
# Write some modified data back
casa_table.putcol("DATA", data + 1*1j, index=([10, 2], slice(16, 32), None))
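
For readers more familiar with NumPy, the selection above can be pictured on an in-memory array. This is a sketch with a hypothetical DATA column of shape (row, chan, corr). It assumes that a `None` entry in arcae's index tuple selects everything along that axis (as the comment above indicates, and unlike NumPy, where `None` inserts a new axis) and that `slice(16, 32)` follows Python's half-open convention.

```python
import numpy as np

# Hypothetical DATA column of shape (row, chan, corr)
data = np.arange(20 * 64 * 4).reshape(20, 64, 4)

# Rows 10 and 2 (in that order), channels 16 up to but excluding 32,
# and all correlations.
selection = data[[10, 2], 16:32, :]
```

The result has shape (2, 16, 4), with the rows in the order requested rather than sorted.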

See the test cases for further use cases.

Exporting Measurement Sets to Arrow Parquet Datasets

Install the applications optional extra.

$ pip install arcae[applications]

Then, an export script is available:

$ arcae export /path/to/the.ms --nrow 50000
$ tree output.arrow/
output.arrow/
├── ANTENNA
│   └── data0.parquet
├── DATA_DESCRIPTION
│   └── data0.parquet
├── FEED
│   └── data0.parquet
├── FIELD
│   └── data0.parquet
├── MAIN
│   └── FIELD_ID=0
│       └── PROCESSOR_ID=0
│           ├── DATA_DESC_ID=0
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=1
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=2
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           └── DATA_DESC_ID=3
│               ├── data0.parquet
│               ├── data1.parquet
│               ├── data2.parquet
│               └── data3.parquet
├── OBSERVATION
│   └── data0.parquet

This data can be loaded into an Arrow Dataset:

>>> import pyarrow as pa
>>> import pyarrow.dataset as pad
>>> main_ds = pad.dataset("output.arrow/MAIN")
>>> spw_ds = pad.dataset("output.arrow/SPECTRAL_WINDOW")

Etymology

Noun: arca f (genitive arcae); first declension. A chest, box, coffer, safe (safe place for storing items, or anything of a similar shape).

Pronounced: ar-ki.
