Arrow bindings for casacore

arcae implements a limited subset of the functionality of the more mature python-casacore package. It bypasses some existing limitations in python-casacore to provide safe, multi-threaded access to CASA formats, thereby enabling export into newer cloud-native formats such as Apache Arrow and Zarr.

Rationale

casacore and the python-casacore Python bindings provide access to the CASA Table Data System (CTDS) and Measurement Sets created within this system. The CTDS, as of casacore 3.5.0, is subject to the following limitations:

Resolving these concerns is potentially a major effort, involving invasive changes across the CTDS system.

In the time since the CTDS was developed, newer, open-source formats such as Apache Arrow and Zarr have been developed that are suitable for representing Radio Astronomy data.

  • The Apache Arrow project defines a language-independent, in-memory columnar storage format.

  • Translating CTDS data to Arrow is relatively simple, with some limitations mentioned below.

  • It’s easy to exchange Arrow Tables between many different programming languages.

  • Once in Apache Arrow format, it is easy to store data in modern, cloud-native disk formats such as parquet and Zarr.

  • Converting CASA Tables to Arrow in the C++ layer avoids the Python Global Interpreter Lock (GIL).

  • Access to non-thread-safe CASA Tables is constrained to a ThreadPool containing a single thread.

  • It also allows us to write astrometry routines in C++, potentially side-stepping thread-safety and GIL issues with the CASA Measures server.
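The single-thread constraint mentioned above can be illustrated in pure Python. This is only an analogy, under the assumption that funnelling every call through one worker is the essence of the technique: arcae does this in its C++ layer, and `SerialisedResource` and `record_thread` are hypothetical names invented for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerialisedResource:
    """Hypothetical wrapper: every call on the resource runs on one worker thread."""

    def __init__(self, factory):
        # A pool with a single worker serialises all access to the resource.
        self._pool = ThreadPoolExecutor(max_workers=1)
        # Create the resource on the worker thread as well.
        self._resource = self._pool.submit(factory).result()

    def run(self, fn, *args):
        # Schedule fn(resource, *args) on the worker and return a Future.
        return self._pool.submit(fn, self._resource, *args)

    def close(self):
        self._pool.shutdown(wait=True)


def record_thread(resource, value):
    # Stand-in for a non-thread-safe operation: note which thread touched the resource.
    resource.append((threading.get_ident(), value))
    return value


wrapper = SerialisedResource(list)

# Many caller threads submit work, but only the single worker touches the list.
with ThreadPoolExecutor(max_workers=4) as callers:
    futures = list(callers.map(lambda v: wrapper.run(record_thread, v), range(8)))

results = [f.result() for f in futures]
thread_ids = {tid for tid, _ in wrapper.run(lambda r: list(r)).result()}
wrapper.close()
```

Callers receive futures, so they can overlap other work while the serialised resource is busy.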

Limitations

Arrow supports both 1D arrays and nested structures:

  1. Fixed-shape multi-dimensional data (e.g. visibility data) is currently represented as nested FixedSizeListArrays.

  2. Variably-shaped multi-dimensional data (e.g. subtable data) is currently represented as nested ListArrays.

  3. Complex values are represented as an extra FixedSizeListArray nesting of two floats.

  4. Currently, it is not trivially trivial (repetition intended here) to convert between the above and numpy via to_numpy calls on Arrow Arrays, but it is relatively trivial to reinterpret the underlying data buffers from either API. This is done transparently in getcol and putcol functions (see usage below).

Going forward, FixedShapeTensorArray and VariableShapeTensorArray will provide more ergonomic structures for representing multi-dimensional data. First-class support for complex values in Apache Arrow will require implementing a C++ extension type within Arrow itself.

Some other edge cases have not yet been implemented, but could be with some thought.

  • Columns with unconstrained rank (ndim == -1) whose rows, in practice, have differing dimensions. Unconstrained rank columns whose rows actually have the same rank are catered for.

  • Not yet able to handle TpRecord columns. Probably simplest to convert these rows to json and store as a string.

  • Not yet able to handle TpQuantity columns. Possible to represent as a run-time parametric Arrow DataType.

Installation

Binary wheels are provided for Linux and macOS, for both x86_64 and arm64 architectures.

$ pip install arcae

Usage

Example usage with Arrow Tables:

import json
from pprint import pprint

import arcae
import pyarrow as pa
import pyarrow.parquet as pq

# Obtain (partial) Apache Arrow Table from a CASA Table
casa_table = arcae.table("/path/to/measurementset.ms")
arrow_table = casa_table.to_arrow()  # read the entire table
arrow_table = casa_table.to_arrow(index=(slice(10, 20),))  # or read a range of rows
assert isinstance(arrow_table, pa.Table)

# Print JSON-encoded Table and Column keywords
pprint(json.loads(arrow_table.schema.metadata[b"__arcae_metadata__"]))
pprint(json.loads(arrow_table.schema.field("DATA").metadata[b"__arcae_metadata__"]))

pq.write_table(arrow_table, "measurementset.parquet")

Some reading and writing functionality from python-casacore is replicated, with added support for some NumPy Advanced Indexing.

casa_table = arcae.table("/path/to/measurementset.ms", readonly=False)
# Get rows 10 and 2, and channels 16 to 32, and all correlations
data = casa_table.getcol("DATA", index=([10, 2], slice(16, 32), None))
# Write some modified data back
casa_table.putcol("DATA", data + 1*1j, index=([10, 2], slice(16, 32), None))

See the test cases for further use cases.
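To get a feel for what an index such as `([10, 2], slice(16, 32), None)` selects, the same selection can be applied to a plain NumPy array. This is a sketch only: `to_numpy_index` is a hypothetical helper, the array dimensions are made up, and treating `None` as "select everything along this axis" is inferred from the comment in the example above.

```python
import numpy as np


def to_numpy_index(index):
    """Hypothetical helper: replace None with a full slice along that axis."""
    return tuple(slice(None) if i is None else i for i in index)


# Made-up column data with shape (row, chan, corr).
nrow, nchan, ncorr = 20, 64, 4
data = np.arange(nrow * nchan * ncorr).reshape(nrow, nchan, ncorr)

index = ([10, 2], slice(16, 32), None)
selection = data[to_numpy_index(index)]

# Rows come back in the order requested (10 first, then 2),
# and the slice follows the usual Python half-open convention.
print(selection.shape)
```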

Exporting Measurement Sets to Arrow Parquet Datasets

Install the optional applications extra:

$ pip install arcae[applications]

Then, an export script is available:

$ arcae export /path/to/the.ms --nrow 50000
$ tree output.arrow/
output.arrow/
├── ANTENNA
│   └── data0.parquet
├── DATA_DESCRIPTION
│   └── data0.parquet
├── FEED
│   └── data0.parquet
├── FIELD
│   └── data0.parquet
├── MAIN
│   └── FIELD_ID=0
│       └── PROCESSOR_ID=0
│           ├── DATA_DESC_ID=0
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=1
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=2
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           └── DATA_DESC_ID=3
│               ├── data0.parquet
│               ├── data1.parquet
│               ├── data2.parquet
│               └── data3.parquet
├── OBSERVATION
│   └── data0.parquet

This data can be loaded into an Arrow Dataset:

>>> import pyarrow as pa
>>> import pyarrow.dataset as pad
>>> main_ds = pad.dataset("output.arrow/MAIN")
>>> spw_ds = pad.dataset("output.arrow/SPECTRAL_WINDOW")

Etymology

Noun: arca f (genitive arcae); first declension. A chest, box, coffer, safe (safe place for storing items, or anything of a similar shape).

Pronounced: ar-ki.
