
Arrow bindings for casacore

Project description

arcae implements a limited subset of the functionality of the more mature python-casacore package. It bypasses some existing limitations in python-casacore to provide safe, multi-threaded access to CASA formats, thereby enabling export to newer, cloud-native formats such as Apache Arrow and Zarr.

Rationale

casacore and the python-casacore Python bindings provide access to the CASA Table Data System (CTDS) and to Measurement Sets created within this system. The CTDS, as of casacore 3.5.0, is subject to the following limitations:

Resolving these concerns is potentially a major effort, involving invasive changes across the CTDS.

Since the CTDS was developed, newer open-source formats suitable for representing radio astronomy data, such as Apache Arrow and Zarr, have emerged.

  • The Apache Arrow project defines a language-independent, in-memory columnar format.

  • Translating CTDS data to Arrow is relatively simple, with some limitations mentioned below.

  • It’s easy to move Arrow Tables between many different languages.

  • Once in Apache Arrow format, data is easy to store in modern, cloud-native on-disk formats such as Parquet and Zarr.

  • Converting CASA Tables to Arrow in the C++ layer lets the conversion run without holding the Python GIL.

  • Access to non-thread-safe CASA Tables is confined to a ThreadPool containing a single thread.

  • It also allows us to write astrometric routines in C++, potentially side-stepping thread-safety and GIL issues with the CASA Measures server.
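
The single-thread confinement described above is a general pattern that can be sketched in pure Python. This is only an illustration under assumptions: arcae applies the idea in its C++ layer, and NotThreadSafeTable and SerialisedTable are hypothetical stand-ins, not arcae classes. Many caller threads submit work, but a ThreadPoolExecutor with a single worker guarantees that only one thread ever touches the table.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class NotThreadSafeTable:
    """Hypothetical stand-in for a CASA Table that must not be used concurrently."""

    def __init__(self, nrow):
        self._rows = list(range(nrow))
        self.threads_seen = set()  # ids of every thread that touched the table

    def getcol(self, start, stop):
        self.threads_seen.add(threading.get_ident())
        return self._rows[start:stop]


class SerialisedTable:
    """Funnel all table access through a pool containing a single worker thread."""

    def __init__(self, table):
        self._table = table
        self._pool = ThreadPoolExecutor(max_workers=1)

    def getcol(self, start, stop):
        # Any caller thread may submit work, but only the pool's one worker
        # thread ever executes it, so calls on the table never overlap.
        return self._pool.submit(self._table.getcol, start, stop).result()

    def close(self):
        self._pool.shutdown(wait=True)


raw = NotThreadSafeTable(1000)
table = SerialisedTable(raw)

# Eight caller threads read disjoint 100-row chunks concurrently
with ThreadPoolExecutor(max_workers=8) as callers:
    chunks = list(callers.map(lambda s: table.getcol(s, s + 100), range(0, 1000, 100)))

table.close()
```

In this pattern the table calls themselves are serialised, so any speed-up has to come from work done outside them, such as conversion or writing output.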

Limitations

Arrow supports both 1D arrays and nested structures:

  1. Fixed-shape multi-dimensional data (e.g. visibility data) is currently represented as nested FixedSizeListArrays.

  2. Variably shaped multi-dimensional data (e.g. subtable data) is currently represented as nested ListArrays.

  3. Complex values are represented as an extra FixedSizeListArray nesting of two floats.

  4. Currently, it is not trivially trivial (repetition intended here) to convert between the above and NumPy via to_numpy calls on Arrow Arrays, but it is relatively trivial to reinterpret the underlying data buffers from either API. This is done transparently in the getcol and putcol functions (see usage below).
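
Items 3 and 4 can be illustrated with NumPy alone. The sketch below assumes visibilities held as a C-contiguous complex64 array of shape (nrow, nchan, ncorr); it shows that the layout with a trailing pair of floats is simply another view of the same bytes, so moving between the two needs no copy. The Arrow arrays themselves are left out, and the variable names and sizes are illustrative.

```python
import numpy as np

# Illustrative visibility block with shape (nrow, nchan, ncorr)
nrow, nchan, ncorr = 3, 4, 2
rng = np.random.default_rng(seed=0)
vis = (rng.standard_normal((nrow, nchan, ncorr))
       + 1j * rng.standard_normal((nrow, nchan, ncorr))).astype(np.complex64)

# complex64 -> float32 with a trailing axis of length 2 holding (real, imag),
# the same layout as an extra FixedSizeList nesting of two floats.
# .view() reinterprets the existing buffer, so nothing is copied.
pairs = vis.view(np.float32).reshape(nrow, nchan, ncorr, 2)

# Flattened, this is the contiguous float buffer that the innermost
# Arrow child array would wrap
flat = pairs.reshape(-1)

# The reverse direction: reinterpret the float buffer as complex64
# and restore the original shape, again without copying
restored = flat.view(np.complex64).reshape(nrow, nchan, ncorr)
```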

Going forward, FixedShapeTensorArray and VariableShapeTensorArray will provide more ergonomic structures for representing multi-dimensional data. First-class support for complex values in Apache Arrow will require implementing a C++ extension type within Arrow itself.

Some other edge cases have not yet been implemented, but could be with some thought.

  • Columns with unconstrained rank (ndim == -1) whose rows, in practice, have differing dimensions. Unconstrained rank columns whose rows actually have the same rank are catered for.

  • Not yet able to handle TpRecord columns. It is probably simplest to convert these rows to JSON and store them as strings.

  • Not yet able to handle TpQuantity columns. Possible to represent as a run-time parametric Arrow DataType.
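
For TpRecord columns, the JSON approach suggested above might look like the following sketch, which assumes each record cell has already been read into a Python dict. record_to_json is a hypothetical helper, not part of arcae, and the record contents are made up. Because JSON has no complex type, complex values are stored as [real, imag] pairs, echoing the two-float representation used for complex columns.

```python
import json


def _encode_extra(obj):
    # JSON has no complex type: store complex numbers as [real, imag],
    # mirroring the two-float nesting used for complex columns
    if isinstance(obj, complex):
        return [obj.real, obj.imag]
    raise TypeError(f"Cannot JSON-encode {type(obj).__name__}")


def record_to_json(record):
    """Hypothetical helper: serialise one TpRecord cell (as a dict) to a string."""
    return json.dumps(record, default=_encode_extra, sort_keys=True)


# Made-up record for illustration only
row = {"name": "SOURCE_A", "flux": {"value": 1.5, "unit": "Jy"}, "gain": 1 + 2j}
encoded = record_to_json(row)
decoded = json.loads(encoded)
```

Each encoded string could then be stored in an ordinary Arrow string column. Note that json.loads returns the complex values as plain two-element lists, so a reader would need to know which fields to convert back.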

Installation

Binary wheels are provided for Linux and macOS on both x86_64 and arm64 architectures.

$ pip install arcae

Usage

Example usage with Arrow Tables:

import json
from pprint import pprint

import arcae
import pyarrow as pa
import pyarrow.parquet as pq

# Obtain (partial) Apache Arrow Table from a CASA Table
casa_table = arcae.table("/path/to/measurementset.ms")
arrow_table = casa_table.to_arrow()        # read entire table
arrow_table = casa_table.to_arrow(index=(slice(10, 20),))  # read rows 10 to 19 only
assert isinstance(arrow_table, pa.Table)

# Print JSON-encoded Table and Column keywords
pprint(json.loads(arrow_table.schema.metadata[b"__arcae_metadata__"]))
pprint(json.loads(arrow_table.schema.field("DATA").metadata[b"__arcae_metadata__"]))

pq.write_table(arrow_table, "measurementset.parquet")

Some reading and writing functionality from python-casacore is replicated, with added support for some NumPy advanced indexing.

import arcae

casa_table = arcae.table("/path/to/measurementset.ms", readonly=False)
# Get rows 10 and 2, channels 16 to 31 (the slice end is exclusive), and all correlations
data = casa_table.getcol("DATA", index=([10, 2], slice(16, 32), None))
# Write some modified data back
casa_table.putcol("DATA", data + 1j, index=([10, 2], slice(16, 32), None))
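
Assuming getcol and putcol follow NumPy semantics for lists and slices, and that None selects a whole axis (as the comment in the example says), the shape and ordering of the result can be checked with plain NumPy. to_numpy_index is a hypothetical helper: in NumPy itself None inserts a new axis rather than selecting one, so it is replaced by slice(None). The array is an in-memory stand-in for the DATA column, not data read through arcae.

```python
import numpy as np


def to_numpy_index(index):
    """Hypothetical helper: translate an arcae-style index tuple into a NumPy one.

    None is taken to mean "select the whole axis"; in NumPy, None would
    insert a new axis instead, so it is replaced with slice(None).
    """
    return tuple(slice(None) if i is None else i for i in index)


# Illustrative in-memory stand-in for a DATA column: (nrow, nchan, ncorr)
nrow, nchan, ncorr = 20, 64, 4
data = np.arange(nrow * nchan * ncorr).reshape(nrow, nchan, ncorr)

index = ([10, 2], slice(16, 32), None)
selected = data[to_numpy_index(index)]

# Mirror the putcol call: write modified values back to the same locations
updated = data.copy()
updated[to_numpy_index(index)] = selected + 1
```

Rows come back in the order requested (10, then 2), and slice(16, 32) covers channels 16 through 31.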

See the test cases for further examples.

Exporting Measurement Sets to Arrow Parquet Datasets

Install the optional applications extra:

$ pip install "arcae[applications]"

Then, an export script is available:

$ arcae export /path/to/the.ms --nrow 50000
$ tree output.arrow/
output.arrow/
├── ANTENNA
│   └── data0.parquet
├── DATA_DESCRIPTION
│   └── data0.parquet
├── FEED
│   └── data0.parquet
├── FIELD
│   └── data0.parquet
├── MAIN
│   └── FIELD_ID=0
│       └── PROCESSOR_ID=0
│           ├── DATA_DESC_ID=0
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=1
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=2
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           └── DATA_DESC_ID=3
│               ├── data0.parquet
│               ├── data1.parquet
│               ├── data2.parquet
│               └── data3.parquet
├── OBSERVATION
│   └── data0.parquet
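
The MAIN table is split into KEY=value directories, the Hive partitioning convention that Arrow and Parquet tooling recognise, so each leaf directory holds the rows for one combination of FIELD_ID, PROCESSOR_ID and DATA_DESC_ID. The hypothetical helper below (not part of arcae) sketches how a file path maps back to those key values.

```python
from pathlib import PurePosixPath


def partition_keys(path):
    """Hypothetical helper: extract KEY=value directory components from a path."""
    keys = {}
    for part in PurePosixPath(path).parent.parts:
        name, sep, value = part.partition("=")
        if sep:  # only "KEY=value" directories encode partition keys
            keys[name] = int(value) if value.isdigit() else value
    return keys


keys = partition_keys(
    "output.arrow/MAIN/FIELD_ID=0/PROCESSOR_ID=0/DATA_DESC_ID=2/data1.parquet"
)
```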

This data can be loaded into an Arrow Dataset:

>>> import pyarrow as pa
>>> import pyarrow.dataset as pad
>>> main_ds = pad.dataset("output.arrow/MAIN")
>>> spw_ds = pad.dataset("output.arrow/SPECTRAL_WINDOW")

Etymology

Noun: arca f (genitive arcae); first declension. A chest, box, coffer, or safe (a safe place for storing items, or anything of a similar shape).

Pronounced: ar-ki.

Download files

Source Distribution

arcae-0.2.3.tar.gz (79.4 kB)

Uploaded Source

Built Distributions

arcae-0.2.3-cp312-cp312-manylinux_2_28_x86_64.whl (24.4 MB)

Uploaded CPython 3.12 manylinux: glibc 2.28+ x86-64

arcae-0.2.3-cp312-cp312-manylinux_2_28_aarch64.whl (21.6 MB)

Uploaded CPython 3.12 manylinux: glibc 2.28+ ARM64

arcae-0.2.3-cp312-cp312-macosx_11_0_x86_64.whl (16.2 MB)

Uploaded CPython 3.12 macOS 11.0+ x86-64

arcae-0.2.3-cp312-cp312-macosx_11_0_arm64.whl (13.6 MB)

Uploaded CPython 3.12 macOS 11.0+ ARM64

arcae-0.2.3-cp311-cp311-manylinux_2_28_x86_64.whl (24.4 MB)

Uploaded CPython 3.11 manylinux: glibc 2.28+ x86-64

arcae-0.2.3-cp311-cp311-manylinux_2_28_aarch64.whl (21.6 MB)

Uploaded CPython 3.11 manylinux: glibc 2.28+ ARM64

arcae-0.2.3-cp311-cp311-macosx_11_0_x86_64.whl (16.2 MB)

Uploaded CPython 3.11 macOS 11.0+ x86-64

arcae-0.2.3-cp311-cp311-macosx_11_0_arm64.whl (13.6 MB)

Uploaded CPython 3.11 macOS 11.0+ ARM64

arcae-0.2.3-cp310-cp310-manylinux_2_28_x86_64.whl (24.4 MB)

Uploaded CPython 3.10 manylinux: glibc 2.28+ x86-64

arcae-0.2.3-cp310-cp310-manylinux_2_28_aarch64.whl (21.6 MB)

Uploaded CPython 3.10 manylinux: glibc 2.28+ ARM64

arcae-0.2.3-cp310-cp310-macosx_11_0_x86_64.whl (16.2 MB)

Uploaded CPython 3.10 macOS 11.0+ x86-64

arcae-0.2.3-cp310-cp310-macosx_11_0_arm64.whl (13.6 MB)

Uploaded CPython 3.10 macOS 11.0+ ARM64

arcae-0.2.3-cp39-cp39-manylinux_2_28_x86_64.whl (24.4 MB)

Uploaded CPython 3.9 manylinux: glibc 2.28+ x86-64

arcae-0.2.3-cp39-cp39-manylinux_2_28_aarch64.whl (21.6 MB)

Uploaded CPython 3.9 manylinux: glibc 2.28+ ARM64

arcae-0.2.3-cp39-cp39-macosx_11_0_x86_64.whl (16.2 MB)

Uploaded CPython 3.9 macOS 11.0+ x86-64

arcae-0.2.3-cp39-cp39-macosx_11_0_arm64.whl (13.6 MB)

Uploaded CPython 3.9 macOS 11.0+ ARM64

File details

Details for the file arcae-0.2.3.tar.gz.

File metadata

  • Download URL: arcae-0.2.3.tar.gz
  • Upload date:
  • Size: 79.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for arcae-0.2.3.tar.gz
Algorithm Hash digest
SHA256 2407e8b3cee5beba16dacff07cd8972999b4c49e24ea74499159e9d231e96089
MD5 acf0806cb7c089ee2f82fafbf26080f7
BLAKE2b-256 e2bbc6d73b9c4fd291b5addd3b7236f488a1b277f0b846e6727689b6ac8ab1c7

See more details on using hashes here.

Details for the file arcae-0.2.3-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for arcae-0.2.3-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c033a996bfd598d617a7d036cb26631e3d1f41fde06b11cc6457b5603bf0137e
MD5 8f9746fb5d5359250d6a1be6fdc7980e
BLAKE2b-256 515d363927a7f30c319fd23e5057f934f8e34ecc9bc79aa12f595f7db791b8b2

Details for the file arcae-0.2.3-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for arcae-0.2.3-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 fa467cbecb2444eae7d94dc2b1cdb9f13d8caecce8241a5f99df84225cee6c5e
MD5 795b07faed45fc386c3950f535e1f841
BLAKE2b-256 b6904b9b9f63e784ebcd25341cdbee13ac36cf80abd46773043951cd1d35d76d

Details for the file arcae-0.2.3-cp312-cp312-macosx_11_0_x86_64.whl.

File hashes

Hashes for arcae-0.2.3-cp312-cp312-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 33fbbf2a50f1eb07c0c712b266cb1d734dd6bfd897b6ce78b096271b1990b9df
MD5 bf34fd19b63a940e3e47805d31bc70fe
BLAKE2b-256 bf7b5f8b8c97c432b37e30310dc2ea2ecb1a7cb2b88584d1ba5fe73a59f22441

Details for the file arcae-0.2.3-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for arcae-0.2.3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 eb362ae9e44f150b0e93c4a6363d5a1f4bfa01d52e0e9184b261e4000580570f
MD5 bec265c713af59ea3d0f9714b725f9af
BLAKE2b-256 012753b54bd2510b3d56da44b40eec24b6fdc1a5b6701adf2ad5c1260d742ad2

Details for the file arcae-0.2.3-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for arcae-0.2.3-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c1bdec64c47a7306f88a380f2524a630c91b0cbc36d333025be0e7ccb68d12d9
MD5 5e34b0c3d980492ccb9524ad8969ea44
BLAKE2b-256 5c31220aa6f9cf8ace8881afd311b0cd98d7ed9a57d5fc0260f0d55290871f40

Details for the file arcae-0.2.3-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for arcae-0.2.3-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2762321d43a40dab92ba01d7fb81bd00cda8b46b2316a2b2f34ea07b63dc4319
MD5 fabf0328d78bff83c4be412a894ca90e
BLAKE2b-256 07ac53fbf38d6b6b98a0923cf581abc0e8063bea922c6dec47490f3b00eaedfd

Details for the file arcae-0.2.3-cp311-cp311-macosx_11_0_x86_64.whl.

File hashes

Hashes for arcae-0.2.3-cp311-cp311-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 be0e0d4ab1a2ad43605fd6ee0888c12fbabf91089676ce281b21ff9272f97052
MD5 77a09ffe9a6797fde19a2e2806dac70f
BLAKE2b-256 69cac6d1135ee1d417c523435b4d9ad53d9463c37c01bff108c0f31c8a4b9557

Details for the file arcae-0.2.3-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for arcae-0.2.3-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 924f4fd68dd9d7c483a89967223e5185a43dcdbffa67e1b1a6dc53d53db8c693
MD5 8165f9fe143e4788d279fc3219b3f513
BLAKE2b-256 7daa33ba85adf5dafbc206b3f819516c9d721e118c8d7f7a04c6de478c34aa9b

Details for the file arcae-0.2.3-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for arcae-0.2.3-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7ce41b05d79b23be8733796c0158884dc3947bc5e4e4fa5f7b49565e05133c81
MD5 d1574ff08bebc0005624cf28a8e5f65f
BLAKE2b-256 a835fcb915ee4c3be83cf79de338bc91961074456379e233a2f7a608c4538479

Details for the file arcae-0.2.3-cp310-cp310-manylinux_2_28_aarch64.whl.

File hashes

Hashes for arcae-0.2.3-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 66cdbd2ef77de220285f0547741b02e3a95a1ab56136e0fc02637b9189ed231d
MD5 8e486bd305037d9f6205ad2d6f54bd8a
BLAKE2b-256 d81839b840968645299f4045b1b0020660f7d91aa0bac33615d8d2497eaaa411

Details for the file arcae-0.2.3-cp310-cp310-macosx_11_0_x86_64.whl.

File hashes

Hashes for arcae-0.2.3-cp310-cp310-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 b9bf53aa4bce0c7aef486bcac50db23a9ba5249c7b581022ccadb09b56cb28f9
MD5 8545420ac3bfea58dd75e64df8c25f6b
BLAKE2b-256 23d404555c3e734550875ec0511b2e689e845535f22deb43b6724014bb1131be

Details for the file arcae-0.2.3-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for arcae-0.2.3-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8c075c4365da3d1050d670200d61640ffc90381f8a68f4c0fab297139d14bd3c
MD5 5766a6e866f0c2d16b3af21b85d3bb95
BLAKE2b-256 c1e4d34e0d1f2feaa49c3fbd2b236396fdfea2af707060dbdc8a0c8365a1afdf

Details for the file arcae-0.2.3-cp39-cp39-manylinux_2_28_x86_64.whl.

File hashes

Hashes for arcae-0.2.3-cp39-cp39-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 5849a3740b40e4d77a9718ad25f1617866f952fe7fb8c248a0155c4422d6a57f
MD5 3fa920239d7b4e67a03a7d36f3cd3170
BLAKE2b-256 76375571ae11556d9ede149fc7464111cc52f2b8a8570282a1ee3aa48109dbf3

Details for the file arcae-0.2.3-cp39-cp39-manylinux_2_28_aarch64.whl.

File hashes

Hashes for arcae-0.2.3-cp39-cp39-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 04b5fa43372664448e2838a0f2ee158298db06f163409287d3cb228c8ae1a2e1
MD5 dbea8f3120e0b0f4a39ab24e5d744d93
BLAKE2b-256 b09e583e5fcbcaf73158341d17aabb6bb7f6727f9ac3af235c5c42ead1f90793

Details for the file arcae-0.2.3-cp39-cp39-macosx_11_0_x86_64.whl.

File hashes

Hashes for arcae-0.2.3-cp39-cp39-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 e643f8dda491d0315b98b6dd8801ea826ca3186c70391f726e0abceb2b990c41
MD5 01346d129523e09dd96eca9898d9cedb
BLAKE2b-256 00228445f2ded00758d01c245cc64d42ab7c2dd4fa153ae2a997fdbb216a55dd

Details for the file arcae-0.2.3-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for arcae-0.2.3-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c6e2c4cc0ae4905a9b2432af6553a76b72588ffed379f45eed7cb01993f082fd
MD5 788a2699128b98a530d1a72cbad297c5
BLAKE2b-256 3487e8b69a7d98783314f6d347db86a8ae8b6e2f85529e01d4c6db6ca7b48ccb
