Arrow bindings for casacore
arcae implements a limited subset of functionality from the more mature python-casacore package. It bypasses some existing limitations in python-casacore to provide safe, multi-threaded access to CASA formats, thereby enabling export into newer cloud-native formats such as Apache Arrow and Zarr.
Rationale
casacore and the python-casacore Python bindings provide access to the CASA Table Data System (CTDS) and Measurement Sets created within this system. The CTDS, as of casacore 3.5.0, is subject to the following limitations:
Access from multiple threads is unsafe.
python-casacore doesn’t release the Global Interpreter Lock (GIL).
Resolving these concerns is potentially a major effort, involving invasive changes across the CTDS system.
In the time since the CTDS was developed, newer, open-source formats such as Apache Arrow and Zarr have been developed that are suitable for representing Radio Astronomy data.
The Apache Arrow project defines a language-independent, in-memory columnar storage format.
Translating CTDS data to Arrow is relatively simple, with some limitations mentioned below.
It’s easy to exchange Arrow Tables between many different programming languages.
Once in Apache Arrow format, it is easy to store data in modern, cloud-native disk formats such as parquet and Zarr.
Converting CASA Tables to Arrow in the C++ layer avoids the GIL.
Access to non-thread-safe CASA Tables is constrained to a ThreadPool containing a single thread.
It also allows us to write astrometry routines in C++, potentially side-stepping thread-safety and GIL issues with the CASA Measures server.
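The single-thread constraint above can be illustrated in plain Python. This is only a sketch of the general pattern, not arcae's implementation (which lives in the C++ layer): `SerialisedResource` and `FakeTable` are hypothetical names introduced for illustration, and the standard-library `ThreadPoolExecutor` stands in for the ThreadPool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerialisedResource:
    """Funnel every call to a non-thread-safe object through one worker thread."""

    def __init__(self, factory):
        # A pool with exactly one worker: tasks run one at a time, in
        # submission order, always on the same thread.
        self._pool = ThreadPoolExecutor(max_workers=1)
        # Create the resource on the worker thread too, so it is never
        # touched from any other thread.
        self._resource = self._pool.submit(factory).result()

    def call(self, fn, *args):
        """Schedule fn(resource, *args) on the worker and return a Future."""
        return self._pool.submit(fn, self._resource, *args)

    def close(self):
        self._pool.shutdown(wait=True)


class FakeTable:
    """Hypothetical stand-in for a table handle that is not thread-safe."""

    def __init__(self):
        self.rows = list(range(100))
        self.accessed_from = set()

    def read(self, start, stop):
        # Record which thread performed the access.
        self.accessed_from.add(threading.get_ident())
        return self.rows[start:stop]


table = SerialisedResource(FakeTable)

# Many caller threads may submit work concurrently...
with ThreadPoolExecutor(max_workers=8) as callers:
    futures = list(callers.map(
        lambda i: table.call(FakeTable.read, i * 10, i * 10 + 10), range(10)
    ))

# ...but every read was executed by the single worker thread.
chunks = [f.result() for f in futures]
worker_threads = table.call(lambda t: t.accessed_from).result()
table.close()
```

Callers receive futures and can overlap their own work, while the table itself only ever sees one thread.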
Limitations
Arrow supports both 1D arrays and nested structures:
Fixed-shape multi-dimensional data (e.g. visibility data) is currently represented as nested FixedSizeListArrays.
Variably-shaped multi-dimensional data (e.g. subtable data) is currently represented as nested ListArrays.
Complex values are represented as an extra FixedSizeListArray nesting of two floats.
Currently, it is not trivial to convert between the above and NumPy via to_numpy calls on Arrow Arrays, but it is relatively straightforward to reinterpret the underlying data buffers from either API. This is done transparently in the getcol and putcol functions (see usage below).
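To make the buffer layout concrete, here is a small pure-Python sketch of how a fixed-shape complex column nested as described above maps onto one flat buffer of floats. It illustrates the layout only and is not arcae's code: `complex_at` is a hypothetical helper, and it assumes the flat values are stored contiguously in row-major order with no offsets.

```python
from array import array


def complex_at(values, shape, index):
    """Read one complex value from a flat, row-major buffer of float pairs.

    values: flat buffer of floats, laid out as (*shape, 2)
    shape:  logical shape, e.g. (nrow, nchan, ncorr)
    index:  position within that shape, e.g. (row, chan, corr)
    """
    offset = 0
    for dim, i in zip(shape, index):
        if not 0 <= i < dim:
            raise IndexError(f"index {i} out of range for dimension {dim}")
        offset = offset * dim + i  # row-major (C order) flattening
    # The innermost list of two floats holds the real and imaginary parts.
    return complex(values[2 * offset], values[2 * offset + 1])


# Hypothetical column of 2 rows, 3 channels and 4 correlations.
shape = (2, 3, 4)
n = shape[0] * shape[1] * shape[2]
# Element k is stored as the pair (k, -k), so lookups are easy to check.
values = array("f", [x for k in range(n) for x in (float(k), float(-k))])

value = complex_at(values, shape, (1, 2, 3))  # the last element of the column
```

Because the nesting adds no data of its own, the same flat buffer can be viewed as a multi-dimensional complex array without copying.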
Going forward, FixedShapeTensorArray and VariableShapeTensorArray will provide more ergonomic structures for representing multi-dimensional data. First class support for complex values in Apache Arrow will require implementing a C++ extension type within Arrow itself.
Some other edge cases have not yet been implemented, but could be with some thought.
Columns with unconstrained rank (ndim == -1) whose rows, in practice, have differing dimensions. Unconstrained rank columns whose rows actually have the same rank are catered for.
TpRecord columns are not yet handled. It is probably simplest to convert these rows to JSON and store them as strings.
TpQuantity columns are not yet handled. It is possible to represent them as a run-time parametric Arrow DataType.
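The JSON approach suggested for TpRecord columns might look like the following sketch. The record contents are hypothetical and arcae does not implement this yet. It only shows that nested key/value rows survive a round trip through a string column.

```python
import json

# Hypothetical record-valued rows: each cell holds a nested key/value structure.
record_rows = [
    {"name": "rec0", "params": {"gain": 1.5, "flags": [0, 1]}},
    {"name": "rec1", "params": {"gain": 2.0, "flags": []}},
]

# Serialise each row to a JSON string; sort_keys makes the output deterministic.
encoded = [json.dumps(row, sort_keys=True) for row in record_rows]

# A column of strings can be stored like any other string column;
# decoding restores the nested structure.
decoded = [json.loads(s) for s in encoded]
```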
Installation
Binary wheels are provided for Linux and macOS for both x86_64 and arm64 architectures.
$ pip install arcae
Usage
Example usage with Arrow Tables:
import json
from pprint import pprint

import arcae
import pyarrow as pa
import pyarrow.parquet as pq

# Obtain (partial) Apache Arrow Table from a CASA Table
casa_table = arcae.table("/path/to/measurementset.ms")
arrow_table = casa_table.to_arrow()  # read entire table
arrow_table = casa_table.to_arrow(index=(slice(10, 20),))  # read rows 10 to 19
assert isinstance(arrow_table, pa.Table)

# Print JSON-encoded Table and Column keywords
pprint(json.loads(arrow_table.schema.metadata[b"__arcae_metadata__"]))
pprint(json.loads(arrow_table.schema.field("DATA").metadata[b"__arcae_metadata__"]))

# Write the Arrow Table to a Parquet file
pq.write_table(arrow_table, "measurementset.parquet")
Some reading and writing functionality from python-casacore is replicated, with added support for some NumPy Advanced Indexing.
casa_table = arcae.table("/path/to/measurementset.ms", readonly=False)
# Get rows 10 and 2, channels 16 to 31 (the slice end is exclusive), and all correlations
data = casa_table.getcol("DATA", index=([10, 2], slice(16, 32), None))
# Write some modified data back
casa_table.putcol("DATA", data + 1*1j, index=([10, 2], slice(16, 32), None))
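As a rough guide to what such an index selects, the sketch below computes the shape of the result for a per-dimension index tuple. `selected_shape` is a hypothetical helper, not part of arcae, and it assumes (as the comment in the example suggests) that None selects a whole dimension and that each dimension is indexed independently.

```python
def selected_shape(shape, index):
    """Shape of the result when each dimension is indexed independently.

    Each entry of index is a list of positions, a slice, or None
    (assumed here to mean "select the whole dimension").
    """
    out = []
    for dim, idx in zip(shape, index):
        if idx is None:
            out.append(dim)
        elif isinstance(idx, slice):
            # slice.indices clips the slice to the dimension length.
            out.append(len(range(*idx.indices(dim))))
        else:
            out.append(len(idx))
    return tuple(out)


# Hypothetical DATA column with 100 rows, 64 channels and 4 correlations.
shape = (100, 64, 4)
index = ([10, 2], slice(16, 32), None)
result_shape = selected_shape(shape, index)  # 2 rows, 16 channels, 4 correlations
```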
See the test cases for further use cases.
Exporting Measurement Sets to Arrow Parquet Datasets
Install the optional applications extra:
$ pip install arcae[applications]
Then, an export script is available:
$ arcae export /path/to/the.ms --nrow 50000
$ tree output.arrow/
output.arrow/
├── ANTENNA
│   └── data0.parquet
├── DATA_DESCRIPTION
│   └── data0.parquet
├── FEED
│   └── data0.parquet
├── FIELD
│   └── data0.parquet
├── MAIN
│   └── FIELD_ID=0
│       └── PROCESSOR_ID=0
│           ├── DATA_DESC_ID=0
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=1
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=2
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           └── DATA_DESC_ID=3
│               ├── data0.parquet
│               ├── data1.parquet
│               ├── data2.parquet
│               └── data3.parquet
├── OBSERVATION
│   └── data0.parquet
This data can be loaded into an Arrow Dataset:
>>> import pyarrow as pa
>>> import pyarrow.dataset as pad
>>> main_ds = pad.dataset("output.arrow/MAIN")
>>> spw_ds = pad.dataset("output.arrow/SPECTRAL_WINDOW")
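The MAIN directory names in the tree above follow the key=value (Hive-style) partitioning convention, so the partition values of each file can be recovered from its path. The sketch below does this with the standard library only. `partition_keys` is a hypothetical helper for illustration; pyarrow's `dataset` function can perform the equivalent discovery itself when passed `partitioning="hive"`.

```python
from pathlib import PurePosixPath


def partition_keys(path):
    """Extract key=value directory components from a partitioned file path."""
    keys = {}
    for part in PurePosixPath(path).parent.parts:
        if "=" in part:
            key, _, value = part.partition("=")
            # Convert purely numeric values to integers for convenience.
            keys[key] = int(value) if value.isdigit() else value
    return keys


path = "output.arrow/MAIN/FIELD_ID=0/PROCESSOR_ID=0/DATA_DESC_ID=2/data1.parquet"
keys = partition_keys(path)
```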
Etymology
Noun: arca f (genitive arcae); first declension.
A chest, box, coffer, safe (safe place for storing items, or anything of a similar shape).
Pronounced: ar-ki.