
xarray MSv4 views over MSv2 Measurement Sets



xarray-ms presents a Measurement Set v4 (MSv4) view over CASA Measurement Sets v2 (MSv2). It provides access to MSv2 data via the xarray API, allowing MSv4-compliant applications to be developed on well-understood MSv2 data.

>>> import xarray_ms
>>> import xarray
>>> ds = xarray.open_dataset("/data/L795830_SB001_uv.MS/",
...                          chunks={"time": 2000, "baseline": 1000})
>>> ds
  <xarray.Dataset> Size: 70GB
  Dimensions:                     (time: 28760, baseline: 2775, frequency: 16,
                                  polarization: 4, uvw_label: 3)
  Coordinates:
      antenna1_name               (baseline) object 22kB dask.array<chunksize=(1000,), meta=np.ndarray>
      antenna2_name               (baseline) object 22kB dask.array<chunksize=(1000,), meta=np.ndarray>
      baseline_id                 (baseline) int64 22kB dask.array<chunksize=(1000,), meta=np.ndarray>
    * frequency                   (frequency) float64 128B 1.202e+08 ... 1.204e+08
    * polarization                (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
    * time                        (time) float64 230kB 1.601e+09 ... 1.601e+09
  Dimensions without coordinates: baseline, uvw_label
  Data variables:
      EFFECTIVE_INTEGRATION_TIME  (time, baseline) float64 638MB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
      FLAG                        (time, baseline, frequency, polarization) uint8 5GB dask.array<chunksize=(2000, 1000, 16, 4), meta=np.ndarray>
      TIME_CENTROID               (time, baseline) float64 638MB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
      UVW                         (time, baseline, uvw_label) float64 2GB dask.array<chunksize=(2000, 1000, 3), meta=np.ndarray>
      VISIBILITY                  (time, baseline, frequency, polarization) complex64 41GB dask.array<chunksize=(2000, 1000, 16, 4), meta=np.ndarray>
      WEIGHT                      (time, baseline, frequency, polarization) float32 20GB dask.array<chunksize=(2000, 1000, 16, 4), meta=np.ndarray>
  Attributes:
      antenna_xds:          <xarray.Dataset> Size: 4kB\nDimensions:           (...
      version:              0.0.1
      creation_date:        2024-09-10T14:29:22.587984+00:00
      data_description_id:  0
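The `chunks` argument controls how each variable is split into dask blocks. The block count and memory footprint follow directly from the sizes in the repr above. The plain-Python arithmetic below is only an illustration, not part of the xarray-ms API, and assumes 8 bytes per complex64 element.

```python
import math

# Sizes taken from the Dataset repr above; chunk sizes from the chunks= argument.
dims = {"time": 28760, "baseline": 2775, "frequency": 16, "polarization": 4}
chunks = {"time": 2000, "baseline": 1000}
COMPLEX64_BYTES = 8  # VISIBILITY is complex64: two float32 values per element


def n_chunks(size: int, chunk: int) -> int:
    """Blocks along one dimension; the final block may be smaller than `chunk`."""
    return math.ceil(size / chunk)


def nbytes(shape, itemsize: int) -> int:
    """In-memory size of a dense array with the given shape and item size."""
    return math.prod(shape) * itemsize


# As the repr shows (chunksize=(2000, 1000, 16, 4)), frequency and
# polarization are not split, so only time and baseline contribute blocks.
blocks = n_chunks(dims["time"], chunks["time"]) * n_chunks(dims["baseline"], chunks["baseline"])
chunk_shape = (chunks["time"], chunks["baseline"], dims["frequency"], dims["polarization"])
full_shape = (dims["time"], dims["baseline"], dims["frequency"], dims["polarization"])

print(blocks)                                            # 45 (15 time blocks x 3 baseline blocks)
print(nbytes(chunk_shape, COMPLEX64_BYTES) / 1e9)        # 1.024 (GB per full VISIBILITY chunk)
print(round(nbytes(full_shape, COMPLEX64_BYTES) / 1e9))  # 41 (GB, matching the repr)
```

Halving the `time` chunk size halves the memory of each VISIBILITY chunk and roughly doubles the number of time blocks.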

Measurement Set v4

NRAO/SKAO are developing a new xarray-based Measurement Set v4 specification. While there are many changes, some of the major highlights are:

  • xarray is used to define the specification.

  • MSv4 data consists of xarray Datasets of n-dimensional arrays on a regular time-channel grid. MSv2 data is tabular: while the time-channel grid is regular in many instances, this is not guaranteed, especially after an MSv2 dataset has been transformed by various tasks.
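To illustrate the difference between the tabular and gridded layouts, the following NumPy sketch scatters MSv2-style rows onto a regular (time, baseline) grid. The row values and the `grid_rows` helper are hypothetical and do not describe how xarray-ms works internally; missing cells are padded with NaN here, and a boolean mask records which cells held real data.

```python
import numpy as np

# Hypothetical MSv2-style rows: one row per (time, antenna1, antenna2) sample.
# The row for baseline (0, 2) at time 1.0 is missing, so the data is irregular.
time = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
ant1 = np.array([0, 0, 1, 0, 1])
ant2 = np.array([1, 2, 2, 1, 2])
vis = np.array([1 + 1j, 2 + 2j, 3 + 3j, 4 + 4j, 5 + 5j], dtype=np.complex64)


def grid_rows(time, ant1, ant2, values, fill=np.nan):
    """Scatter tabular rows onto a dense (time, baseline) grid.

    Cells with no corresponding row are padded with `fill`; the returned
    boolean `present` array marks the cells that held real data.
    """
    utime, t_idx = np.unique(time, return_inverse=True)
    pairs = np.stack([ant1, ant2], axis=1)
    baselines, bl_idx = np.unique(pairs, axis=0, return_inverse=True)
    # ravel() guards against NumPy versions that return a 2-D inverse here.
    bl_idx = bl_idx.ravel()
    grid = np.full((utime.size, baselines.shape[0]), fill, dtype=values.dtype)
    present = np.zeros(grid.shape, dtype=bool)
    grid[t_idx, bl_idx] = values
    present[t_idx, bl_idx] = True
    return utime, baselines, grid, present


utime, baselines, grid, present = grid_rows(time, ant1, ant2, vis)
print(grid.shape)     # (2, 3): 2 timestamps x 3 baselines
print(present.sum())  # 5 real samples; the remaining cell is padding
```

Once the data sits on a regular grid, operations such as averaging over time become plain array reductions along an axis, with the mask excluding padded cells.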

xarray Datasets are self-describing, which makes them easier to reason about and work with. Additionally, the regularity of the data makes MSv4-based software less complex to write.

xradio

casangi/xradio provides a reference implementation that converts CASA v2 Measurement Sets to Zarr v4 Measurement Sets using the python-casacore package.

Why xarray-ms?

  • By targeting an MSv4 xarray view over MSv2 data, developers can build applications on well-understood data and then transition seamlessly to newer formats. Data can also be exported to newer formats (principally Zarr) via xarray’s native I/O routines. In either case, the xarray view of the data looks the same to the software developer.

  • xarray-ms builds on xarray’s backend API. Implementing a formal CASA MSv2 backend has a number of benefits:

    • xarray’s built-in I/O routines, such as open_dataset and open_datatree, can dispatch to the backend to load data.

    • Similarly, xarray’s lazy loading mechanism dispatches through the backend.

    • Automatic access to any chunked array types supported by xarray, including, but not limited to, dask.

    • Arbitrary chunking along any xarray dimension.

  • xarray-ms uses arcae, a high-performance backend to CASA Tables implementing a subset of python-casacore’s interface.

  • Limited support for irregular MSv2 data via padding.
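To make the backend point concrete, the sketch below uses xarray’s documented `BackendEntrypoint` interface to define a toy backend. `ToyMSBackend` and its synthetic data are hypothetical: it ignores the path it is given and returns a tiny in-memory dataset with MSv4-style dimension names. It only illustrates how open_dataset dispatches to a backend; xarray-ms’s real backend reads MSv2 tables through arcae.

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class ToyMSBackend(BackendEntrypoint):
    """Hypothetical minimal backend; NOT the xarray-ms backend."""

    description = "Toy illustration of xarray's backend API"
    open_dataset_parameters = ["filename_or_obj", "drop_variables"]

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # A real backend would open `filename_or_obj`; this one ignores it.
        ntime, nbl, nchan, npol = 3, 2, 4, 2
        shape = (ntime, nbl, nchan, npol)
        dims = ("time", "baseline", "frequency", "polarization")
        ds = xr.Dataset(
            data_vars={
                "VISIBILITY": (dims, np.ones(shape, dtype=np.complex64)),
                "FLAG": (dims, np.zeros(shape, dtype=np.uint8)),
            },
            coords={
                "time": np.arange(ntime, dtype=np.float64),
                "frequency": np.linspace(1.202e8, 1.204e8, nchan),
                "polarization": ["XX", "YY"],
            },
        )
        if drop_variables:
            ds = ds.drop_vars(drop_variables)
        return ds


# Passing the class as `engine` makes open_dataset dispatch to it directly,
# without registering a plugin entry point.
ds = xr.open_dataset("any/path.ms", engine=ToyMSBackend)
print(dict(ds.sizes))
```

Because every backend returns an ordinary `xarray.Dataset`, downstream code written against `ds` does not need to know whether the data came from an MSv2 table, a Zarr store, or, as here, a toy backend.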

Work in Progress

The Measurement Set v4 specification is currently under active development. xarray-ms is likewise under active development and does not yet have feature parity with xradio.

Most measures information and many secondary sub-tables are currently missing. However, the most important parts of the MAIN table, as well as the ANTENNA, POLARIZATION and SPECTRAL_WINDOW sub-tables, are implemented and should be sufficient for basic algorithm development.


Download files


Source Distribution

xarray_ms-0.2.0.tar.gz (27.4 kB)

Uploaded Source

Built Distribution

xarray_ms-0.2.0-py3-none-any.whl (30.8 kB)

Uploaded Python 3

File details

Details for the file xarray_ms-0.2.0.tar.gz.

File metadata

  • Download URL: xarray_ms-0.2.0.tar.gz
  • Upload date:
  • Size: 27.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for xarray_ms-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        7ca09d2d901315fb3927004d0093b1b27efecf7f1cc93fe60116c9c4ea698b11
MD5           d8faba39f88eb8f41da2b13d577a4302
BLAKE2b-256   18484480b9f1c8820f9b1d398bb94d100559704927a15305e5acde0f1f0942fb


File details

Details for the file xarray_ms-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: xarray_ms-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 30.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for xarray_ms-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        83470ae09953687d55d67a6e1c0f7a57353c60196de2ad8c96dbc484f18a4dc1
MD5           dbca43719eef8b44de9a78023d9d0057
BLAKE2b-256   ae931bf6f6f7887d2f699e7fdaaeb3adf9be86c1981aa1129ea4e30e04f5254c
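A downloaded file can be checked against the published SHA256 digests with Python’s standard hashlib module. This is a minimal sketch; the local file name is an assumption about where the wheel was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA-256 matches the published hex digest."""
    return sha256_of(path) == expected_hex.lower()


# Published SHA256 digest of the wheel, copied from the table above.
WHEEL_SHA256 = "83470ae09953687d55d67a6e1c0f7a57353c60196de2ad8c96dbc484f18a4dc1"

wheel = Path("xarray_ms-0.2.0-py3-none-any.whl")  # assumed download location
if wheel.exists():
    print("OK" if verify(wheel, WHEEL_SHA256) else "MISMATCH")
```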

