
xarray MSv4 views over MSv2 Measurement Sets



xarray-ms presents a Measurement Set v4 view (MSv4) over CASA Measurement Sets (MSv2). It provides access to MSv2 data via the xarray API, allowing MSv4 compliant applications to be developed on well-understood MSv2 data.

>>> import xarray_ms
>>> from xarray.backends.api import open_datatree
>>> dt = open_datatree("/data/L795830_SB001_uv.MS/",
...                    chunks={"time": 2000, "baseline": 1000})
>>> dt
<xarray.DataTree>
Group: /
└── Group: /DATA_DESC_ID=0,FIELD_ID=0,OBSERVATION_ID=0
       Dimensions:                     (time: 28760, baseline: 2775, frequency: 16,
                                        polarization: 4, uvw_label: 3)
       Coordinates:
           antenna1_name               (baseline) object 22kB ...
           antenna2_name               (baseline) object 22kB ...
           baseline_id                 (baseline) int64 22kB ...
         * frequency                   (frequency) float64 128B 1.202e+08 ... 1.204e+08
         * polarization                (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
         * time                        (time) float64 230kB 1.601e+09 ... 1.601e+09
       Dimensions without coordinates: baseline, uvw_label
       Data variables:
           EFFECTIVE_INTEGRATION_TIME  (time, baseline) float64 638MB ...
           FLAG                        (time, baseline, frequency, polarization) uint8 5GB ...
           TIME_CENTROID               (time, baseline) float64 638MB ...
           UVW                         (time, baseline, uvw_label) float64 2GB ...
           VISIBILITY                  (time, baseline, frequency, polarization) complex64 41GB ...
           WEIGHT                      (time, baseline, frequency, polarization) float32 20GB ...
       Attributes:
           version:              0.0.1
           creation_date:        2024-09-18T10:49:55.133908+00:00
           data_description_id:  0
    └── Group: /DATA_DESC_ID=0,FIELD_ID=0,OBSERVATION_ID=0/ANTENNA
            Dimensions:                 (antenna_name: 74,
                                         cartesian_pos_label/ellipsoid_pos_label: 3)
            Coordinates:
                baseline_antenna1_name  (baseline) object 22kB ...
                baseline_antenna2_name  (baseline) object 22kB ...
                baseline_id             (baseline) int64 22kB ...
              * frequency               (frequency) float64 128B 1.202e+08 1.202e+08 ... 1.204e+08
              * polarization            (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
              * time                    (time) float64 230kB 1.601e+09 1.601e+09 ... 1.601e+09
              * antenna_name            (antenna_name) object 592B 'CS001HBA0' ... 'IE613HBA'
                mount                   (antenna_name) object 592B 'X-Y' 'X-Y' ... 'X-Y' 'X-Y'
                station                 (antenna_name) object 592B 'LOFAR' 'LOFAR' ... 'LOFAR'
            Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
            Data variables:
                ANTENNA_POSITION        (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 2kB ...
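
Each partition in the tree above is an ordinary xarray Dataset, so analysis code can be written against its dimension and variable names alone. The following is a minimal sketch, not part of xarray-ms: it builds a tiny synthetic Dataset that borrows the FLAG variable and dimension names from the output above (the shapes and values are hypothetical), and the helper `flagged_fraction` is an illustrative name.

```python
import numpy as np
import xarray as xr


def flagged_fraction(ds: xr.Dataset) -> xr.DataArray:
    """Fraction of flagged samples per baseline (hypothetical helper)."""
    return (ds["FLAG"] != 0).mean(dim=("time", "frequency", "polarization"))


# Synthetic stand-in for one partition; sizes are tiny and hypothetical.
ntime, nbl, nchan, npol = 4, 3, 2, 4
flags = np.zeros((ntime, nbl, nchan, npol), dtype=np.uint8)
flags[:, 0] = 1  # flag every sample on the first baseline

ds = xr.Dataset(
    data_vars={
        "FLAG": (("time", "baseline", "frequency", "polarization"), flags),
    },
    coords={
        "time": np.arange(ntime, dtype=np.float64),
        "frequency": np.linspace(1.202e8, 1.204e8, nchan),
        "polarization": ["XX", "XY", "YX", "YY"],
    },
)

frac = flagged_fraction(ds)
print(frac.values)
```

The same function should work unchanged on any Dataset that exposes a FLAG variable with these dimensions, whichever storage format sits underneath.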

Measurement Set v4

NRAO/SKAO are developing a new xarray-based Measurement Set v4 specification. While there are many changes, some of the major highlights are:

  • xarray is used to define the specification.

  • MSv4 data consists of Datasets of ndarrays on a regular time-channel grid. MSv2 data is tabular and, while in many instances the time-channel grid is regular, this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks.

xarray Datasets are self-describing and they are therefore easier to reason about and work with. Additionally, the regularity of data will make writing MSv4-based software less complex.
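
To make the difference between tabular and gridded data concrete, here is a minimal NumPy sketch of placing MSv2-style rows onto a regular (time, baseline) grid, padding missing rows with NaN. This only illustrates the idea and is not xarray-ms's actual algorithm; the row values and array names are hypothetical.

```python
import numpy as np

# Hypothetical MSv2-like rows: one row per (time, antenna pair).
# The row for baseline (0, 2) at time 1.0 is deliberately missing.
time = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
ant1 = np.array([0, 0, 1, 0, 1])
ant2 = np.array([1, 2, 2, 1, 2])
weight = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Unique axis values and, for each row, its index along that axis.
utime, time_idx = np.unique(time, return_inverse=True)
baselines, bl_idx = np.unique(
    np.stack([ant1, ant2], axis=1), axis=0, return_inverse=True
)
bl_idx = bl_idx.reshape(-1)  # guard against NumPy versions returning a 2D inverse

# Scatter the rows onto a regular grid; cells with no row stay NaN (padding).
grid = np.full((utime.size, baselines.shape[0]), np.nan)
grid[time_idx, bl_idx] = weight
print(grid)
```

Once the data is on such a grid, every variable can be indexed by time and baseline directly, which is the regularity that MSv4 assumes.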

xradio

casangi/xradio provides a reference implementation that converts CASA MSv2 Measurement Sets to Zarr-based MSv4 Measurement Sets using the python-casacore package.

Why xarray-ms?

  • By developing against an MSv4 xarray view over MSv2 data, developers can build applications on well-understood data and then transition seamlessly to newer formats. Data can also be exported to newer formats (principally zarr) via xarray’s native I/O routines. In either case, the xarray view looks the same to the software developer.

  • xarray-ms builds on xarray’s backend API. Implementing a formal CASA MSv2 backend has a number of benefits:

    • xarray’s internal I/O routines such as open_dataset and open_datatree can dispatch to the backend to load data.

    • Similarly xarray’s lazy loading mechanism dispatches through the backend.

    • Automatic access to any chunked array types supported by xarray including, but not limited to, dask.

    • Arbitrary chunking along any xarray dimension.

  • xarray-ms uses arcae, a high-performance backend to CASA Tables implementing a subset of python-casacore’s interface.

  • Some limited support for irregular MSv2 data via padding.
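
As an illustration of how xarray’s backend API dispatches to a custom reader, the following toy sketch subclasses `xarray.backends.BackendEntrypoint` and passes the class to `open_dataset` via `engine`. It is not the xarray-ms backend: `ToyNpyBackend`, the `.npy` input and the WEIGHT variable are hypothetical, and the data is loaded eagerly, whereas a real backend would typically wrap its arrays for lazy loading.

```python
import os
import tempfile

import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class ToyNpyBackend(BackendEntrypoint):
    """Hypothetical backend exposing a 2D .npy file as a (time, baseline) Dataset."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # Eager load for simplicity; a real backend would defer the read.
        data = np.load(filename_or_obj)
        ds = xr.Dataset({"WEIGHT": (("time", "baseline"), data)})
        if drop_variables:
            ds = ds.drop_vars(drop_variables, errors="ignore")
        return ds

    def guess_can_open(self, filename_or_obj):
        return str(filename_or_obj).endswith(".npy")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.npy")
    np.save(path, np.ones((4, 3), dtype=np.float32))
    # xarray routes the call through the backend class given as `engine`.
    ds = xr.open_dataset(path, engine=ToyNpyBackend)

print(dict(ds.sizes))
```

An installed backend is normally registered through a package entry point, so that it can be selected by name or detected automatically; passing the class directly, as here, is convenient for experiments.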

Work in Progress

Both the Measurement Set v4 specification and xarray-ms are under active development, and xarray-ms does not yet have feature parity with MSv4 or xradio. Most measures information and many secondary sub-tables are currently missing.

However, the most important parts of the MSv2 MAIN table, as well as the ANTENNA, POLARIZATION and SPECTRAL_WINDOW sub-tables, are implemented and should be sufficient for basic algorithm development.

