SQLAlchemy extensions for HEALPix spatially indexed astronomy data
Project description
To acknowledge HEALPix Alchemy in a scholarly article, please cite the following two references.
HEALPix Alchemy paper
Singer, L. P., Parazin, B., Coughlin, M. W., et al. (2022). "HEALPix Alchemy: Fast All-Sky Geometry and Image Arithmetic in a Relational Database for Multimessenger Astronomy Brokers." Astronomical Journal 163 209. https://doi.org/10.3847/1538-3881/ac5ab8
Code archive on Zenodo
Singer, L. P., Parazin, B., Coughlin, M. W., et al. (2022). "skyportal/healpix-alchemy: Version 1.0.1." https://doi.org/10.5281/zenodo.5768564
HEALPix Alchemy
The healpix_alchemy
Python package is an extension for the SQLAlchemy
object relational mapper. It adds region and image arithmetic to PostgreSQL
(version 14 and newer) databases. It accelerates queries between point clouds,
regions, and images (sometimes known in the geospatial community as rasters) by
storing multi-order HEALPix indices in PostgreSQL's range types.
The healpix_alchemy
project is designed for astronomy applications,
particularly for cross-matching galaxy catalogs, observation footprints, and
all-sky images like gravitational-wave probability sky maps or even dust
maps. However, it could be used in any context in which geometry is embedded on
a unit sphere.
healpix_alchemy
is lean and minimalist because it leverages several existing
projects: it consists of little more than a few lines of glue code to bind
together MOCPy, SQLAlchemy, and PostgreSQL's range types.
healpix_alchemy
serves a purpose similar to full-featured astronomy-focused
spatial extensions like Q3C, H3C, and pg_healpix, and geospatial
extensions like PgSphere and PostGIS. What sets healpix_alchemy
apart
from these is that it is written in pure Python and requires no server-side
database extensions. Consequently, healpix_alchemy
can be used with managed
PostgreSQL databases in the cloud like Amazon RDS and Google Cloud SQL.
Theory of operation
HEALPix basics
HEALPix is a scheme for subdividing and indexing the unit sphere, originally described by Górski et al. (2005). Although it was originally designed for cosmic microwave background analysis, it has found many uses in astronomy, particularly through multi-order coverage (MOC) maps and hierarchical progressive surveys (HiPS) used extensively in the Aladin astronomical information system. It is also used by LIGO and Virgo to store and communicate gravitational-wave probability sky maps.
HEALPix can be thought of as a tree. At the lowest resolution, level 0, HEALPix subdivides the sphere into 12 equal-area base tiles, assigned integer indices 0 through 11. At level 1, each of the 12 base tiles is subdivided into 4 tiles. Every subsequent level divides each of the preceding level's tiles into 4 new tiles. At a given level, each of the base pixels has been divided into 4level pixels (nside = 2level pixels on each side). Thus there are npix = 12×4level pixels at a given resolution, assigned integer indices from 0 through (npix-1). This is called the NESTED indexing scheme. (There is also a RING indexing scheme in which the indices advance from east to west and then from north to south.)
A HEALPix tile, a node in the HEALPix tree, is fully addressed by three pieces of information: the indexing scheme (RING or NESTED), the resolution level (level or equivalently nside), and the pixel index (ipix, an integer between 0 and npix-1).
The image below, reproduced from https://healpix.jpl.nasa.gov, illustrates the first 4 levels of refinement of a HEALPix grid.
HEALPix interval sets
A region on the sphere can be encoded by a collection of disjoint HEALPix tiles, potentially at a mix of different resolution levels. Typically, large low-resolution tiles are used on the interior of the region, and small high-resolution tiles are used on the boundary. This is called a multi-order coverage (MOC) map. An example, reproduced from the MOCPy documentation, is shown below.
Much like MOCs, LIGO/Virgo/KAGRA gravitational-wave probability sky maps are stored as multi-resolution HEALPix data sets, but with a vector of floating-point values attached to each tile. A multi-order refinement mesh from an example sky map is shown below, reproduced from Singer & Price (2015).
Reinecke & Hivon (2015) introduced HEALPix interval sets as an alternative encoding of MOCs that enables fast and simple unions, intersections, and queries. In an interval set, each HEALPix tile is described by the interval of pixel indices at some very high resolution levelmax that are descendents of that tile. In an interval set, a region is encoded as a disjoint collection of such intervals. A tile with a NESTED address given by (level, ipix) may be described as the half-open interval
[ipix 4levelmax - level, (ipix + 1) 4levelmax - level).
We use levelmax = 29 because this is the highest resolution at which pixel indices can be stored in a signed 64-bit integer. At this resolution, each pixel is scarcely 0.4 milliarcseconds across.
The interval set representation is adventageous because there are simple and fast algorithms for interval arithmetic and set operations. Interval analysis appears in a suprising variety of scientific contexts from genomics to gravitational wave data quality. Because of the many business applications of interval arithmetic, intervals are also supported in the PostgreSQL database through its range types.
Spatial primitives in healpix_alchemy
The healpix_alchemy
package provides two custom column types for
SQLAlchemy:
healpix_alchemy.Point
This class represents a point. A column of this type could store the positions
of galaxies in a catalog. Under the hood, it is just a BIGINT
.
Wherever you need to bind a Python value to a healpix_alchemy.Point
, you may
provide any one of the following:
- an instance of astropy.coordinates.SkyCoord
- a sequence of two astropy.units.Quantity instances with angle units, which will be interpreted as the right ascension and declination of the point in the ICRS frame
- an integer representing the HEALPix NESTED index of the point at level = levelmax
healpix_alchemy.Tile
This class represents a HEALPix tile. A table containing a column of this type
and a foreign key could store MOCs or gravitational-wave probability maps.
Under the hood, it is just an INT8RANGE
.
Wherever you need to bind a Python value to a healpix_alchemy.Tile
, you may
provide any one of the following:
- A single integer which will be interpreted as the address of the tile in the UNIQ HEALPix indexing scheme
- A sequence of two integers, which will be interpreted as the lower and upper bounds of the right-half-open pixel index interval at level = levelmax
- A string like
'[1234,5678)'
Installation
You can install healpix_alchemy
from the Python Package Index using pip:
$ pip install healpix-alchemy
Development Installation
Contributions are welcome! This package uses the Poetry packaging and
dependency tool and pytest for unit tests. To install healpix_alchemy
in a
development environment, follow these instructions.
-
Install Poetry by following the official Poetry installation instructions.
-
Clone this repository:
$ git clone https://github.com/skyportal/healpix-alchemy.git $ cd healpix-alchemy
-
Initialize the Poetry-managed virtual environment with
healpix_alchemy
and all of its dependencies installed by running this command:$ poetry install
Now, you can enter a shell inside the virtual environment by running:
$ poetry shell
-
To run the test suite, including the examples in this README file, run this command inside the Poetry shell:
$ pytest
Example
First, some imports:
>>> from sqlalchemy import orm
>>> import sqlalchemy as sa
>>> import healpix_alchemy as ha
Set up tables
This example will use the SQLAlchemy declarative extension for describing table schema using Python classes.
SQLAlchemy needs to know the name for each table. You can provide the name by
setting the __tablename__
attribute in each model class, or you can
create a base class that generates the table name automatically from the class
name.
>>> @orm.as_declarative()
... class Base:
...
... @orm.declared_attr
... def __tablename__(cls):
... return cls.__name__.lower()
Each row of the Galaxy
table represents a point in a catalog:
>>> class Galaxy(Base):
... id = sa.Column(sa.Text, primary_key=True)
... hpx = sa.Column(ha.Point, index=True, nullable=False)
Each row of the Field
table represents a ZTF field:
>>> class Field(Base):
... id = sa.Column(sa.Integer, primary_key=True)
... tiles = orm.relationship(lambda: FieldTile)
Each row of the FieldTile
table represents a multi-resolution HEALPix tile
that is contained within the corresponding field. There is a one-to-many
mapping between Field
and FieldTile
.
>>> class FieldTile(Base):
... id = sa.Column(sa.ForeignKey(Field.id), primary_key=True)
... hpx = sa.Column(ha.Tile, primary_key=True, index=True)
Each row of the Skymap
table represents a LIGO/Virgo HEALPix
localization map.
>>> class Skymap(Base):
... id = sa.Column(sa.Integer, primary_key=True)
... tiles = orm.relationship(lambda: SkymapTile)
Each row of the SkymapTile
table represents a multi-resolution HEALPix
tile within a LIGO/Virgo localization map. There is a one-to-many mapping
between Skymap
and SkymapTile
.
>>> class SkymapTile(Base):
... id = sa.Column(sa.ForeignKey(Skymap.id), primary_key=True)
... hpx = sa.Column(ha.Tile, primary_key=True, index=True)
... probdensity = sa.Column(sa.Float, nullable=False)
Finally, connect to the database, create all the tables, and start a session.
>>> engine = sa.create_engine('postgresql://user:password@host/database')
>>> Base.metadata.create_all(engine)
>>> session = orm.Session(engine)
Populate with sample data
Load the 2MASS Redshift Survey into the Galaxy
table. This catalog contains
44599 galaxies.
It may take up to a minute for this to finish. Advanced users may speed this up
significantly by vectorizing the conversion from SkyCoord
to HEALPix indices
and using SQLAlchemy bulk insertion.
>>> from astropy.coordinates import SkyCoord
>>> from astroquery.vizier import Vizier
>>> vizier = Vizier(columns=['SimbadName', 'RAJ2000', 'DEJ2000'], row_limit=-1)
>>> data, = vizier.get_catalogs('J/ApJS/199/26/table3')
>>> data['coord'] = SkyCoord(data['RAJ2000'], data['DEJ2000'])
>>> for row in data:
... session.add(Galaxy(id=row['SimbadName'], hpx=row['coord']))
>>> session.commit()
Load the footprints of the Zwicky Transient Facility fields into the Field
and FieldTile
tables.
It may take up to a minute for this to finish. Advanced users may speed this up significantly by using SQLAlchemy bulk insertion.
>>> from astropy.table import Table
>>> from astropy.coordinates import SkyCoord
>>> from astropy import units as u
>>> url = 'https://raw.githubusercontent.com/ZwickyTransientFacility/ztf_information/9fd0ba8842709f42a134c88827309ccab728fcb7/field_grid/ztf_field_corners.csv'
>>> for row in Table.read(url):
... field_id = int(row['field'])
... corners = SkyCoord(row['ra1', 'ra2', 'ra3', 'ra4'],
... row['dec1', 'dec2', 'dec3', 'dec4'],
... unit=u.deg)
... tiles = [FieldTile(hpx=hpx) for hpx in ha.Tile.tiles_from(corners)]
... session.add(Field(id=field_id, tiles=tiles))
>>> session.commit()
Load a sky map for LIGO/Virgo event GW200115_042309 (S200115j) into the
Skymap
and SkymapTile
tables.
>>> url = 'https://gracedb.ligo.org/apiweb/superevents/S200115j/files/bayestar.multiorder.fits'
>>> data = Table.read(url)
>>> tiles = [SkymapTile(hpx=row['UNIQ'], probdensity=row['PROBDENSITY']) for row in data]
>>> session.add(Skymap(id=1, tiles=tiles))
>>> session.commit()
Last, run ANALYZE
to prepare the data for use:
>>> session.execute(sa.text('ANALYZE'))
<sqlalchemy.engine.cursor.CursorResult object at 0x...>
Sample Queries
What is the area of each field?
>>> query = sa.select(
... FieldTile.id, sa.func.sum(FieldTile.hpx.area)
... ).group_by(
... FieldTile.id
... ).limit(
... 5
... )
>>> for id, area in session.execute(query):
... print(f'Field {id} has area {area:.3g} sr')
Field 199 has area 0.0174 sr
Field 200 has area 0.0174 sr
Field 201 has area 0.0174 sr
Field 202 has area 0.0174 sr
Field 203 has area 0.0174 sr
How many galaxies are in each field?
>>> count = sa.func.count(Galaxy.id)
>>> query = sa.select(
... FieldTile.id, count
... ).filter(
... FieldTile.hpx.contains(Galaxy.hpx)
... ).group_by(
... FieldTile.id
... ).order_by(
... count.desc()
... ).limit(
... 5
... )
>>> for id, n in session.execute(query):
... print(f'Field {id} contains {n} galaxies')
Field 1739 contains 343 galaxies
Field 699 contains 336 galaxies
Field 700 contains 311 galaxies
Field 225 contains 303 galaxies
Field 1740 contains 289 galaxies
What is the probability density at the position of each galaxy?
>>> query = sa.select(
... Galaxy.id, SkymapTile.probdensity
... ).filter(
... SkymapTile.id == 1,
... SkymapTile.hpx.contains(Galaxy.hpx)
... ).order_by(
... SkymapTile.probdensity.desc()
... ).limit(
... 5
... )
>>> for id, p in session.execute(query):
... print(f'{id} has prob. density {p:.5g}/sr')
2MASX J02532153+0632222 has prob. density 20.701/sr
2MASX J02530482+0555431 has prob. density 20.695/sr
2MASX J02533119+0628252 has prob. density 20.669/sr
2MASX J02524584+0639206 has prob. density 20.656/sr
2MASX J02534120+0615562 has prob. density 20.567/sr
What is the probability contained within each field?
>>> area = (FieldTile.hpx * SkymapTile.hpx).area
>>> prob = sa.func.sum(SkymapTile.probdensity * area)
>>> query = sa.select(
... FieldTile.id, prob
... ).filter(
... SkymapTile.id == 1,
... FieldTile.hpx.overlaps(SkymapTile.hpx)
... ).group_by(
... FieldTile.id
... ).order_by(
... prob.desc()
... ).limit(
... 5
... )
>>> for id, prob in session.execute(query):
... print(f'Field {id} probability is {prob:.3g}')
Field 1499 probability is 0.165
Field 1446 probability is 0.156
Field 452 probability is 0.154
Field 505 probability is 0.0991
Field 401 probability is 0.0962
What is the combined area of fields 1000 through 2000?
In the next two examples, we introduce healpix_alchemy.func.union()
which
finds the union of a set of tiles. Because it is an aggregate function, it
should generally be used in a subquery.
>>> union = sa.select(
... ha.func.union(FieldTile.hpx).label('hpx')
... ).filter(
... FieldTile.id.between(1000, 2000)
... ).subquery()
>>> query = sa.select(
... sa.func.sum(union.columns.hpx.area)
... )
>>> result = session.execute(query).scalar_one()
>>> print(f'{result:.3g} sr')
9.33 sr
What is the integrated probability contained within fields 1000 through 2000?
>>> union = sa.select(
... ha.func.union(FieldTile.hpx).label('hpx')
... ).filter(
... FieldTile.id.between(1000, 2000)
... ).subquery()
>>> prob = sa.func.sum(SkymapTile.probdensity * (union.columns.hpx * SkymapTile.hpx).area)
>>> query = sa.select(
... prob
... ).filter(
... SkymapTile.id == 1,
... union.columns.hpx.overlaps(SkymapTile.hpx)
... )
>>> result = session.execute(query).scalar_one()
>>> print(f'{result:.3g}')
0.837
What is the area of the 90% credible region?
>>> cum_area = sa.func.sum(
... SkymapTile.hpx.area
... ).over(
... order_by=SkymapTile.probdensity.desc()
... ).label(
... 'cum_area'
... )
>>> cum_prob = sa.func.sum(
... SkymapTile.probdensity * SkymapTile.hpx.area
... ).over(
... order_by=SkymapTile.probdensity.desc()
... ).label(
... 'cum_prob'
... )
>>> subquery = sa.select(
... cum_area,
... cum_prob
... ).filter(
... SkymapTile.id == 1
... ).subquery()
>>> query = sa.select(
... sa.func.max(subquery.columns.cum_area)
... ).filter(
... subquery.columns.cum_prob <= 0.9
... )
>>> result = session.execute(query).scalar_one()
>>> print(f'{result:.3g} sr')
0.277 sr
Which galaxies are within the 90% credible region?
>>> cum_prob = sa.func.sum(
... SkymapTile.probdensity * SkymapTile.hpx.area
... ).over(
... order_by=SkymapTile.probdensity.desc()
... ).label(
... 'cum_prob'
... )
>>> subquery = sa.select(
... SkymapTile.probdensity,
... cum_prob
... ).filter(
... SkymapTile.id == 1
... ).subquery()
>>> min_probdensity = sa.select(
... sa.func.min(subquery.columns.probdensity)
... ).filter(
... subquery.columns.cum_prob <= 0.9
... ).scalar_subquery()
>>> query = sa.select(
... Galaxy.id
... ).filter(
... SkymapTile.id == 1,
... SkymapTile.hpx.contains(Galaxy.hpx),
... SkymapTile.probdensity >= min_probdensity
... ).limit(
... 5
... )
>>> for galaxy_id, in session.execute(query):
... print(galaxy_id)
2MASX J02424077-0000478
2MASX J02352772-0921216
2MASX J02273746-0109226
2MASX J02414523+0026354
2MASX J20095408-4822462
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for healpix_alchemy-1.1.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | d6e776930b33791612a59d65067a6d85d0d39662842c62654d765503fc377c7e |
|
MD5 | 7f718c0550172a98c61539808ade0e18 |
|
BLAKE2b-256 | f1aa603eca7e3647e918d694158f9512e2d5f40e8a7f74423c33924750087dc8 |