A fast interface between SQLAlchemy and Numpy
Features
This project reads data from SQLAlchemy into structured NumPy arrays. In addition to the built-in SQLAlchemy types, it also supports reading geometry columns into arrays of pygeos geometries.
Installation
Install via pip using:
$ pip install condenser
Include geometry support:
$ pip install condenser[geo]
Usage
condenser exposes one object: NumpyQuery, which is used as a custom SQLAlchemy query class as follows:
>>> from condenser import NumpyQuery
>>> # create the session with a custom query class
>>> session = session_factory(query_cls=NumpyQuery)
For a session constructed this way, every query object has an additional method that dumps the selected data into a NumPy structured array:
>>> query = session.query(SomeModel.float_type_column, SomeModel.int_type_column)
>>> my_array = query.as_structarray()
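To illustrate what a structured array offers, the following sketch builds one by hand with plain NumPy. It does not use condenser or a database: the values are made up, and naming the fields after the selected columns is an assumption about what as_structarray() returns.

```python
import numpy as np

# Hypothetical stand-in for the result of query.as_structarray().
# Assumption: one field per selected column, named after the column.
dtype = np.dtype([("float_type_column", np.float64), ("int_type_column", np.int64)])
my_array = np.array([(1.5, 10), (2.5, 20), (4.0, 30)], dtype=dtype)

# Each field is accessed by name and behaves like a regular 1D array.
floats = my_array["float_type_column"]
ints = my_array["int_type_column"]

# Vectorized operations and boolean masks work per field.
mean_float = floats.mean()
selected = my_array[ints > 10]
```

Here selected is again a structured array with the same fields, so filtering on one column keeps the rows of all columns together.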
Geometry support
Geometry columns are automatically converted to arrays of pygeos.Geometry objects. See https://pygeos.readthedocs.io for (vectorized) NumPy functions that can act on these arrays.
Transform geometries (using the ST_Transform database function) as follows:
>>> query.with_transformed_geometries(target_srid=28992)
Note that this will only transform geometries with a known SRID. If an SRID is known only from another metadata source, use a function appropriate to your database backend to set the projection before converting it. Another option is using the pyproj library in combination with pygeos.apply to transform geometries from Python.
Custom dtype mapping
condenser takes a conservative approach to guessing NumPy dtypes from SQLAlchemy types: it always uses 8-byte signed integers and 8-byte floats. For some database backends, smaller types (for example 4-byte) are sufficient. Override an SQLAlchemy to NumPy type mapping after constructing the query:
>>> import numpy as np
>>> from sqlalchemy import Integer
>>> query = session.query(SomeModel.float_type_column)
>>> query.numpy_settings[Integer]["dtype"] = np.int32
>>> query.as_structarray()
Or globally:
>>> NumpyQuery.default_numpy_settings[Integer]["dtype"] = np.int32
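The trade-off behind such an override can be sketched with plain NumPy. The example below compares the memory use of a hypothetical two-column record in 8-byte and 4-byte layouts. The field names and row count are made up for illustration, and nothing here calls condenser itself.

```python
import numpy as np

# Hypothetical record layouts for one float and one integer column.
dtype_8byte = np.dtype([("value", np.float64), ("count", np.int64)])
dtype_4byte = np.dtype([("value", np.float32), ("count", np.int32)])

# Memory needed for one million rows in each layout.
n_rows = 1_000_000
bytes_8 = np.zeros(n_rows, dtype=dtype_8byte).nbytes
bytes_4 = np.zeros(n_rows, dtype=dtype_4byte).nbytes

# The price of 4-byte integers: values above this limit do not fit.
int32_max = np.iinfo(np.int32).max
```

The 4-byte layout halves the memory per row, but it is only safe when the database column is known to stay within the smaller range.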
NULL values
Most NumPy dtypes can represent NULL (Python: None) values natively. Integer columns deserve extra attention, because NumPy integer dtypes have no value that represents NULL. This package converts NULLs in integer columns to -1 by default. A complete list of what to expect:
NULL in integer columns becomes -1
NULL in float and numeric columns becomes nan
NULL in boolean columns becomes False
NULL in object typed columns (string, text, geometry) becomes None
Adjust the NULL value before executing the query:
>>> from sqlalchemy import Boolean, Integer
>>> query = session.query(SomeModel.float_type_column)
>>> query.numpy_settings[Integer]["null"] = -9999
>>> query.numpy_settings[Boolean]["null"] = True
>>> query.as_structarray()
Or globally:
>>> NumpyQuery.default_numpy_settings[Integer]["null"] = -9999
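The NULL replacements listed in this section can be mimicked with plain NumPy. The sketch below is not condenser's implementation: fill_nulls is a hypothetical helper, written only to show why a sentinel value is needed for integers and booleans while floats and object columns have a natural NULL representation.

```python
import numpy as np

def fill_nulls(values, dtype, null):
    """Replace None by `null` and convert to a NumPy array (illustrative helper)."""
    return np.array([null if v is None else v for v in values], dtype=dtype)

# Defaults as described in the list of NULL conversions.
ints = fill_nulls([1, None, 3], np.int64, -1)
floats = fill_nulls([0.5, None], np.float64, np.nan)
bools = fill_nulls([True, None], np.bool_, False)
objects = fill_nulls(["a", None], object, None)

# A custom sentinel, comparable to setting "null" to -9999 for Integer.
custom_ints = fill_nulls([1, None], np.int64, -9999)
```

A sentinel such as -1 or -9999 should be a value that cannot occur in the real data; otherwise NULLs become indistinguishable from valid entries.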
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.