A light-weight analytical engine for OLAP processing
Project description
# Babbage Analytical Engine
[![Gitter](https://img.shields.io/gitter/room/openspending/chat.svg)](https://gitter.im/openspending/chat)
[![Build Status](https://travis-ci.org/openspending/babbage.svg?branch=master)](https://travis-ci.org/spendb/babbage)
[![Coverage Status](https://coveralls.io/repos/openspending/babbage/badge.svg?branch=master&service=github)](https://coveralls.io/github/openspending/babbage?branch=master)
``babbage`` is a lightweight implementation of an OLAP-style database
query tool for PostgreSQL. Given a database schema and a logical model
of the data, it can be used to perform analytical queries against that
data - programmatically or via a web API.
It is heavily inspired by [Cubes](http://cubes.databrewery.org/) but
has less ambitious goals, i.e. no pre-computation of aggregates, or
multiple storage backends.
``babbage`` is not specific to government finances, and could easily be used e.g. for ReGENESIS, a project that makes German national statistics available via an API. The API functions by interpreting modelling metadata generated by the user (measures and dimensions).
## Installation and test
``babbage`` will normally included as a PyPI dependency, or installed via
``pip``:
```bash
$ pip install babbage
```
People interested in contributing to the package should instead check out the
source repository and then use the provided ``Makefile`` to install the
library (this requires ``virtualenv`` to be installed):
```bash
$ git clone https://github.com/spendb/babbage.git
$ cd babbage
$ make install
$ make test
```
## Usage
``babbage`` is used to query a set of existing database tables, using an
abstract, logical model to query them. A sample of a logical model can be
found in ``tests/fixtures/models/cra.json``, and a JSON schema specifying
the model is available in ``babbage/schema/model.json``.
The central unit of ``babbage`` is a ``Cube``, i.e. a [OLAP cube](https://en.wikipedia.org/wiki/OLAP_cube) that uses the provided model metadata to construct queries
against a database table. Additionally, the application supports managing
multiple cubes at the same time via a ``CubeManager``, which can be
subclassed to enable application-specific ways of defining cubes and where
their metadata is stored.
Futher, ``babbage`` includes a Flask Blueprint that can be used to expose
a standard API via HTTP. This API is consumed by the JavaScript ``babbage.ui``
package and it is very closely modelled on the Cubes and OpenSpending HTTP
APIs.
### Programmatic usage
Let's assume you have an existing database table of procurement data and
want to query it using ``babbage`` in a Python shell. A session might look
like this:
```python
import json
from sqlalchemy import create_engine
from babbage.cube import Cube
from babbage.model import Measure
engine = create_engine('postgresql://localhost/procurement')
model = json.load(open('procurement_model.json', 'r'))
cube = Cube(engine, 'procurement', model)
facts = cube.facts(page_size=5)
# There are 17201 rows in the table:
assert facts['total_fact_count'] == 17201
# There's a field called 'total_value':
assert 'total_value' in facts['fields']
# We can get metadata about it:
concept = cube.model['total_value']
assert isinstance(concept, Measure)
assert concept.label == 'Total Value'
# And there's some actual data:
assert len(facts['data']) == 5
fact_0 = facts['data'][0]
assert 'total_value' in fact_0
# For dimensions, we can get all the distinct values:
members = cube.members('supplier', cut='year:2015', page_size=500)
assert len(members['data']) <= 500
assert members['total_member_count']
# And, finally, we can aggregate by specific dimensions:
aggregate = cube.aggregate(aggregates='total_value.sum',
drilldowns='supplier|authority'
cut='year:2015|authority.country:GB',
page_size=500)
# This translates to:
# Aggregate the procurement data by summing up the 'total_value'
# for each unique pair of values in the 'supplier' and 'authority'
# dimensions, and filter for only those entries where the 'year'
# dimensions key attribute is '2015' and the 'authority' dimensions
# 'country' attribute is 'GB'. Return the first 500 results.
assert aggregate['total_cell_count']
assert len(aggregate['cells']) <= 500
aggregate_0 = aggregate['cells'][0]
assert 'total_value.sum' in aggregate_0
# Note that these attribute names are made up for this example, they
# should be reflected from the model:
assert 'supplier.code' in aggregate_0
assert 'supplier.label' in aggregate_0
assert 'authority.code' in aggregate_0
assert 'authority.label' in aggregate_0
```
### Using the HTTP API
The HTTP API for ``babbage`` is a simple Flask [Blueprint](http://flask.pocoo.org/docs/latest/blueprints/) used to expose a small set of calls that correspond to
the cube functions listed above. To include it into an existing Flask
application, you would need to create a ``CubeManager`` and then
configure the API like this:
```python
from flask import Flask
from sqlalchemy import create_engine
from babbage.manager import JSONCubeManager
from babbage.api import configure_api
app = Flask('demo')
engine =
models_directory = 'models/'
manager = JSONCubeManager(engine, models_directory)
blueprint = configure_api(app, manager)
app.register_blueprint(blueprint, url_prefix='/api/babbage')
app.run()
```
Of course, you can define your own ``CubeManager``, for example if
you wish to retrieve model metadata from a database.
When enabled, the API will expose a number of JSON(P) endpoints
relative to the given ``url_prefix``:
* ``/``, returns the system status and version.
* ``/cubes``, returns a list of the available cubes (name only).
* ``/cubes/<name>/model``, returns full metadata for a given
cube (i.e. measures, dimensions, aggregates etc.)
* ``/cubes/<name>/facts`` is used to return individual entries from
the cube in a non-aggregated form. Supports filters (``cut``), a
set of ``fields`` to return and a ``sort`` (``field_name:direction``),
as well as ``page`` and ``page_size``.
* ``/cubes/<name>/members`` is used to return the distinct set of
values for a given dimension, e.g. all the suppliers mentioned in
a procurement dataset. Supports filters (``cut``), a and a ``sort``
(``field_name:direction``), as well as ``page`` and ``page_size``.
* ``/cubes/<name>/aggregate`` is the main endpoint for generating
aggregate views of the data. Supports specifying the ``aggregates``
to include, the ``drilldowns`` to aggregate by, a set of filters
(``cut``), a and a ``sort`` (``field_name:direction``), as well
as ``page`` and ``page_size``.
[![Gitter](https://img.shields.io/gitter/room/openspending/chat.svg)](https://gitter.im/openspending/chat)
[![Build Status](https://travis-ci.org/openspending/babbage.svg?branch=master)](https://travis-ci.org/spendb/babbage)
[![Coverage Status](https://coveralls.io/repos/openspending/babbage/badge.svg?branch=master&service=github)](https://coveralls.io/github/openspending/babbage?branch=master)
``babbage`` is a lightweight implementation of an OLAP-style database
query tool for PostgreSQL. Given a database schema and a logical model
of the data, it can be used to perform analytical queries against that
data - programmatically or via a web API.
It is heavily inspired by [Cubes](http://cubes.databrewery.org/) but
has less ambitious goals, i.e. no pre-computation of aggregates, or
multiple storage backends.
``babbage`` is not specific to government finances, and could easily be used e.g. for ReGENESIS, a project that makes German national statistics available via an API. The API functions by interpreting modelling metadata generated by the user (measures and dimensions).
## Installation and test
``babbage`` will normally included as a PyPI dependency, or installed via
``pip``:
```bash
$ pip install babbage
```
People interested in contributing to the package should instead check out the
source repository and then use the provided ``Makefile`` to install the
library (this requires ``virtualenv`` to be installed):
```bash
$ git clone https://github.com/spendb/babbage.git
$ cd babbage
$ make install
$ make test
```
## Usage
``babbage`` is used to query a set of existing database tables, using an
abstract, logical model to query them. A sample of a logical model can be
found in ``tests/fixtures/models/cra.json``, and a JSON schema specifying
the model is available in ``babbage/schema/model.json``.
The central unit of ``babbage`` is a ``Cube``, i.e. a [OLAP cube](https://en.wikipedia.org/wiki/OLAP_cube) that uses the provided model metadata to construct queries
against a database table. Additionally, the application supports managing
multiple cubes at the same time via a ``CubeManager``, which can be
subclassed to enable application-specific ways of defining cubes and where
their metadata is stored.
Futher, ``babbage`` includes a Flask Blueprint that can be used to expose
a standard API via HTTP. This API is consumed by the JavaScript ``babbage.ui``
package and it is very closely modelled on the Cubes and OpenSpending HTTP
APIs.
### Programmatic usage
Let's assume you have an existing database table of procurement data and
want to query it using ``babbage`` in a Python shell. A session might look
like this:
```python
import json
from sqlalchemy import create_engine
from babbage.cube import Cube
from babbage.model import Measure
engine = create_engine('postgresql://localhost/procurement')
model = json.load(open('procurement_model.json', 'r'))
cube = Cube(engine, 'procurement', model)
facts = cube.facts(page_size=5)
# There are 17201 rows in the table:
assert facts['total_fact_count'] == 17201
# There's a field called 'total_value':
assert 'total_value' in facts['fields']
# We can get metadata about it:
concept = cube.model['total_value']
assert isinstance(concept, Measure)
assert concept.label == 'Total Value'
# And there's some actual data:
assert len(facts['data']) == 5
fact_0 = facts['data'][0]
assert 'total_value' in fact_0
# For dimensions, we can get all the distinct values:
members = cube.members('supplier', cut='year:2015', page_size=500)
assert len(members['data']) <= 500
assert members['total_member_count']
# And, finally, we can aggregate by specific dimensions:
aggregate = cube.aggregate(aggregates='total_value.sum',
drilldowns='supplier|authority'
cut='year:2015|authority.country:GB',
page_size=500)
# This translates to:
# Aggregate the procurement data by summing up the 'total_value'
# for each unique pair of values in the 'supplier' and 'authority'
# dimensions, and filter for only those entries where the 'year'
# dimensions key attribute is '2015' and the 'authority' dimensions
# 'country' attribute is 'GB'. Return the first 500 results.
assert aggregate['total_cell_count']
assert len(aggregate['cells']) <= 500
aggregate_0 = aggregate['cells'][0]
assert 'total_value.sum' in aggregate_0
# Note that these attribute names are made up for this example, they
# should be reflected from the model:
assert 'supplier.code' in aggregate_0
assert 'supplier.label' in aggregate_0
assert 'authority.code' in aggregate_0
assert 'authority.label' in aggregate_0
```
### Using the HTTP API
The HTTP API for ``babbage`` is a simple Flask [Blueprint](http://flask.pocoo.org/docs/latest/blueprints/) used to expose a small set of calls that correspond to
the cube functions listed above. To include it into an existing Flask
application, you would need to create a ``CubeManager`` and then
configure the API like this:
```python
from flask import Flask
from sqlalchemy import create_engine
from babbage.manager import JSONCubeManager
from babbage.api import configure_api
app = Flask('demo')
engine =
models_directory = 'models/'
manager = JSONCubeManager(engine, models_directory)
blueprint = configure_api(app, manager)
app.register_blueprint(blueprint, url_prefix='/api/babbage')
app.run()
```
Of course, you can define your own ``CubeManager``, for example if
you wish to retrieve model metadata from a database.
When enabled, the API will expose a number of JSON(P) endpoints
relative to the given ``url_prefix``:
* ``/``, returns the system status and version.
* ``/cubes``, returns a list of the available cubes (name only).
* ``/cubes/<name>/model``, returns full metadata for a given
cube (i.e. measures, dimensions, aggregates etc.)
* ``/cubes/<name>/facts`` is used to return individual entries from
the cube in a non-aggregated form. Supports filters (``cut``), a
set of ``fields`` to return and a ``sort`` (``field_name:direction``),
as well as ``page`` and ``page_size``.
* ``/cubes/<name>/members`` is used to return the distinct set of
values for a given dimension, e.g. all the suppliers mentioned in
a procurement dataset. Supports filters (``cut``), a and a ``sort``
(``field_name:direction``), as well as ``page`` and ``page_size``.
* ``/cubes/<name>/aggregate`` is the main endpoint for generating
aggregate views of the data. Supports specifying the ``aggregates``
to include, the ``drilldowns`` to aggregate by, a set of filters
(``cut``), a and a ``sort`` (``field_name:direction``), as well
as ``page`` and ``page_size``.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
babbage-0.2.0.tar.gz
(16.5 kB
view details)
File details
Details for the file babbage-0.2.0.tar.gz
.
File metadata
- Download URL: babbage-0.2.0.tar.gz
- Upload date:
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4ce49a417dce3c629c64870c048831b856ac8319586ff8ed053dbae6c302b651 |
|
MD5 | 8d9115b2fb3381920a9ac4b08a691b98 |
|
BLAKE2b-256 | 8676498510c34a3323bd0625e58481241bacc9bc4ec7d0d9309327e64afd0457 |