Skip to main content

A light-weight analytical engine for OLAP processing

Project description

Babbage Analytical Engine

Gitter Build Status Coverage Status

babbage is a lightweight implementation of an OLAP-style database query tool for PostgreSQL. Given a database schema and a logical model of the data, it can be used to perform analytical queries against that data - programmatically or via a web API.

It is heavily inspired by Cubes but has less ambitious goals, i.e. no pre-computation of aggregates, or multiple storage backends.

babbage is not specific to government finances, and could easily be used e.g. for ReGENESIS, a project that makes German national statistics available via an API. The API functions by interpreting modelling metadata generated by the user (measures and dimensions).

Installation and test

babbage will normally included as a PyPI dependency, or installed via pip:

$ pip install babbage

People interested in contributing to the package should instead check out the source repository and then use the provided Makefile to install the library (this requires virtualenv to be installed):

$ git clone https://github.com/openspending/babbage.git
$ cd babbage
$ make install
$ pip install tox
$ export BABBAGE_TEST_DB=postgresql://postgres@localhost:5432/postgres
$ make test

Usage

babbage is used to query a set of existing database tables, using an abstract, logical model to query them. A sample of a logical model can be found in tests/fixtures/models/cra.json, and a JSON schema specifying the model is available in babbage/schema/model.json.

The central unit of babbage is a Cube, i.e. a OLAP cube that uses the provided model metadata to construct queries against a database table. Additionally, the application supports managing multiple cubes at the same time via a CubeManager, which can be subclassed to enable application-specific ways of defining cubes and where their metadata is stored.

Futher, babbage includes a Flask Blueprint that can be used to expose a standard API via HTTP. This API is consumed by the JavaScript babbage.ui package and it is very closely modelled on the Cubes and OpenSpending HTTP APIs.

Programmatic usage

Let's assume you have an existing database table of procurement data and want to query it using babbage in a Python shell. A session might look like this:

import json
from sqlalchemy import create_engine
from babbage.cube import Cube
from babbage.model import Measure

engine = create_engine('postgresql://localhost/procurement')
model = json.load(open('procurement_model.json', 'r'))

cube = Cube(engine, 'procurement', model)
facts = cube.facts(page_size=5)

# There are 17201 rows in the table:
assert facts['total_fact_count'] == 17201

# There's a field called 'total_value':
assert 'total_value' in facts['fields']

# We can get metadata about it:
concept = cube.model['total_value']
assert isinstance(concept, Measure)
assert concept.label == 'Total Value'

# And there's some actual data:
assert len(facts['data']) == 5
fact_0 = facts['data'][0]
assert 'total_value' in fact_0

# For dimensions, we can get all the distinct values:
members = cube.members('supplier', cut='year:2015', page_size=500)
assert len(members['data']) <= 500
assert members['total_member_count']

# And, finally, we can aggregate by specific dimensions:
aggregate = cube.aggregate(aggregates='total_value.sum',
                           drilldowns='supplier|authority'
                           cut='year:2015|authority.country:GB',
                           page_size=500)
# This translates to: 
#   Aggregate the procurement data by summing up the 'total_value'
#   for each unique pair of values in the 'supplier' and 'authority'
#   dimensions, and filter for only those entries where the 'year'
#   dimensions key attribute is '2015' and the 'authority' dimensions
#   'country' attribute is 'GB'. Return the first 500 results.
assert aggregate['total_cell_count']
assert len(aggregate['cells']) <= 500
aggregate_0 = aggregate['cells'][0]
assert 'total_value.sum' in aggregate_0

# Note that these attribute names are made up for this example, they
# should be reflected from the model:
assert 'supplier.code' in aggregate_0
assert 'supplier.label' in aggregate_0
assert 'authority.code' in aggregate_0
assert 'authority.label' in aggregate_0

Using the HTTP API

The HTTP API for babbage is a simple Flask Blueprint used to expose a small set of calls that correspond to the cube functions listed above. To include it into an existing Flask application, you would need to create a CubeManager and then configure the API like this:

from flask import Flask
from sqlalchemy import create_engine
from babbage.manager import JSONCubeManager
from babbage.api import configure_api

app = Flask('demo')
engine = 
models_directory = 'models/'
manager = JSONCubeManager(engine, models_directory)
blueprint = configure_api(app, manager)
app.register_blueprint(blueprint, url_prefix='/api/babbage')

app.run()

Of course, you can define your own CubeManager, for example if you wish to retrieve model metadata from a database.

When enabled, the API will expose a number of JSON(P) endpoints relative to the given url_prefix:

  • /, returns the system status and version.
  • /cubes, returns a list of the available cubes (name only).
  • /cubes/<name>/model, returns full metadata for a given cube (i.e. measures, dimensions, aggregates etc.)
  • /cubes/<name>/facts is used to return individual entries from the cube in a non-aggregated form. Supports filters (cut), a set of fields to return and a sort (field_name:direction), as well as page and page_size.
  • /cubes/<name>/members is used to return the distinct set of values for a given dimension, e.g. all the suppliers mentioned in a procurement dataset. Supports filters (cut), a and a sort (field_name:direction), as well as page and page_size.
  • /cubes/<name>/aggregate is the main endpoint for generating aggregate views of the data. Supports specifying the aggregates to include, the drilldowns to aggregate by, a set of filters (cut), a and a sort (field_name:direction), as well as page and page_size.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

babbage-0.4.0.tar.gz (20.9 kB view details)

Uploaded Source

Built Distribution

babbage-0.4.0-py3-none-any.whl (26.5 kB view details)

Uploaded Python 3

File details

Details for the file babbage-0.4.0.tar.gz.

File metadata

  • Download URL: babbage-0.4.0.tar.gz
  • Upload date:
  • Size: 20.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.0

File hashes

Hashes for babbage-0.4.0.tar.gz
Algorithm Hash digest
SHA256 27ff42e17bc8a023ae03c36fd51838e8fe54f21809add2ecacf4c00aa32cddaa
MD5 45322fe702be3da91e522d7cb1d755c6
BLAKE2b-256 5093da4c0f8fb1958122ff4a097596207f1d2902b0365740c7169b59cb157433

See more details on using hashes here.

File details

Details for the file babbage-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: babbage-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 26.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.0

File hashes

Hashes for babbage-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2505c0c3445c4c25e1b0ad39ec2e5e5384e1e0afa2b828a49600577d0901feb8
MD5 93e261d0af5eff4682fb35a42d0c8686
BLAKE2b-256 ea79a6b8bbf3a61f418912638da8327ae69d4cb576a667e30e35faac32fb030a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page