A platform for describing, extracting, transforming, loading and serving open data.

Project description

Spinta is a framework to describe, extract and publish data (DEP Framework). It supports many data schemas and formats.

Purpose

  • Describe your data: You can automatically generate a data structure description table (Manifest) from many different data sources.

  • Extract your data: Once you have your data structure in Manifest tables, you can extract data from multiple external data sources. Extracted data is validated and transformed using rules defined in the Manifest table. Finally, the data can be stored in an internal database to provide fast and flexible access.

  • Publish your data: Once your data is loaded into the internal database, you can publish it through an API. The API is generated automatically from the Manifest tables and serves the extracted data in many different formats. For example, if the original data source was a CSV file, you now have a flexible API that can serve JSON, RDF, SQL, CSV and other formats.

Features

  • Simple 15-column table format for describing data structures (you can use any spreadsheet software to manage the metadata of your data)

  • Internal data storage with pluggable backends (PostgreSQL or Mongo)

  • Built-in async API server built on top of Starlette for data publishing

  • Simple web-based data browser

  • Convenient command-line interface

  • Public or restricted API access via the OAuth protocol using built-in access management

  • Simple DSL for querying, transforming and validating data

  • Low memory consumption for data of any size

  • Support for many different data sources

  • Advanced data extraction, even from dynamic APIs

  • Compatible with the DCAT and Frictionless Data specifications

Example

If you have an SQLite database:

$ sqlite3 sqlite.db <<EOF
CREATE TABLE COUNTRY (
    NAME TEXT
);
EOF
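
The table created above is empty, so an API built on it would have no rows to return. As a minimal sketch, the same table can be created and given one sample row with Python's standard sqlite3 module. The function name `create_country_db` is hypothetical, and the sample value is simply the one that appears in the API responses later in this example; the sketch uses an in-memory database, so pass `sqlite.db` as the path to write the file used by the commands here.

```python
import sqlite3


def create_country_db(path):
    """Create the COUNTRY table from the example and insert one sample row."""
    con = sqlite3.connect(path)
    with con:  # commits the transaction on success
        con.execute("CREATE TABLE IF NOT EXISTS COUNTRY (NAME TEXT)")
        # Sample value taken from the API responses shown in this README.
        con.execute("INSERT INTO COUNTRY (NAME) VALUES (?)", ("Vilnius",))
    return con


# ":memory:" keeps the sketch side-effect free; use "sqlite.db" for a real file.
con = create_country_db(":memory:")
rows = con.execute("SELECT NAME FROM COUNTRY").fetchall()
print(rows)
con.close()
```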

You can get a limited API and a simple web-based data browser with a single command:

$ spinta run -r sql sqlite:///sqlite.db

Then you can generate a metadata table (I call it a manifest) like this:

$ spinta inspect -r sql sqlite:///sqlite.db
d | r | b | m | property | type   | ref | source              | prepare | level | access | uri | title | description
dataset                  |        |     |                     |         |       |        |     |       |
  | sql                  | sql    |     | sqlite:///sqlite.db |         |       |        |     |       |
                         |        |     |                     |         |       |        |     |       |
  |   |   | Country      |        |     | COUNTRY             |         |       |        |     |       |
  |   |   |   | name     | string |     | NAME                |         | 3     | open   |     |       |

The generated data structure table can be saved to a CSV file:

$ spinta inspect -r sql sqlite:///sqlite.db -o manifest.csv

Missing pieces in the metadata can be filled in using any spreadsheet software.

Once you are done editing the metadata, you can test it via the web-based data browser or the API:
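
Because the manifest is an ordinary table, it can also be read by a script. The following is a rough sketch that lists the properties declared under each model. It assumes that manifest.csv is plain comma-separated text with the same column headers as the inspect output above; the `SAMPLE` content and the `list_properties` helper are hypothetical and are not part of Spinta.

```python
import csv
import io

# Hypothetical manifest.csv content, assuming the inspect output above is
# written as comma-separated values with the same column headers.
SAMPLE = """\
d,r,b,m,property,type,ref,source,prepare,level,access,uri,title,description
dataset,,,,,,,,,,,,,
,sql,,,,sql,,sqlite:///sqlite.db,,,,,,
,,,,,,,,,,,,,
,,,Country,,,,COUNTRY,,,,,,
,,,,name,string,,NAME,,3,open,,,
"""


def list_properties(fileobj):
    """Return (model, property, type) tuples from a manifest table."""
    model = None
    found = []
    for row in csv.DictReader(fileobj):
        if row["m"]:
            # A value in the "m" column starts a new model.
            model = row["m"]
        elif row["property"]:
            # Property rows belong to the most recent model row.
            found.append((model, row["property"], row["type"]))
    return found


props = list_properties(io.StringIO(SAMPLE))
print(props)
```

To run it against a real file, open manifest.csv with `newline=""` and pass the file object instead of the `io.StringIO` sample.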

$ spinta run --mode external manifest.csv

Once you are satisfied with the metadata, you can generate a new metadata table for publishing, with all traces of the original data source removed:

$ spinta copy --no-source --access open manifest.csv manifest-public.csv

Now you have metadata for publishing, with all information about the original data source removed. To publish data, you need to copy the external data into the internal data store. To do that, first initialize the internal data store:

$ spinta config add backend my_backend postgresql postgresql://localhost/db
$ spinta config add manifest my_manifest tabular manifest-public.csv
$ spinta migrate

Once the internal database is initialized, you can push external data into it:

$ spinta push --access open manifest.csv

Now you can publish data via a full-featured API with a web-based data browser:

$ spinta run

You can access your data like this:

$ http :8000/dataset/sql/Country
HTTP/1.1 200 OK
content-type: application/json

{
    "_data": [
        {
            "_type": "dataset/sql/Country",
            "_id": "abdd1245-bbf9-4085-9366-f11c0f737c1d",
            "_rev": "16dabe62-61e9-4549-a6bd-07cecfbc3508",
            "_txn": "792a5029-63c9-4c07-995c-cbc063aaac2c",
            "name": "Vilnius"
        }
    ]
}

$ http :8000/dataset/sql/Country/abdd1245-bbf9-4085-9366-f11c0f737c1d
HTTP/1.1 200 OK
content-type: application/json

{
    "_type": "dataset/sql/Country",
    "_id": "abdd1245-bbf9-4085-9366-f11c0f737c1d",
    "_rev": "16dabe62-61e9-4549-a6bd-07cecfbc3508",
    "_txn": "792a5029-63c9-4c07-995c-cbc063aaac2c",
    "name": "Vilnius"
}

$ http ':8000/dataset/sql/Country/abdd1245-bbf9-4085-9366-f11c0f737c1d?select(name)'
HTTP/1.1 200 OK
content-type: application/json

{
    "name": "Vilnius"
}
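
The same requests can be made from a script. Below is a minimal sketch of a client using only the Python standard library. It assumes the server started by `spinta run` listens on localhost port 8000, as the `http :8000/...` calls above imply; `BASE_URL`, `model_url`, `fetch` and `names` are hypothetical helpers, not part of Spinta. The sketch only builds a URL and parses a copy of the list response shown above, so it runs without a server.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: default local server address


def model_url(model, pk=None, select=None):
    """Build a URL for a model, optionally for one object and one property."""
    url = f"{BASE_URL}/{model}"
    if pk is not None:
        url += f"/{pk}"
    if select is not None:
        # Same query form as the select(name) request shown above.
        url += f"?select({select})"
    return url


def fetch(url):
    """GET a URL and decode the JSON body (requires a running server)."""
    with urlopen(url) as resp:
        return json.load(resp)


def names(payload):
    """Collect the "name" values from a list response's "_data" array."""
    return [item["name"] for item in payload["_data"]]


# Copy of the list response shown above, used in place of a live request.
sample = json.loads("""
{"_data": [{"_type": "dataset/sql/Country",
            "_id": "abdd1245-bbf9-4085-9366-f11c0f737c1d",
            "name": "Vilnius"}]}
""")

url = model_url("dataset/sql/Country",
                pk="abdd1245-bbf9-4085-9366-f11c0f737c1d",
                select="name")
result = names(sample)
print(url)
print(result)
```

With a server running, `names(fetch(model_url("dataset/sql/Country")))` would perform the first request shown above and return the country names.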


