# datapackage-pipelines-fiscal
Extension for datapackage-pipelines, used for loading Fiscal Data Packages into:

- S3 (or compatible) storage, in a denormalized form
- a database, in a normalized form
- an Elasticsearch instance (if available), where OpenSpending-compatible metadata will be stored

A `babbage` model will also be generated for querying the database using its API.
This extension works with a custom source spec and a set of processors. The generator converts the source spec into a set of inter-dependent pipelines which, when run in order, perform the data processing and load the results into the selected endpoints (based on environment variables).
## fiscal.source-spec.yaml
Each source-spec contains information regarding a single Fiscal Data Package.
Top level properties are:
### `title`

Title, or display name, of the data package.

### `dataset-name`

[OPTIONAL] A slug to be used as the data package's name.
If not provided, a slugified version of the title will be used.

### `resource-name`

[OPTIONAL] A slug to be used as the main resource's name in the final data package.
If not provided, the dataset name will be used.

### `owner-id`

The id of the owner of this data package.
This identifier is used to generate various paths and storage names.
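As a quick illustration, these naming properties might look like this in a spec (all values below are invented for the example):

```yaml
title: Example Budget 2017          # display name (invented)
dataset-name: example-budget-2017   # optional; a slugified title is used otherwise
resource-name: example-budget       # optional; defaults to the dataset name
owner-id: example-org               # invented owner id
```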
### `sources`

Contains a non-empty list of data sources for the fiscal data package.
Each data source has these properties:

- `url`: The location of the data
- `name`: [OPTIONAL] A name for this source (will later be used as an intermediate resource name)

Other tabulator parameters can also be added as properties here, e.g. `sheet`, `encoding`, `compression` etc.
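A minimal `sources` entry might look like the following sketch (the URL, name and option values are placeholders):

```yaml
sources:
  - url: https://example.com/budget-2017.xlsx   # placeholder URL
    name: budget-2017                           # optional intermediate resource name
    sheet: 1                                    # extra tabulator option
    encoding: utf-8                             # extra tabulator option
```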
### `fields`

Contains a non-empty list of fields for the fiscal data package.
Each field definition has these properties:

- `header`: The name of the field in the resulting resource
- `title`: [OPTIONAL] The display name of the field in the resulting resource
- `columnType`: The ColumnType of the field
- `options`: Extra options to be added to the field, e.g. JSON Table Schema properties such as `decimalChar` etc.
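For example, two field definitions could be written as follows (the headers, titles and ColumnType values are illustrative; consult the os-types ColumnType vocabulary for real values):

```yaml
fields:
  - header: AMOUNT                 # invented column name in the source data
    title: Approved amount
    columnType: value              # illustrative ColumnType
    options:
      decimalChar: ","             # JSON Table Schema option
  - header: BUDGET_YEAR            # invented column name
    columnType: date:fiscal-year   # illustrative ColumnType
```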
### `measures`

[OPTIONAL] Extra information for measure normalization processing.
(Measure normalization is the process of reducing the number of measures to one, while multiplying the number of rows and adding extra columns whose values identify the original measure.)

Contains the following sub-properties:

- `currency`: The currency code of the output measure column
- `title`: [OPTIONAL] The title for the output measure column
- `mapping`: Unpivoting map.
  The unpivoting map is a map from a measure's name to its unpivoting data.
  "Unpivoting data" is a map from an extra column's name to a value.
  For example, a source row holding both an APPROVED and a RELEASED amount is unpivoted into two output rows, each carrying one amount plus the extra columns (here PHASE and PHASE_ID) that identify which measure the amount came from.
Example:

```yaml
measures:
  currency: GTQ
  mapping:
    APPROVED:
      PHASE_ID: "0"
      PHASE: Inicial
    RELEASED:
      PHASE_ID: "1"
      PHASE: Vigente
    COMMITTED:
      PHASE_ID: "2"
      PHASE: Comprometido
```
### `currencies`

[OPTIONAL] List of currency codes to convert to ('USD' by default). See the next section for details.
### `currency-conversion`

[OPTIONAL] Instructions for adding an extra column (or columns) with measure values in another currency.

- `date_measure`: [OPTIONAL] Column name from which a date can be extracted. If not provided, a guess will be made according to the ColumnType.
- `title`: [OPTIONAL] Title for the currency-converted measure columns.
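Taken together, the two currency options above might be specified like this (the column name and title are invented for the sketch):

```yaml
currencies:
  - USD
  - EUR
currency-conversion:
  date_measure: TRANSACTION_DATE   # invented column holding the transaction date
  title: Amount (converted)        # invented title for the converted measure columns
```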
### `datapackage-url`

[OPTIONAL] Contains the URL of the source datapackage from which this data came. If provided, metadata for this datapackage will be loaded from that URL.
### `deduplicate`

[OPTIONAL] If `true`, then the source data will be processed to remove duplicate rows (i.e. rows which have the same values in the primary key). Measure values for these rows will be summed in order to generate a single output row.
### `postprocessing`

[OPTIONAL] A list of extra processors (and parameters) that will be applied to the data.
The format is the same as in any `pipeline-spec.yaml`.
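A sketch of what such a step list might look like, assuming the same `run`/`parameters` keys used for steps in `pipeline-spec.yaml` (`set_types` is a processor from the datapackage-pipelines standard library; the field name is made up):

```yaml
postprocessing:
  - run: set_types
    parameters:
      types:
        FISCAL_YEAR:        # invented field name (interpreted as a regex by set_types)
          type: integer
```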
## Generated Pipelines
### `./denormalized_flow`

- Loads external metadata
- Collects all data from all sources
- Combines the different sources into one unified stream
- Does measure normalization
- Does currency conversion
- Does row deduplication
- Does extra processing steps
Outputs:
- Denormalized data (local file)
- List of fiscal years in a separate resource (local file)
- Updates os package registry (if configured)
### `./finalize_datapackage_flow_splitter`

(depends on `./denormalized_flow`)

- Loads denormalized package
- Writes separate per-year filtered copies of the data
### `./finalize_datapackage_flow`

(depends on `./finalize_datapackage_flow_splitter`)

- Loads all resources from the splitter pipeline, as well as the full denormalized dataset
Outputs:
- Stores results in S3 bucket
- Zip file with the datapackage (in case an S3 bucket is not configured)
- Updates os package registry (if configured)
### `./dimension_flow_{hierarchy}`

(depends on `./denormalized_flow`)

- Loads denormalized data
- Picks only hierarchy columns
- Adds an auto-incrementing id column
- Removes duplicates
Outputs:
- Normalized hierarchy data (local file)
### `./normalized_flow`

(depends on `./denormalized_flow` and all `./dimension_flow_{hierarchy}` pipelines)

- Loads denormalized data as fact table
- Loads all normalized hierarchy data
- Creates babbage model
- Replaces all hierarchy columns in fact table with corresponding ids from normalized hierarchy tables
Outputs:
- Normalized fact table (local file)
- Updates os package registry (if configured)
### `./dumper_flow_{hierarchy}`

(depends on the corresponding `./dimension_flow_{hierarchy}`)

- Loads normalized hierarchy data
- Fixes nulls in primary key (replacing them with empty strings)
Outputs:
- Saves data as a single table in an SQL database
### `./dumper_flow`

(depends on `./normalized_flow`)

- Loads normalized fact table data
- Fixes nulls in primary key (replacing them with empty strings)
Outputs:
- Saves data as a single table in an SQL database
### `./dumper_flow_update_status`

(depends on `./dumper_flow`)

Outputs:
- Updates the os package registry (if configured) to record that the package was loaded successfully
## Environment variables
- `DPP_DB_ENGINE` - connection string for the SQL database to dump data into
- `ELASTICSEARCH_ADDRESS` [OPTIONAL] - connection string for an Elasticsearch instance (used for package registry updates)
- `S3_BUCKET_NAME` [OPTIONAL] - S3 bucket for uploading data. If not provided, local ZIP files will be created instead.
- `AWS_ACCESS_KEY_ID` - S3 credentials (required if an S3 bucket was specified)
- `AWS_SECRET_ACCESS_KEY` - S3 credentials (required if an S3 bucket was specified)
## Dependencies

In order to fully run the fiscal datapackage flow you need to have `os-types` installed, using npm:

```
$ npm install -g os-types
```

This external node.js utility is used to perform fiscal modelling for the processed datapackage.
## Contributing

Please read the contribution guideline before submitting changes.