A utility library for working with data flows in Python and ElasticSearch

Project description

dataflows-elasticsearch

Dataflows processors for working with ElasticSearch

Features

  • dump_to_es processor

Getting Started

Installation

The package uses semantic versioning, which means that major versions could include breaking changes. It's recommended to specify a package version range in your setup/requirements file, e.g. package>=1.0,<2.0.

$ pip install dataflows-elasticsearch
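
For example, to pin to a compatible release range in an install command or requirements file (the bounds below are illustrative; use whichever range you have tested against):

$ pip install 'dataflows-elasticsearch>=0.1,<0.2'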

Examples

These processors have to be used as part of a dataflows Flow. For example:

from dataflows import Flow, load
from dataflows_elasticsearch import dump_to_es  # import path assumed from the package name

flow = Flow(
    load('data/data.csv'),
    dump_to_es(
        engine='localhost:9200',
    ),
)
flow.process()

Documentation

dump_to_es

Saves the Flow to an ElasticSearch index.

Parameters

  • indexes - Mapping of index names to resource names, e.g.
{
  'index-name-1': {
    'resource-name': 'resource-name-1',
  },
  'index-name-2': {
    'resource-name': 'resource-name-2',
  },
  # ...
}
  • mapper_cls - Class to be used to map JSON Table Schema types to ElasticSearch types
  • index_settings - Options to be used when creating the ElasticSearch index
  • engine - Connection string for connecting to the ElasticSearch instance, or an Elasticsearch object. Can also be of the form env://ENV_VAR, in which case the connection string will be fetched from the environment variable ENV_VAR.
  • elasticsearch_options - Options to be used when creating the Elasticsearch object (in case it wasn't provided)
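
Putting these parameters together, here is a minimal sketch of a fuller call. The index name, resource name, settings values and environment variable are illustrative, and the import path for dump_to_es is assumed rather than taken from the package docs:

from dataflows import Flow, load
from dataflows_elasticsearch import dump_to_es  # assumed import path

flow = Flow(
    # give the loaded CSV an explicit resource name so it can be referenced below
    load('data/data.csv', name='my-resource'),
    dump_to_es(
        # route the resource into an index, using the mapping shape shown above
        indexes={
            'my-index': {
                'resource-name': 'my-resource',
            },
        },
        # read the connection string from an environment variable
        engine='env://ELASTICSEARCH_URL',
        # settings applied when the index is created (illustrative values)
        index_settings={'number_of_shards': 1, 'number_of_replicas': 0},
        # options passed to the Elasticsearch client if one isn't provided
        elasticsearch_options={'timeout': 30},
    ),
)
flow.process()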

Contributing

The project follows the Open Knowledge International coding standards.

The recommended way to get started is to create and activate a project virtual environment. To install the package and development dependencies into your active environment:

$ make install

To run tests with linting and coverage:

$ make test

For linting, pylama (configured in pylama.ini) is used. At this stage it's already installed into your environment and can be used separately with more fine-grained control, as described in its documentation: https://pylama.readthedocs.io/en/latest/.

For example to sort results by error type:

$ pylama --sort <path>

For testing, tox (configured in tox.ini) is used. It's already installed into your environment and can be used separately with more fine-grained control, as described in its documentation: https://testrun.org/tox/latest/.

For example, to run a subset of tests against the Python 3.7 environment with increased verbosity (all positional arguments and options after -- will be passed to py.test):

$ tox -e py37 -- -v tests/<path>

Under the hood, tox uses the pytest (configured in pytest.ini), coverage and mock packages. These packages are available only in the tox environments.

Changelog

The full changelog and documentation for all released versions can be found in the nicely formatted commit history.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataflows-elasticsearch-0.1.1.tar.gz (6.2 kB) - Source

Built Distribution

dataflows_elasticsearch-0.1.1-py2.py3-none-any.whl (5.9 kB) - Python 2, Python 3

File details

Details for the file dataflows-elasticsearch-0.1.1.tar.gz.

File metadata

  • Download URL: dataflows-elasticsearch-0.1.1.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.8

File hashes

Hashes for dataflows-elasticsearch-0.1.1.tar.gz

  • SHA256: 56ca064baea3453f7d5e497531c20bd6afdfbc89b5ce846b74f154f2fef4dd38
  • MD5: 6bee4d829377cef6c1a0e21cac7c9fb8
  • BLAKE2b-256: 6e6ad2ddf5cc72d204f9d960ded7e44be44fafd7cc08e01fbad0e843b4a2751a

See more details on using hashes here.
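
For instance, the SHA256 digest above can be used with pip's hash-checking mode via a requirements file entry like the sketch below (pip then requires every requirement, including dependencies, to be pinned with hashes):

dataflows-elasticsearch==0.1.1 \
    --hash=sha256:56ca064baea3453f7d5e497531c20bd6afdfbc89b5ce846b74f154f2fef4dd38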

File details

Details for the file dataflows_elasticsearch-0.1.1-py2.py3-none-any.whl.

File metadata

  • Download URL: dataflows_elasticsearch-0.1.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.8

File hashes

Hashes for dataflows_elasticsearch-0.1.1-py2.py3-none-any.whl

  • SHA256: 002578ed127c1ac2e357bbb3f524cd62b79c50424972efbc203710e554a9911c
  • MD5: 504f62a75680b0b664f0a7ac22de2eee
  • BLAKE2b-256: c362914c1b0554eff8192b9f69b7f312e9077d107d167070b9fa9f312945f331

See more details on using hashes here.
