A microservice framework for data pipelines, providing scheduling, retrying, and error reporting.

Project description

Mettle is a framework for managing extract/transform/load (ETL) jobs. ETL processes present a number of problems that Mettle is designed to solve:

  • Jobs need to be run at specific times. Sometimes they need to be triggered by the completion of other jobs. Mettle supports scheduling both time-based and trigger-based jobs.

  • Various people in an organization need to be able to see job schedules and the state of recent runs. Naive scripts running on cron jobs, scattered amongst a large number of servers, create a serious problem with visibility. Mettle solves this by centralizing the job scheduling, state reporting, and log viewing.

  • Sometimes jobs fail because of temporary problems somewhere (a flaky network, a too-full disk). Mettle will automatically retry jobs to deal with this.

  • Sometimes jobs fail and will not be able to succeed until the job has been reconfigured (a changed password on a database, for example). Mettle makes it easy to manually re-launch a job after such issues have been resolved.

  • If you try to solve the above problems by centralizing all your ETL execution, you quickly run into a problem of proliferating dependencies. A centralized ETL service can become hard to develop and hard to deploy because all those dependencies (libraries, external APIs, external databases) introduce more instability. Mettle is designed to isolate those dependencies into separate ETL services, so instability in one ETL doesn’t impact any others.
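The retry behavior described in the list above can be illustrated with a minimal sketch. This is not Mettle's implementation or API: `TransientError`, `run_with_retries`, `make_flaky_job`, and the `max_attempts` parameter are hypothetical names chosen for illustration. The idea is simply that a temporary failure triggers another attempt, while a job that keeps failing eventually surfaces its error so someone can fix the configuration and re-launch it manually.

```python
class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retries(job, max_attempts=3):
    """Call job() until it succeeds or max_attempts is reached.

    Returns (result, attempts_used). Re-raises the last TransientError
    if every attempt fails, so the failure stays visible to an operator.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job(), attempt
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure


def make_flaky_job(failures_before_success):
    """Build a job that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def job():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("simulated flaky network")
        return "loaded"

    return job
```

A real scheduler would normally wait between attempts and record each attempt's state; both are left out here to keep the sketch short.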

We picked the name “Mettle” because:

  • It’s got the letters E, T, and L in it.

  • It means “ability to continue despite difficulties”.

  • It sounds like “metal”, which is solid.

Mettle is composed of several components:

  • Web UI. Features:
    • Configure schedules for pipelines.

    • Display past jobs, both successful and failed.

    • Display currently-executing jobs, with live status updates and streaming logs.

    • Manually launch jobs.

  • Timer: Reads pipeline schedules from the database and sends out RabbitMQ messages when pipelines need to be kicked off.

  • Dispatcher: Records which jobs are being executed by which workers, and their eventual success or failure.

  • Logger: Receives log messages sent from ETL Services over RabbitMQ, and saves them to Postgres.

  • ETL Services: Implement the actual business logic and systems integration to move data between systems.

Mettle uses Postgres to store state, and RabbitMQ for inter-process communication.
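To make the division of labor between these components more concrete, here is a toy, single-process simulation. It is only a sketch under stated assumptions: an in-memory `queue.Queue` stands in for RabbitMQ, a plain dict stands in for Postgres, and every function name and message field (`timer_tick`, `etl_service`, `drain`, `simulate`, `"run_pipeline"`, `"job_finished"`, and so on) is hypothetical rather than taken from Mettle's code or wire format.

```python
import queue


def timer_tick(schedules, now, bus):
    """Timer role: announce every pipeline whose scheduled time has arrived."""
    for name, due_at in sorted(schedules.items()):
        if due_at <= now:
            bus.put({"type": "run_pipeline", "pipeline": name})


def etl_service(message, bus):
    """ETL service role: do the work, then send a log line and the outcome."""
    name = message["pipeline"]
    bus.put({"type": "log", "pipeline": name, "line": "starting " + name})
    bus.put({"type": "job_finished", "pipeline": name, "succeeded": True})


def drain(bus, state):
    """Route each queued message to the role that would handle it."""
    while not bus.empty():
        msg = bus.get()
        if msg["type"] == "run_pipeline":
            # Dispatcher role: record that the job is now executing.
            state["jobs"][msg["pipeline"]] = "running"
            etl_service(msg, bus)
        elif msg["type"] == "log":
            # Logger role: persist the log line.
            state["logs"].append(msg["line"])
        elif msg["type"] == "job_finished":
            # Dispatcher role: record the eventual success or failure.
            outcome = "succeeded" if msg["succeeded"] else "failed"
            state["jobs"][msg["pipeline"]] = outcome
    return state


def simulate(schedules, now):
    """Run one timer tick and process all resulting messages."""
    bus = queue.Queue()                 # stand-in for RabbitMQ
    state = {"jobs": {}, "logs": []}    # stand-in for Postgres
    timer_tick(schedules, now, bus)
    return drain(bus, state)
```

For example, `simulate({"nightly_export": 5, "weekly_report": 50}, now=10)` runs only `nightly_export`, because `weekly_report` is not yet due. In the real system each role is a separate process, which is what keeps one ETL service's dependencies and failures from affecting the others.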
