udata analysis service

This service analyses files in the udata datalake to enrich their metadata, starting with CSV files. It uses csv-detective to detect the type and format of each CSV column by inspecting both the headers and the contents.
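To make the idea of checking both headers and contents concrete, here is a minimal standard-library sketch. It is not csv-detective's implementation or API: the `CONTENT_CHECKS` patterns, the `HEADER_HINTS` table, and the `detect_column_types` function are all hypothetical names chosen for illustration, and the real library covers far more formats.

```python
import csv
import io
import re

# Hypothetical content checks (assumption): csv-detective has its own, richer set.
CONTENT_CHECKS = {
    "int": re.compile(r"^-?\d+$"),
    "float": re.compile(r"^-?\d+[.,]\d+$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

# Hypothetical header hints (assumption): substring of the header -> suggested type.
HEADER_HINTS = {"date": "date", "year": "int"}


def detect_column_types(text, max_rows=500):
    """Guess a type per column from its header and from up to max_rows rows."""
    reader = csv.DictReader(io.StringIO(text))
    # Read at most max_rows data rows.
    rows = [row for _, row in zip(range(max_rows), reader)]
    result = {}
    for name in reader.fieldnames or []:
        # Ignore empty or missing cells when testing the contents.
        values = [row[name] for row in rows if row[name]]
        from_content = "string"
        for type_name, pattern in CONTENT_CHECKS.items():
            if values and all(pattern.match(value) for value in values):
                from_content = type_name
                break
        from_header = next(
            (t for hint, t in HEADER_HINTS.items() if hint in name.lower()),
            None,
        )
        result[name] = {"content": from_content, "header": from_header}
    return result


sample = "id,birth_date,name\n1,2020-01-31,Ada\n2,2021-12-01,Bob\n"
detected = detect_column_types(sample)
```

Keeping the header-based and content-based guesses separate lets a caller see when the two disagree, instead of silently preferring one of them.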

Installation

Install udata-analysis-service:

pip install udata-analysis-service

Rename .env.sample to .env and fill in the values for your environment:

REDIS_URL=redis://localhost:6381/0
REDIS_HOST=localhost
REDIS_PORT=6381
KAFKA_HOST=localhost
KAFKA_PORT=9092
KAFKA_API_VERSION=2.5.0
MINIO_URL=https://object.local.dev/
MINIO_USER=sample_user
MINIO_PWD=sample_pwd
ROWS_TO_ANALYSE_PER_FILE=500
CSV_DETECTIVE_REPORT_BUCKET=benchmark-de
CSV_DETECTIVE_REPORT_FOLDER=report
TABLESCHEMA_BUCKET=benchmark-de
TABLESCHEMA_FOLDER=schemas
UDATA_INSTANCE_NAME=udata
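As an illustration of how these variables could be consumed, the sketch below reads a few of them into a typed settings object. How the service itself loads its configuration is not described here, so the `Settings` class, the `load_settings` function, and the derived fields are assumptions made for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder (assumption, not the service's own class)."""
    redis_url: str
    kafka_bootstrap: str
    rows_to_analyse: int
    report_location: str


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Raises KeyError if a required variable is missing, so a misconfigured
    .env file fails early rather than at the first analysis.
    """
    env = os.environ if env is None else env
    return Settings(
        redis_url=env["REDIS_URL"],
        # Kafka clients usually expect "host:port".
        kafka_bootstrap="{}:{}".format(env["KAFKA_HOST"], env["KAFKA_PORT"]),
        # Environment values are strings; convert the row limit explicitly.
        rows_to_analyse=int(env["ROWS_TO_ANALYSE_PER_FILE"]),
        report_location="{}/{}".format(
            env["CSV_DETECTIVE_REPORT_BUCKET"], env["CSV_DETECTIVE_REPORT_FOLDER"]
        ),
    )


# Usage with the sample values from .env.sample.
sample_env = {
    "REDIS_URL": "redis://localhost:6381/0",
    "KAFKA_HOST": "localhost",
    "KAFKA_PORT": "9092",
    "ROWS_TO_ANALYSE_PER_FILE": "500",
    "CSV_DETECTIVE_REPORT_BUCKET": "benchmark-de",
    "CSV_DETECTIVE_REPORT_FOLDER": "report",
}
settings = load_settings(sample_env)
```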

Usage

Start the Kafka consumer:

udata-analysis-service consume

Start the Celery worker:

udata-analysis-service work

Logging & Debugging

The log level can be adjusted using the environment variable LOGLEVEL. For example, to set the log level to DEBUG when consuming Kafka messages, use LOGLEVEL="DEBUG" udata-analysis-service consume.
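For readers who want the same behaviour in their own scripts, here is a minimal sketch of mapping a LOGLEVEL environment variable onto Python's logging module. The `configure_logging` function and the INFO default are assumptions for illustration; the service's actual logging setup may differ.

```python
import logging
import os


def configure_logging(env=None):
    """Configure the root logger from LOGLEVEL and return the numeric level."""
    env = os.environ if env is None else env
    # Assumption: fall back to INFO when LOGLEVEL is not set.
    level_name = env.get("LOGLEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        raise ValueError("Unknown log level: {!r}".format(level_name))
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, force=True)
    return level


# Equivalent of running with LOGLEVEL="DEBUG" in the environment.
configured = configure_logging({"LOGLEVEL": "DEBUG"})
```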

Release history

0.0.1.dev53 (pre-release): Aug 26, 2022
0.0.1.dev38 (pre-release): Aug 1, 2022 (this version)
0.0.1.dev34 (pre-release): Jul 28, 2022
0.0.1.dev27 (pre-release): Jul 18, 2022
0.0.1.dev24 (pre-release): Jul 13, 2022
0.0.1.dev12 (pre-release): Jul 1, 2022
0.0.1.dev6 (pre-release): Jun 24, 2022
