udata-analysis-service

This service analyses files in the udata datalake to enrich their metadata, starting with CSV files. It uses csv-detective to detect the type and format of each CSV column by checking both the header and the contents.
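To make the idea of checking both headers and contents concrete, here is a toy sketch using only the Python standard library. It is not csv-detective's API or algorithm: the patterns, the header hints, and the function name `guess_column_types` are all hypothetical, and the `max_rows` default only mirrors the ROWS_TO_ANALYSE_PER_FILE value from the sample configuration.

```python
import csv
import io
import re
from itertools import islice

# Hypothetical content checks, for illustration only.
# csv-detective's real detectors are far richer than these patterns.
CONTENT_CHECKS = {
    "int": re.compile(r"^-?\d+$"),
    "float": re.compile(r"^-?\d+[.,]\d+$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

# Hypothetical header keywords that support a content-based guess.
HEADER_HINTS = {
    "int": ("count", "year"),
    "date": ("date",),
}


def guess_column_types(text, max_rows=500):
    """Guess a type per column from at most max_rows data rows."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(islice(reader, max_rows))
    result = {}
    for name in reader.fieldnames or []:
        # Ignore empty or missing cells when testing the patterns.
        cells = [row[name] for row in rows if row[name]]
        guess = "string"
        for label, pattern in CONTENT_CHECKS.items():
            if cells and all(pattern.match(cell) for cell in cells):
                guess = label
                break
        # The header only confirms the guess here; it never overrides it.
        hints = HEADER_HINTS.get(guess, ())
        header_agrees = any(hint in name.lower() for hint in hints)
        result[name] = {"type": guess, "header_agrees": header_agrees}
    return result


SAMPLE_CSV = "id,created_date,city\n1,2021-01-05,Paris\n2,2021-02-11,Lyon\n"
report = guess_column_types(SAMPLE_CSV)
```

In this sketch a column falls back to "string" as soon as one non-empty cell fails every pattern, which keeps the logic simple at the cost of being strict about dirty data.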

Installation

Install udata-analysis-service:

pip install udata-analysis-service

Rename .env.sample to .env and fill in the values for your environment:

REDIS_URL=redis://localhost:6381/0
REDIS_HOST=localhost
REDIS_PORT=6381
KAFKA_HOST=localhost
KAFKA_PORT=9092
KAFKA_API_VERSION=2.5.0
MINIO_URL=https://object.local.dev/
MINIO_USER=sample_user
MINIO_PWD=sample_pwd
ROWS_TO_ANALYSE_PER_FILE=500
CSV_DETECTIVE_REPORT_BUCKET=benchmark-de
CSV_DETECTIVE_REPORT_FOLDER=report
TABLESCHEMA_BUCKET=benchmark-de
TABLESCHEMA_FOLDER=schemas
UDATA_INSTANCE_NAME=udata
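Before starting the service, it can help to sanity-check the .env file. The following is a minimal sketch using only the standard library; it does not reflect how the service itself loads its configuration. The helper names `parse_env` and `check_env` are hypothetical, and treating every key from the sample above as required is an assumption.

```python
from urllib.parse import urlparse

# Keys taken from the sample above. Assumption: all of them are required.
EXPECTED_KEYS = [
    "REDIS_URL", "REDIS_HOST", "REDIS_PORT",
    "KAFKA_HOST", "KAFKA_PORT", "KAFKA_API_VERSION",
    "MINIO_URL", "MINIO_USER", "MINIO_PWD",
    "ROWS_TO_ANALYSE_PER_FILE",
    "CSV_DETECTIVE_REPORT_BUCKET", "CSV_DETECTIVE_REPORT_FOLDER",
    "TABLESCHEMA_BUCKET", "TABLESCHEMA_FOLDER",
    "UDATA_INSTANCE_NAME",
]


def parse_env(text):
    """Parse KEY=value lines, tolerating spaces around '='."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def check_env(values):
    """Return a list of problems; an empty list means no issue was found."""
    problems = [f"missing {key}" for key in EXPECTED_KEYS if not values.get(key)]
    url = urlparse(values.get("REDIS_URL", ""))
    # REDIS_URL repeats the host and port, so the three values should agree.
    if url.hostname != values.get("REDIS_HOST"):
        problems.append("REDIS_URL host differs from REDIS_HOST")
    if str(url.port) != values.get("REDIS_PORT"):
        problems.append("REDIS_URL port differs from REDIS_PORT")
    if not values.get("ROWS_TO_ANALYSE_PER_FILE", "").isdigit():
        problems.append("ROWS_TO_ANALYSE_PER_FILE is not a positive integer")
    return problems


SAMPLE_ENV = """
REDIS_URL = redis://localhost:6381/0
REDIS_HOST = localhost
REDIS_PORT = 6381
KAFKA_HOST = localhost
KAFKA_PORT = 9092
KAFKA_API_VERSION = 2.5.0
MINIO_URL = https://object.local.dev/
MINIO_USER = sample_user
MINIO_PWD = sample_pwd
ROWS_TO_ANALYSE_PER_FILE=500
CSV_DETECTIVE_REPORT_BUCKET = benchmark-de
CSV_DETECTIVE_REPORT_FOLDER = report
TABLESCHEMA_BUCKET = benchmark-de
TABLESCHEMA_FOLDER = schemas
UDATA_INSTANCE_NAME=udata
"""

sample_values = parse_env(SAMPLE_ENV)
sample_problems = check_env(sample_values)
```

To check a real file, pass its contents instead, for example `check_env(parse_env(open(".env").read()))`.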

Usage

Start the Kafka consumer:

udata-analysis-service consume

Start the Celery worker:

udata-analysis-service work
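Both processes need to run for the service to do its work. As a convenience during local development, they could be started together from Python. This is only a sketch: the helper names `start_all` and `stop_all` are hypothetical, it assumes the `udata-analysis-service` executable is on the PATH, and a process supervisor is likely a better fit for production.

```python
import subprocess

# The two subcommands documented above.
COMMANDS = {
    "consumer": ["udata-analysis-service", "consume"],
    "worker": ["udata-analysis-service", "work"],
}


def start_all(popen=subprocess.Popen):
    """Start each command as a child process, keyed by role."""
    return {role: popen(command) for role, command in COMMANDS.items()}


def stop_all(processes, timeout=10):
    """Ask every child to terminate, then wait for each one to exit."""
    for process in processes.values():
        process.terminate()
    for process in processes.values():
        process.wait(timeout=timeout)
```

A typical use is `processes = start_all()` at startup and `stop_all(processes)` in a `finally` block, so that both children are stopped when the parent script exits.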
