python tools to assist with standardized data ingestion workflows for the OS-Climate project

osc-ingest-tools

python tools to assist with standardized data ingestion workflows

Install from PyPI

pip install osc-ingest-tools

Examples

>>> from osc_ingest_trino import *

>>> import pandas as pd

>>> data = [['tom', 10], ['nick', 15], ['juli', 14]]

>>> df = pd.DataFrame(data, columns=['First Name', 'Age In Years']).convert_dtypes()

>>> df
  First Name  Age In Years
0        tom            10
1       nick            15
2       juli            14

>>> enforce_sql_column_names(df)
  first_name  age_in_years
0        tom            10
1       nick            15
2       juli            14

>>> enforce_sql_column_names(df, inplace=True)

>>> df
  first_name  age_in_years
0        tom            10
1       nick            15
2       juli            14
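The renaming can be approximated in plain Python. This is a minimal sketch, not the package's implementation: the helper name `to_sql_column_name` and its rules (collapse each run of non-alphanumeric characters into one underscore, trim underscores at the ends, lowercase) are assumptions that happen to reproduce the two names in this example.

```python
import re


def to_sql_column_name(name: str) -> str:
    """Hypothetical helper approximating SQL-friendly column names."""
    # Replace each run of characters outside [A-Za-z0-9] with a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", name)
    # Trim underscores left at either end, then lowercase the result.
    return cleaned.strip("_").lower()


print([to_sql_column_name(c) for c in ["First Name", "Age In Years"]])
# ['first_name', 'age_in_years']
```

With pandas, `df.rename(columns=to_sql_column_name)` returns a renamed copy and `df.rename(columns=to_sql_column_name, inplace=True)` modifies `df`, analogous to the two `enforce_sql_column_names` calls above.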

>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   first_name    3 non-null      string
 1   age_in_years  3 non-null      Int64 
dtypes: Int64(1), string(1)
memory usage: 179.0 bytes

>>> p = create_table_schema_pairs(df)

>>> print(p)
    first_name varchar,
    age_in_years bigint

>>> 
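`create_table_schema_pairs` returns indented `name type` lines joined by commas. The sketch below rebuilds that text from column names and pandas dtype strings, assuming the result is meant to fill the column list of a Trino `CREATE TABLE` statement (the module name `osc_ingest_trino` suggests Trino, but this README does not say how the pairs are used). `schema_pairs`, `DEFAULT_TYPES`, and the table name `demo.people` are hypothetical, and the defaults include only the mappings visible in this README.

```python
from typing import List, Tuple

# Hypothetical defaults: only the dtype mappings visible in this README.
DEFAULT_TYPES = {"string": "varchar", "Int64": "bigint", "int64": "bigint"}


def schema_pairs(columns: List[Tuple[str, str]]) -> str:
    """Render (column name, pandas dtype string) pairs as SQL column definitions."""
    lines = []
    for name, dtype in columns:
        if dtype not in DEFAULT_TYPES:
            # This sketch refuses to guess a type for unmapped dtypes such as "object".
            raise ValueError(f"no SQL type mapped for pandas dtype {dtype!r}")
        lines.append(f"    {name} {DEFAULT_TYPES[dtype]}")
    # Four-space indent plus ",\n" separators matches the printed pairs above.
    return ",\n".join(lines)


pairs = schema_pairs([("first_name", "string"), ("age_in_years", "Int64")])
# Hypothetical table name; the pairs slot into the column list of CREATE TABLE.
print(f"CREATE TABLE demo.people (\n{pairs}\n)")
```

For a DataFrame, the input list could be built as `[(c, str(t)) for c, t in df.dtypes.items()]`, since `str()` of the nullable dtypes yields `"string"` and `"Int64"`.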

Adding custom type mappings to create_table_schema_pairs

>>> df = pd.DataFrame(data, columns=['First Name', 'Age In Years'])

>>> enforce_sql_column_names(df, inplace=True)

>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   first_name    3 non-null      object
 1   age_in_years  3 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes

>>> p = create_table_schema_pairs(df, typemap={'object':'varchar'})

>>> print(p)
    first_name varchar,
    age_in_years bigint

>>>
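This example skips `convert_dtypes()`, so `first_name` keeps NumPy's `object` dtype (see the `df.info` output) and the call passes `typemap={'object':'varchar'}` for it. The sketch below shows one way such an override could be resolved, assuming user entries take precedence over built-in defaults; `resolve_sql_type` and `DEFAULT_TYPES` are hypothetical names, and the defaults list only the mappings visible in this README.

```python
from typing import Dict, Optional

# Hypothetical built-in defaults (only mappings shown in this README).
DEFAULT_TYPES = {"string": "varchar", "Int64": "bigint", "int64": "bigint"}


def resolve_sql_type(dtype: str, typemap: Optional[Dict[str, str]] = None) -> str:
    """Look up the SQL type for a pandas dtype string, letting typemap override defaults."""
    # In a dict merge later keys win, so user-supplied entries take precedence.
    merged = {**DEFAULT_TYPES, **(typemap or {})}
    try:
        return merged[dtype]
    except KeyError:
        raise ValueError(f"no SQL type mapped for pandas dtype {dtype!r}") from None


print(resolve_sql_type("object", typemap={"object": "varchar"}))  # varchar
print(resolve_sql_type("int64"))                                  # bigint
print(resolve_sql_type("int64", typemap={"int64": "integer"}))    # integer
```

Merging into a new dict leaves `DEFAULT_TYPES` untouched, so one call's `typemap` cannot leak into the next.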

build and upload a new release

  • update all occurrences of __version__
  • python3 setup.py clean
  • python3 setup.py sdist
  • twine check dist/*
  • twine upload dist/*
  • push latest to repo
  • create new release on GitHub
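The first step is easy to get partly wrong when the version string appears in several files. A minimal consistency check, assuming every occurrence is written as `__version__ = "x.y.z"` (single or double quotes), could run before building the sdist; the function name `single_version` is hypothetical.

```python
import re
from typing import Dict, List

# Matches assignments such as __version__ = "0.2.0" or __version__='0.2.0'.
VERSION_RE = re.compile(r"""__version__\s*=\s*['"]([^'"]+)['"]""")


def single_version(sources: Dict[str, str]) -> str:
    """Return the one __version__ found across the given file texts, or raise."""
    found: Dict[str, List[str]] = {}
    for name, text in sources.items():
        for match in VERSION_RE.finditer(text):
            # Record which files declare each distinct version string.
            found.setdefault(match.group(1), []).append(name)
    if len(found) != 1:
        raise ValueError(f"expected exactly one version, found {found}")
    return next(iter(found))
```

Against the real files, something like `single_version({p: Path(p).read_text() for p in paths})` (with `from pathlib import Path`, and `paths` listing wherever the version is declared) fails loudly if any occurrence was missed.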

python packaging resources

Download files

Source Distribution

osc-ingest-tools-0.2.0.tar.gz (5.6 kB)

File details

Details for the file osc-ingest-tools-0.2.0.tar.gz.

File metadata

  • Download URL: osc-ingest-tools-0.2.0.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for osc-ingest-tools-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       d35073a77596c8640878444f6fb6e9a67eb7e10dad356f45f994457f7fc8a8aa
MD5          8083ca49d874afc82645ce7165e1f773
BLAKE2b-256  ea6d96eaa45809b8d4fd459d1a269c7ed35704e34f422e3939fdac2373e65fb0
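A manually downloaded copy of the sdist can be checked against the published SHA256 digest before installing it. This is a minimal standard-library sketch; the function names are illustrative, not part of osc-ingest-tools.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for osc-ingest-tools-0.2.0.tar.gz.
EXPECTED = "d35073a77596c8640878444f6fb6e9a67eb7e10dad356f45f994457f7fc8a8aa"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED) -> bool:
    """Compare a file's SHA256 hex digest with an expected value, ignoring case."""
    return sha256_of(path) == expected.strip().lower()
```

pip can enforce the same check itself: a requirements line such as `osc-ingest-tools==0.2.0 --hash=sha256:d35073a77596c8640878444f6fb6e9a67eb7e10dad356f45f994457f7fc8a8aa` puts pip into hash-checking mode, and it refuses any download whose digest differs.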
