

cgp-dss-data-loader

Simple data loader for CGP HCA Data Store

Common Setup

  1. (optional) We recommend using a Python 3 virtual environment.

  2. Run:

    pip3 install cgp-dss-data-loader
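The following minimal Python sketch can confirm both setup steps. It is not part of the package, and the helper names in_virtualenv and installed_version are hypothetical. It checks whether the interpreter is running inside a virtual environment created with the standard venv module, and which version of cgp-dss-data-loader, if any, pip installed. It uses only the standard library (importlib.metadata requires Python 3.8+).

```python
import sys
from importlib import metadata
from typing import Optional


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created environment.

    In such an environment sys.prefix points at the environment directory,
    while sys.base_prefix still points at the base interpreter.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("cgp-dss-data-loader version:", installed_version("cgp-dss-data-loader"))
```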

Setup for Development

  1. Clone the repo:

    git clone https://github.com/databiosphere/cgp-dss-data-loader.git

  2. Go to the root directory of the cloned project:

    cd cgp-dss-data-loader

  3. Make sure you are on the develop branch:

    git checkout develop

  4. Run (ideally in a new virtual environment):

    make develop
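Before running make develop, a quick pre-flight check can confirm the branch and project-root steps. This is a minimal Python sketch with hypothetical helper names (current_branch, ready_for_make_develop). It assumes git is on the PATH and that make develop is driven by a Makefile at the project root, which the make commands imply but which is an assumption here.

```python
import subprocess
from pathlib import Path
from typing import Optional


def current_branch(repo_dir: Path) -> Optional[str]:
    """Return the checked-out branch name in repo_dir, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, repo_dir does not exist, or it is not a git work tree.
        return None
    return result.stdout.strip()


def ready_for_make_develop(repo_dir: Path) -> bool:
    """Hypothetical check: on the develop branch and a Makefile is present."""
    return current_branch(repo_dir) == "develop" and (repo_dir / "Makefile").is_file()


if __name__ == "__main__":
    print("ready for make develop:", ready_for_make_develop(Path.cwd()))
```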

Running Tests

Run:

make test

Getting Data from Gen3 and Loading It

  1. The first step is to extract the Gen3 data you want using the sheepdog exporter. The TOPMed public data extracted with sheepdog is available on the release page under Assets. Assuming you use this data, you will now have a file called topmed-public.json.

  2. Make sure the virtual environment you created in the setup instructions is activated.

  3. Now we need to transform the data. We can transform it either to the outdated Gen3 format or to the new standard format.

    • For the standard format, follow the instructions at newt-transformer.

    • For the outdated Gen3 format, run the following from the root of the project:

      python transformer/gen3_transformer.py /path/to/topmed-public.json --output-json transformed-topmed-public.json
      
  4. Now that we have the transformed output, we can load it with the loader.

    If you used the standard transformer, run:

    python scripts/cgp_data_loader.py --no-dry-run --dss-endpoint MY_DSS_ENDPOINT --staging-bucket NAME_OF_MY_S3_BUCKET standard --json-input-file transformed-topmed-public.json
    

    Otherwise, for the outdated Gen3 format, run:

    python scripts/cgp_data_loader.py --no-dry-run --dss-endpoint MY_DSS_ENDPOINT --staging-bucket NAME_OF_MY_S3_BUCKET gen3 --json-input-file transformed-topmed-public.json
    
  5. You did it!
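The steps above can be scripted. The following is a minimal Python sketch, not part of cgp-dss-data-loader: the helper names summarize_export, gen3_transform_command and loader_command are hypothetical, and the sketch reuses only the script paths and flags shown in the commands above. It assumes it runs from the project root inside the activated virtual environment, and it prints the commands rather than running them.

```python
import json
import shlex
import sys
from pathlib import Path
from typing import List


def summarize_export(path: Path) -> dict:
    """Report only the top-level shape of a sheepdog JSON export.

    The schema of topmed-public.json is not described in this README, so this
    hypothetical helper counts top-level entries instead of assuming field names.
    """
    with path.open() as f:
        data = json.load(f)
    if isinstance(data, dict):
        return {"type": "object", "keys": len(data)}
    if isinstance(data, list):
        return {"type": "array", "items": len(data)}
    return {"type": type(data).__name__}


def gen3_transform_command(input_json: Path, output_json: Path) -> List[str]:
    """Argument list for the outdated gen3 transformer, run from the project root."""
    return [
        sys.executable, "transformer/gen3_transformer.py",
        str(input_json), "--output-json", str(output_json),
    ]


def loader_command(fmt: str, dss_endpoint: str, staging_bucket: str,
                   json_input: Path) -> List[str]:
    """Argument list for scripts/cgp_data_loader.py; fmt is 'standard' or 'gen3'."""
    if fmt not in ("standard", "gen3"):
        raise ValueError(f"unknown input format: {fmt!r}")
    return [
        sys.executable, "scripts/cgp_data_loader.py",
        "--no-dry-run",
        "--dss-endpoint", dss_endpoint,
        "--staging-bucket", staging_bucket,
        fmt,  # format subcommand, placed after the options as in the README
        "--json-input-file", str(json_input),
    ]


if __name__ == "__main__":
    # Print (not run) the commands for the outdated gen3 path.
    # MY_DSS_ENDPOINT and NAME_OF_MY_S3_BUCKET are placeholders to replace.
    transformed = Path("transformed-topmed-public.json")
    print(shlex.join(gen3_transform_command(Path("/path/to/topmed-public.json"), transformed)))
    print(shlex.join(loader_command("gen3", "MY_DSS_ENDPOINT", "NAME_OF_MY_S3_BUCKET", transformed)))
```

To execute a command instead of printing it, pass the list to subprocess.run(cmd, check=True) from the project root. Using sys.executable rather than a bare python keeps every step on the interpreter of the active virtual environment.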



Download Files


Source distribution: cgp-dss-data-loader-0.0.1.tar.gz (14.3 kB)


Hashes for cgp-dss-data-loader-0.0.1.tar.gz:

  Algorithm     Hash digest
  SHA256        1f3bd3fa8fb338f13b384640267f5d2d94d4ad848606d92b5de62aa4ff39110f
  MD5           ce0666f4b4d8c79448dc291cfd829f71
  BLAKE2b-256   34aa4851d0f5d35216912ceb418a8bb704382d4cc8da3b0bc207f0318ebce9fc
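To check a downloaded archive against the published SHA256 digest for cgp-dss-data-loader-0.0.1.tar.gz, a minimal Python sketch using the standard hashlib module might look like this. The helper names file_sha256 and matches_published_digest are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for cgp-dss-data-loader-0.0.1.tar.gz.
EXPECTED_SHA256 = "1f3bd3fa8fb338f13b384640267f5d2d94d4ad848606d92b5de62aa4ff39110f"


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 equals the expected hex digest."""
    return file_sha256(path) == expected.lower()
```

For example, matches_published_digest(Path("cgp-dss-data-loader-0.0.1.tar.gz")) should return True for an intact download. pip can also enforce this kind of check itself in its hash-checking mode (--require-hashes).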

