
# accession
Python module and command line tool to submit genomics pipeline analysis output files and metadata to the ENCODE Portal

Table of Contents
=================

* [Installation](#installation)
* [Setting environmental variables](#setting-environmental-variables)
* [Usage](#usage)
* [Arguments](#arguments)

# Installation
Install the module with pip:

```bash
$ pip install accession
```

# Setting environmental variables
You will need ENCODE DCC credentials from the ENCODE Portal. Set them in your shell like so:

```bash
$ export DCC_API_KEY=XXXXXXXX
$ export DCC_SECRET_KEY=yyyyyyyyyyy
```

You will also need [Google Application Credentials](https://cloud.google.com/video-intelligence/docs/common/auth#set_up_a_service_account) in your environment. Obtain and set your service account credentials:

```bash
$ export GOOGLE_APPLICATION_CREDENTIALS=<path_to_service_account_file>
```
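A quick way to confirm all three variables are set before running the tool. This is a generic sketch, not part of `accession` itself:

```python
import os

# Variables the tool expects; GOOGLE_APPLICATION_CREDENTIALS must
# point at a readable service account JSON file.
REQUIRED_VARS = [
    "DCC_API_KEY",
    "DCC_SECRET_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
]

def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling `missing_credentials()` with no arguments checks the real environment; passing a dict makes the check testable.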

# Usage

```bash
$ accession --accession-metadata metadata.json \
    --accession-steps steps.json \
    --server dev \
    --lab /labs/encode-processing-pipeline/ \
    --award U41HG007000
```

# Arguments
### Metadata JSON
This file is an output of a pipeline analysis run. [The example file](https://github.com/ENCODE-DCC/accession/blob/master/tests/data/ENCSR609OHJ_metadata_2reps.json) contains metadata on all of the tasks and the files they produced.
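For orientation, a Cromwell-style metadata JSON maps task names to a list of calls, each carrying its inputs and outputs. A minimal sketch of collecting output file paths per task; the field names (`calls`, `outputs`) are assumed from the Cromwell metadata format, and the real file is far larger:

```python
import json

# A toy Cromwell-style metadata fragment; real pipeline metadata
# contains many tasks, shards, and runtime fields.
metadata = json.loads("""
{
  "calls": {
    "atac.filter": [
      {"outputs": {"nodup_bam": "gs://bucket/rep1.nodup.bam"}}
    ]
  }
}
""")

def outputs_by_task(metadata):
    """Map each task name to the filekey -> path dicts of its calls."""
    return {
        task: [call.get("outputs", {}) for call in calls]
        for task, calls in metadata.get("calls", {}).items()
    }
```

The accession steps file (below) selects which of these filekeys actually get submitted.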
### Accession Steps
The accessioning steps [configuration file](https://github.com/ENCODE-DCC/accession/blob/master/tests/data/atac_input.json) specifies the task and file names in the output metadata JSON. The accessioning code will selectively submit the specified file keys to the ENCODE DCC Portal. A single step is configured in the following way:

```json
{
    "dcc_step_version": "/analysis-step-versions/kundaje-lab-atac-seq-trim-align-filter-step-v-1-0/",
    "dcc_step_run": "atac-seq-trim-align-filter-step-run-v1",
    "wdl_task_name": "filter",
    "wdl_files": [
        {
            "filekey": "nodup_bam",
            "output_type": "alignments",
            "file_format": "bam",
            "quality_metrics": ["cross_correlation", "samtools_flagstat"],
            "derived_from_files": [
                {
                    "derived_from_task": "trim_adapter",
                    "derived_from_filekey": "fastqs",
                    "derived_from_inputs": "true"
                }
            ]
        }
    ]
}
```

`dcc_step_version` and `dcc_step_run` must exist on the portal.

`wdl_task_name` is the name of the task that has the files to be accessioned.

`wdl_files` specifies the set of files to be accessioned.

`filekey` is the variable that stores the file path in the metadata file.

`output_type`, `file_format`, and `file_format_type` are the ENCODE-specific metadata fields required by the Portal.

`quality_metrics` is a list of methods that will be called in the code to attach quality metrics to the file.

`possible_duplicate` indicates that the file may have content identical to another file's, as in the case of optimal IDR peaks and conservative IDR peaks. If the possible-duplicate flag is set, the file is not accessioned.

`derived_from_files` specifies the list of files from which the current file derives. These files must have been accessioned before the current file can be submitted.

`derived_from_inputs` indicates that the parent files were not produced during task execution; raw fastqs and genome references are examples of such files.

`derived_from_output_type` is required when the parent file has a possible duplicate.
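The field descriptions above can be checked mechanically before submission. A minimal validation sketch, with required keys assumed from the descriptions above; this is not the tool's own validation logic:

```python
# Keys every wdl_files entry needs, per the field descriptions.
REQUIRED_FILE_KEYS = {"filekey", "output_type", "file_format"}

def validate_step(step):
    """Return a list of problems found in a single accession step dict."""
    problems = []
    for key in ("dcc_step_version", "dcc_step_run", "wdl_task_name"):
        if key not in step:
            problems.append(f"missing {key}")
    for entry in step.get("wdl_files", []):
        missing = REQUIRED_FILE_KEYS - entry.keys()
        if missing:
            problems.append(
                f"{entry.get('filekey', '?')}: missing {sorted(missing)}"
            )
        for parent in entry.get("derived_from_files", []):
            if "derived_from_task" not in parent or "derived_from_filekey" not in parent:
                problems.append("incomplete derived_from_files entry")
    return problems
```

Running this against the example step above should return an empty list.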

### Server
`prod` and `dev` indicate the server to which files are accessioned. `dev` points to test.encodedcc.org. The server parameter can also be passed explicitly as test.encodedcc.org or encodeproject.org.
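The alias resolution described above can be sketched as follows. The `dev` hostname comes from the text; mapping `prod` to encodeproject.org is an assumption inferred from the explicit-hostname option, and the tool's internal logic may differ:

```python
# Known aliases; an explicit hostname is passed through unchanged.
SERVER_ALIASES = {
    "dev": "test.encodedcc.org",
    "prod": "encodeproject.org",  # assumed pairing, not stated verbatim
}

def resolve_server(server):
    """Resolve a --server argument to a portal hostname."""
    return SERVER_ALIASES.get(server, server)
```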

### Lab and Award
These are unique identifiers that are expected to be already present on the ENCODE Portal.



