Command Line Interface to upload data to the European Nucleotide Archive
ENA upload tool
About
The program submits experimental data and respective metadata to the European Nucleotide Archive (ENA). The metadata should be provided in separate tables corresponding to the following ENA objects:
- STUDY
- SAMPLE
- EXPERIMENT
- RUN
The program can perform the following actions:
- add: add an object to the archive
- modify: modify an object in the archive
- cancel: cancel a private object and its dependent objects (under development)
- release: release a private object immediately to the public (under development)
After a successful submission, the tool generates new tsv tables with the ENA accession numbers filled in, along with a submission receipt.
Tool dependencies
- Python 3.5+ with the following packages:
- Genshi
- lxml
- pandas
- requests
Installation
pip install ena-upload-cli
Usage
Minimal: ena-upload-cli --action {add,modify,cancel,release} --center CENTER_NAME --secret SECRET
All supported arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--action {add,modify,cancel,release}
add: add an object to the archive
modify: modify an object in the archive
cancel: cancel a private object and its dependent objects
release: release a private object immediately to public
--study STUDY table of STUDY object
--sample SAMPLE table of SAMPLE object
--experiment EXPERIMENT
table of EXPERIMENT object
--run RUN table of RUN object
--data [FILE [FILE ...]]
data for submission
--center CENTER_NAME specific to your Webin account
--tool TOOL_NAME Specify the name of the tool this submission is done with. Default: ena-upload-cli
--tool_version TOOL_VERSION
Specify the version of the tool this submission is done with. Default: current version of tool
--secret SECRET .secret file containing the password of your Webin account
-d, --dev Flag to use the dev/sandbox endpoint of ENA.
--vir Flag to use the viral sample template.
Mandatory arguments: --action, --center and --secret.
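As an illustration of how the arguments above fit together, here is a minimal Python sketch that assembles a command line from the documented flags. The helper `build_command` and its parameters are hypothetical and not part of the tool. The executable name is taken from the usage line above and may differ in your installation, so it is left configurable.

```python
import shlex


def build_command(action, center, secret, tables=None, data=None,
                  dev=False, vir=False, executable="ena-upload-cli"):
    """Assemble an argument list from the flags documented above (hypothetical helper)."""
    allowed = {"add", "modify", "cancel", "release"}
    if action not in allowed:
        raise ValueError(f"action must be one of {sorted(allowed)}")
    # The three mandatory arguments always come first
    cmd = [executable, "--action", action, "--center", center, "--secret", secret]
    # tables maps an ENA object name (study, sample, experiment, run) to a tsv path
    for name in ("study", "sample", "experiment", "run"):
        if tables and name in tables:
            cmd += [f"--{name}", tables[name]]
    if data:
        cmd += ["--data", *data]
    if dev:
        cmd.append("--dev")
    if vir:
        cmd.append("--vir")
    return cmd


cmd = build_command(
    "add", "your_center_name", ".secret.yml",
    tables={"study": "example_tables/ENA_template_studies.tsv"},
    dev=True,
)
# Print a shell-safe version of the command for inspection
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once the tool is installed; keeping `dev=True` while experimenting sends the submission to the sandbox.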
ENA Webin
A Webin account can be created here if you don't have one already. The --webin_id parameter takes the full username, which looks like Webin-XXXXX. Visit Webin online to check on your submissions, or dev Webin to check on test submissions.
The .secret.yml file
To avoid exposing your credentials through the terminal history, it is recommended to use a .secret.yml file containing your password and username keywords. An example is given in the root of this directory.
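A minimal sketch of creating such a file from Python follows. The key names `username` and `password` are an assumption based on the description above; check the example file shipped with the project for the exact layout. The helper `write_secret` is hypothetical.

```python
import json
import os
import tempfile


def write_secret(path, username, password):
    """Write a two-key YAML file readable only by the current user (hypothetical helper)."""
    # json.dumps yields double-quoted strings, which are also valid YAML scalars,
    # so passwords containing ':' or '#' do not break the file
    content = f"username: {json.dumps(username)}\npassword: {json.dumps(password)}\n"
    # Request owner-only permissions at creation time (effective on POSIX systems)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # Also tighten permissions if the file already existed
    os.chmod(path, 0o600)


# Demonstration in a temporary directory with placeholder credentials
with tempfile.TemporaryDirectory() as tmp:
    secret_path = os.path.join(tmp, ".secret.yml")
    write_secret(secret_path, "Webin-XXXXX", "your-password")
    with open(secret_path) as handle:
        print(handle.read())
```

Keeping the file out of version control (for example through an ignore rule) serves the same goal as keeping the password out of the terminal history.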
Dev instance
By default, the submission is sent to the following ENA URL: https://www.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA
Use the --dev flag to do a test submission against the sandbox dev instance of ENA: https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA. A TEST submission will be discarded within 24 hours.
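The choice between the two endpoints can be pictured as a simple switch. The sketch below only mirrors the two URLs quoted above; the function name `submission_url` is hypothetical and this is not necessarily how the tool implements the flag internally.

```python
# The two endpoints quoted in this section
PROD_URL = "https://www.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA"
DEV_URL = "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA"


def submission_url(dev=False):
    """Return the sandbox endpoint when dev is set, else the production one."""
    return DEV_URL if dev else PROD_URL


print(submission_url(dev=True))
```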
Supported columns for viral sample submissions
Viral samples are validated by ENA using the ENA virus pathogen checklist. The columns supported in the sample tsv table used by this tool are:
Column name | ENA field name | Field format | Cardinality |
---|---|---|---|
alias | alias | free text | mandatory |
status | | auto_filled | |
accession | accession | auto_filled | |
title | TITLE | free text | mandatory |
scientific_name | SCIENTIFIC_NAME | free text | mandatory |
taxon_id | TAXON_ID | auto_filled | |
sample_description | DESCRIPTION | free text | mandatory |
submission_date | | auto_filled | |
geographic_location | geographic location (country and/or sea) | text choice | mandatory |
host_common_name | host common name | free text | mandatory |
host_subject_id | host subject id | free text | mandatory |
host_health_state | host health state | text choice | mandatory |
host_sex | host sex | text choice | mandatory |
host_scientific_name | host scientific name | free text | mandatory |
collector_name | collector name | free text | mandatory |
collecting_institution | collecting institution | free text | mandatory |
isolate | isolate | free text | mandatory |
collection_date | collection date | restricted text | recommended |
geographic_location_latitude | geographic location (latitude) | restricted text | recommended |
geographic_location_longitude | geographic location (longitude) | restricted text | recommended |
geographic_location_region | geographic location (region and locality) | free text | recommended |
sample_capture_status | sample capture status | text choice | recommended |
host_disease_outcome | host disease outcome | text choice | recommended |
host_age | host age | restricted text | recommended |
virus_identifier | virus identifier | free text | recommended |
receipt_date | receipt date | restricted text | recommended |
definition_for_seropositive_sample | definition for seropositive sample | free text | recommended |
serotype | serotype (required for a seropositive sample) | free text | recommended |
host_habitat | host habitat | text choice | recommended |
isolation_source_host_associated | isolation source host-associated | free text | recommended |
host_behaviour | host behaviour | text choice | recommended |
isolation_source_non_host_associated | isolation source non-host-associated | free text | recommended |
subject_exposure | subject exposure | free text | optional |
subject_exposure_duration | subject exposure duration | free text | optional |
type_exposure | type exposure | free text | optional |
personal_protective_equipment | personal protective equipment | free text | optional |
hospitalisation | hospitalisation | text choice | optional |
illness_duration | illness duration | free text | optional |
illness_symptoms | illness symptoms | free text | optional |
sample_storage_conditions | sample storage conditions | free text | optional |
strain | strain | free text | optional |
host_description | host description | free text | optional |
gravidity | gravidity | free text | optional |
Please consult the ENA virus pathogen checklist on the ENA website to see which values are allowed in the restricted text and text choice fields.
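Before submitting, it can help to check locally that the mandatory columns listed in the table above are present and filled in. The following is a minimal sketch using only the standard library; the function `check_sample_table` is hypothetical, and it assumes the tsv file has a single header row holding the column names. It does not validate restricted text or text choice values, which only the ENA checklist defines.

```python
import csv
import io

# Columns marked "mandatory" in the table above
MANDATORY = [
    "alias", "title", "scientific_name", "sample_description",
    "geographic_location", "host_common_name", "host_subject_id",
    "host_health_state", "host_sex", "host_scientific_name",
    "collector_name", "collecting_institution", "isolate",
]


def check_sample_table(handle):
    """Return a list of problems found in a tab-separated viral sample table."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in MANDATORY if c not in header]
    # Line numbers assume the header is line 1 and there are no blank lines
    for line_no, row in enumerate(reader, start=2):
        for column in MANDATORY:
            if column in header and not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
    return problems


# A deliberately incomplete table to show the kind of report produced
sample = io.StringIO("alias\ttitle\nsample_1\t\n")
for problem in check_sample_table(sample):
    print(problem)
```

For a file on disk, pass an open handle instead, for example `open(path, newline="")`.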
The data files
Supported data
- Read data
- Genome Assembly
- Transcriptome Assembly
- Template Sequence
- Other Analyses
Most files uploaded to the ENA FTP server need to be compressed.
More information on how ENA wants to receive the files can be found here.
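As a sketch of the compression step, the snippet below gzips a file with the Python standard library. The helper `gzip_file` is hypothetical, and gzip is only one possible format; which formats ENA accepts for each data type is defined by ENA, not by this example.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_file(source, keep_original=True):
    """Compress source to source + '.gz' and return the new path (hypothetical helper)."""
    source = Path(source)
    target = source.with_name(source.name + ".gz")
    # Stream the content so large read files are not loaded into memory at once
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if not keep_original:
        source.unlink()
    return target


# Demonstration on a tiny placeholder FASTQ file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    fastq = Path(tmp) / "reads.fastq"
    fastq.write_text("@read1\nACGT\n+\nIIII\n")
    print(gzip_file(fastq).name)
```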
Tool overview
inputs:
- metadata tables
  - examples in example_tables
  - define the action for each row in the status column, e.g. add, modify, cancel, release
  - to perform a bulk submission of all objects, the alias ids in the different ENA objects must be associated with each other; the alias ids in the experiment object link all objects together
- experimental data
  - examples in example_data
outputs:
- in the same directory as the inputs
  - metadata tables with updated info in status and other relevant columns, e.g.:
    - updated status: added, modified, canceled, released
    - accession ids
    - submission date
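To illustrate how the output tables might be consumed afterwards, here is a minimal sketch that maps each alias to its updated status and accession. The column names `alias`, `status` and `accession` are taken from the sample table described earlier; that every output table carries the same three columns is an assumption. The function `read_accessions` is hypothetical, and the accession value in the demonstration is a placeholder, not a real ENA accession.

```python
import csv
import io


def read_accessions(handle):
    """Map each alias to its (status, accession) pair from a tab-separated output table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["alias"]: (row.get("status", ""), row.get("accession", ""))
        for row in reader
    }


# Placeholder content standing in for a generated output table
output_table = io.StringIO(
    "alias\tstatus\taccession\n"
    "sample_1\tadded\tPLACEHOLDER_ACCESSION\n"
)
for alias, (status, accession) in read_accessions(output_table).items():
    print(alias, status, accession)
```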
Test the tool
test command: add metadata and sequence data
ena_upload --action add --center 'your_center_name' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --secret .secret.yml
test command: modify metadata
ena_upload --action modify --center 'your_center_name' --study example_tables/ENA_template_studies-2020-05-01T1421.tsv --dev --secret .secret.yml
test command for viral data
ena_upload --action add --center 'your_center_name' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples_vir.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --vir --secret .secret.yml