Biohackathon sequence uploader

# Sequence uploader

This repository provides a sequence uploader for the COVID-19 Virtual Biohackathon’s Public Sequence Resource project. There are two versions: one that runs on the command line and another that acts as a web interface. You can use either to upload the genomes of SARS-CoV-2 samples, making them publicly and freely available to other researchers.

![alt text](./image/website.png "Website")

To get started, first [install the uploader](#installation), then use the `bh20-seq-uploader` command to [upload your data](#usage).

# Installation

There are several ways to install the uploader. The most portable is with a [virtualenv](#installation-with-virtualenv).

## Installation with virtualenv

  1. Prepare your system. You need to make sure you have Python, and the ability to install modules such as pycurl and pyopenssl. On Ubuntu 18.04, you can run:

```sh
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
```

  1. Create and enter your virtualenv. Go to some memorable directory and make and enter a virtualenv:

```sh
virtualenv --python python3 venv
. venv/bin/activate
```

Note that you will need to repeat the `. venv/bin/activate` step from this directory to enter your virtualenv whenever you want to use the installed tool.

  1. Install the tool. Once in your virtualenv, install this project:

```sh
pip3 install git+https://github.com/arvados/bh20-seq-resource.git@master
```

  1. Test the tool. Try running:

```sh
bh20-seq-uploader --help
```

It should print some instructions about how to use the uploader.

Make sure you are in your virtualenv whenever you run the tool! If you ever can’t run the tool, and your prompt doesn’t say `(venv)`, try going to the directory where you put the virtualenv and running `. venv/bin/activate`. Activation only applies to the current terminal window; you will need to run it again if you open a new terminal.
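
If you are unsure whether the virtualenv is active, a small check along the following lines may help. This is a minimal sketch, assuming that activation exported the `VIRTUAL_ENV` variable (as virtualenv's `activate` script does); the function name `in_venv` is made up for illustration.

```sh
# Hypothetical helper: succeeds if a virtualenv appears to be active.
# Assumes activation exported VIRTUAL_ENV, as virtualenv's activate script does.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "virtualenv active: $VIRTUAL_ENV"
else
    echo "no virtualenv active; run: . venv/bin/activate"
fi
```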

## Installation with pip3 --user

If you don’t want to have to enter a virtualenv every time you use the uploader, you can use the `--user` feature of `pip3` to install the tool for your user.

  1. Prepare your system. Just as for the virtualenv method, you need to install some dependencies. On Ubuntu 18.04, you can run:

```sh
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
```

  1. Install the tool. You can run:

```sh
pip3 install --user git+https://github.com/arvados/bh20-seq-resource.git@master
```

  1. Make sure the tool is on your `PATH`. The `pip3` command will install the uploader in `.local/bin` inside your home directory. Your shell may not know to look for commands there by default. To fix this for the terminal you currently have open, run:

```sh
export PATH=$PATH:$HOME/.local/bin
```

To make this change permanent, assuming your shell is Bash, run:

```sh
echo 'export PATH=$PATH:$HOME/.local/bin' >>~/.bashrc
```

  1. Test the tool. Try running:

```sh
bh20-seq-uploader --help
```

It should print some instructions about how to use the uploader.
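
Before editing `PATH` as described in the steps above, it can be useful to check whether `.local/bin` is already listed, so the same entry is not appended twice. The following is a minimal sketch; `on_path` is a hypothetical helper name, not part of the uploader.

```sh
# Hypothetical helper: succeeds if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
on_path() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if on_path "$HOME/.local/bin"; then
    echo "$HOME/.local/bin is already on PATH"
else
    echo "$HOME/.local/bin is missing from PATH"
fi
```

Wrapping the list in colons means an entry only matches as a whole, so `/a/b` is not mistaken for the separate entries `/a` and `/b`.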

## Installation from Source for Development

If you plan to contribute to the project, you may want to install an editable copy from source. With this method, changes to the source code are automatically reflected in the installed copy of the tool.

  1. Prepare your system. On Ubuntu 18.04, you can run:

```sh
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
```

  1. Clone and enter the repository. You can run:

```sh
git clone https://github.com/arvados/bh20-seq-resource.git
cd bh20-seq-resource
```

  1. Create and enter a virtualenv. Go to some memorable directory and make and enter a virtualenv:

```sh
virtualenv --python python3 venv
. venv/bin/activate
```

Note that you will need to repeat the `. venv/bin/activate` step from this directory to enter your virtualenv whenever you want to use the installed tool.

  1. Install the checked-out repository in editable mode. Once in your virtualenv, install with this special pip command:

```sh
pip3 install -e .
```

  1. Test the tool. Try running:

```sh
bh20-seq-uploader --help
```

It should print some instructions about how to use the uploader.

## Installation with GNU Guix

To run or develop the uploader with GNU Guix, see [INSTALL.md](./doc/INSTALL.md).

# Usage

Run the uploader with a FASTA or FASTQ file and accompanying metadata file in JSON or YAML:

```sh
bh20-seq-uploader example/sequence.fasta example/metadata.yaml
```
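
Before uploading, you may want a quick local check that the sequence file looks like FASTA or FASTQ. The sketch below relies only on the fact that FASTA records start with `>` and FASTQ records start with `@`. The helper name `seq_format` is hypothetical, and the uploader may perform its own validation, which this does not replace.

```sh
# Hypothetical helper: guess the sequence format from the first line.
# FASTA records start with '>', FASTQ records start with '@'.
seq_format() {
    _first=""
    # read fails on an empty file; ignore that and fall through to "unknown"
    IFS= read -r _first < "$1" || true
    case "$_first" in
        ">"*) echo fasta ;;
        "@"*) echo fastq ;;
        *) echo unknown ;;
    esac
}
```

For example, `[ "$(seq_format example/sequence.fasta)" != unknown ]` could guard the upload command so that it only runs when the check passes.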

## Workflow for Generating a Pangenome

All these uploaded sequences are being fed into a workflow to generate a [pangenome](https://academic.oup.com/bib/article/19/1/118/2566735) for the virus. You can replicate this workflow yourself.

For example, download your SARS-CoV-2 sequences from GenBank into `seqs.fa`, and then run the following series of commands:

```sh
minimap2 -cx asm20 -X seqs.fa seqs.fa >seqs.paf
seqwish -s seqs.fa -p seqs.paf -g seqs.gfa
odgi build -g seqs.gfa -s -o seqs.odgi
odgi viz -i seqs.odgi -o seqs.png -x 4000 -y 500 -R -P 5
```
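
The commands above depend on three external tools: `minimap2`, `seqwish`, and `odgi`. A minimal sketch for checking that they are installed before starting is shown here; `missing_tools` is a hypothetical helper name, and how you install any missing tool is up to you.

```sh
# Hypothetical helper: print the name of each listed command that is
# not found on PATH; prints nothing when all of them are available.
missing_tools() {
    for _tool in "$@"; do
        command -v "$_tool" >/dev/null 2>&1 || echo "$_tool"
    done
}

missing=$(missing_tools minimap2 seqwish odgi)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing
else
    echo "all pangenome tools found"
fi
```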

This pipeline has been converted into the Common Workflow Language (CWL), and the sources can be found [here](https://github.com/hpobio-lab/viral-analysis/tree/master/cwl/pangenome-generate).

For more information on building pangenome models, [see this wiki page](https://github.com/virtual-biohackathons/covid-19-bh20/wiki/Pangenome#pangenome-model-from-available-genomes).

# Web Interface

This project comes with a simple web server that lets you use the sequence uploader from a browser. It will work as long as you install the package with the `web` extra.

To run it locally:

```
virtualenv --python python3 venv
. venv/bin/activate
pip install -e ".[web]"
env FLASK_APP=bh20simplewebuploader/main.py flask run
```

Then visit [http://127.0.0.1:5000/](http://127.0.0.1:5000/).

## Production

For production deployment, you can use [gunicorn](https://flask.palletsprojects.com/en/1.1.x/deploying/wsgi-standalone/#gunicorn):

```
pip3 install gunicorn
gunicorn bh20simplewebuploader.main:app
```

This runs on [http://127.0.0.1:8000/](http://127.0.0.1:8000/) by default, but can be adjusted with various [gunicorn options](http://docs.gunicorn.org/en/latest/run.html#commonly-used-arguments).
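
As an illustration of adjusting those options, the sketch below wraps the gunicorn command in a small function that sets the listening address with `--bind` and the number of worker processes with `--workers`. The function name `serve_uploader`, the default address, and the default worker count are assumptions chosen for the example, not project recommendations.

```sh
# Sketch: serve the uploader on a chosen address with several workers.
# $1 is the bind address (default 127.0.0.1:8000), $2 the worker count
# (default 2). Both defaults are illustrative assumptions.
serve_uploader() {
    gunicorn --bind "${1:-127.0.0.1:8000}" --workers "${2:-2}" \
        bh20simplewebuploader.main:app
}
```

Calling `serve_uploader 0.0.0.0:8080 4` would then listen on port 8080 on all interfaces with four workers.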
