Python package to quickly download genomes from the UCSC.

Python package to quickly download and work with genomes from the UCSC.

How do I install this package?

As usual, just install it using pip:

pip install ucsc_genomes_downloader

Tests Coverage

Since different coverage tools sometimes report slightly different results, here are three of them:

Coveralls Coverage, SonarCloud Coverage, Code Climate Coverage

Usage examples

Simply instantiate a new genome

from ucsc_genomes_downloader import Genome
hg19 = Genome("hg19")

Downloading selected chromosomes

from ucsc_genomes_downloader import Genome
hg19 = Genome("hg19", chromosomes=["chr1", "chr2"])

Getting gaps regions

The method returns a DataFrame in bed-like format that contains the regions where only n or N nucleotides are present.

# Returns the gaps (regions made up of Ns) for all chromosomes
all_gaps = hg19.gaps()
# Returns the gaps for chromosome chrM only
chrM_gaps = hg19.gaps(chromosomes=["chrM"])
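
To make the idea of a gap region concrete, here is a minimal sketch of how runs of n or N could be located in a plain sequence string. This is not the package's implementation: the function name `find_gaps`, the tuple output and the 0-based, half-open coordinates are assumptions made for illustration only.

```python
import re


def find_gaps(sequence, chrom="chrExample"):
    """Return (chrom, start, end) tuples for every run of n/N.

    Hypothetical helper, not part of ucsc_genomes_downloader.
    Coordinates are 0-based and half-open, as in the BED convention.
    """
    return [
        (chrom, match.start(), match.end())
        for match in re.finditer(r"[nN]+", sequence)
    ]


# A toy sequence with three runs of unknown nucleotides.
toy_sequence = "NNNNacgtACGTnnACGTN"
toy_gaps = find_gaps(toy_sequence, chrom="chrToy")
```

On the toy sequence this yields one tuple per run of Ns, which is the same kind of information the bed-like DataFrame carries in its rows.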

Getting filled regions

The method returns a DataFrame in bed-like format that contains the regions where no unknown nucleotide is present; it is the complement of the gaps method.

# Returns the filled regions for all chromosomes
all_filled = hg19.filled()
# Returns the filled regions for chromosome chrM only
chrM_filled = hg19.filled(chromosomes=["chrM"])
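
The relationship between filled regions and gaps can be sketched as an interval complement. The snippet below is only an illustration under stated assumptions: `complement_regions` is a hypothetical helper, the gaps are given as sorted, non-overlapping (start, end) pairs in 0-based half-open coordinates, and the chromosome length is known.

```python
def complement_regions(gaps, length):
    """Return the intervals of [0, length) not covered by any gap.

    Hypothetical helper, not part of ucsc_genomes_downloader.
    `gaps` must be sorted, non-overlapping (start, end) pairs.
    """
    filled = []
    cursor = 0  # first position not yet assigned to a gap or filled region
    for start, end in gaps:
        if start > cursor:
            filled.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        filled.append((cursor, length))
    return filled


# Gaps at both ends and in the middle of a 19-base toy chromosome.
toy_filled = complement_regions([(0, 4), (12, 14), (18, 19)], length=19)
```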

Removing genome’s cache

hg19.delete()

Utilities

Retrieving a list of the available genomes

You can get a complete list of the genomes available from the UCSC website with the following method:

from ucsc_genomes_downloader.utils import get_available_genomes
all_genomes = get_available_genomes()

Tasselizing bed files

Create a tessellation (a tiling into windows of a given size) of a given bed-like pandas DataFrame.

The available alignments are left, right, or center.

from ucsc_genomes_downloader.utils import tasselize_bed
import pandas as pd

my_bed = pd.read_csv("path/to/my/file.bed", sep="\t")
tasselized = tasselize_bed(
    my_bed,
    window_size=200,
    alignment="left"
)
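
As a rough picture of what tiling with an alignment can mean, the sketch below splits a single region into fixed-size windows using plain Python. The semantics are an assumption, not taken from the package: `tile_region` is a hypothetical helper, leftover bases that do not fill a whole window are dropped, and the alignment decides on which side they are dropped.

```python
def tile_region(start, end, window_size, alignment="left"):
    """Split [start, end) into consecutive windows of `window_size` bases.

    Hypothetical helper illustrating one possible reading of alignment:
    - "left":   windows start at `start`, leftover bases dropped on the right
    - "right":  windows finish at `end`, leftover bases dropped on the left
    - "center": leftover bases split between both sides
    """
    count = (end - start) // window_size
    leftover = (end - start) - count * window_size
    if alignment == "left":
        offset = 0
    elif alignment == "right":
        offset = leftover
    elif alignment == "center":
        offset = leftover // 2
    else:
        raise ValueError(f"Unknown alignment: {alignment!r}")
    first = start + offset
    return [
        (first + i * window_size, first + (i + 1) * window_size)
        for i in range(count)
    ]


# A 500-base region tiled into 200-base windows leaves 100 bases unused.
left_tiles = tile_region(0, 500, 200, alignment="left")
right_tiles = tile_region(0, 500, 200, alignment="right")
center_tiles = tile_region(0, 500, 200, alignment="center")
```

Check the behaviour of `tasselize_bed` on a small DataFrame before relying on this interpretation.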

Expanding bed file regions

Expand the regions of a given DataFrame in bed-like format to a given window size, using the selected alignment.

The available alignments are left, right, or center.

from ucsc_genomes_downloader.utils import expand_bed_regions
import pandas as pd

my_bed = pd.read_csv("path/to/my/file.bed", sep="\t")
expanded = expand_bed_regions(
    my_bed,
    window_size=1000,
    alignment="left"
)
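
One way to think about expanding with an alignment is to resize each region to the window size while keeping one anchor point fixed. The sketch below illustrates that reading in plain Python. It is an assumption, not the package's documented behaviour: `expand_region` is a hypothetical helper, and no clipping to chromosome boundaries is performed.

```python
def expand_region(start, end, window_size, alignment="left"):
    """Resize [start, end) to exactly `window_size` bases.

    Hypothetical helper illustrating one possible reading of alignment:
    - "left":   keep `start` fixed and move `end`
    - "right":  keep `end` fixed and move `start`
    - "center": keep the midpoint fixed
    Coordinates are not clipped, so results may fall outside the chromosome.
    """
    if alignment == "left":
        return start, start + window_size
    if alignment == "right":
        return end - window_size, end
    if alignment == "center":
        midpoint = (start + end) // 2
        new_start = midpoint - window_size // 2
        return new_start, new_start + window_size
    raise ValueError(f"Unknown alignment: {alignment!r}")


# A 200-base region expanded to 1000 bases under each alignment.
left_expanded = expand_region(5000, 5200, 1000, alignment="left")
right_expanded = expand_region(5000, 5200, 1000, alignment="right")
center_expanded = expand_region(5000, 5200, 1000, alignment="center")
```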
