Python package to quickly download genomes from the UCSC.
Python package to quickly download and work with genomes from the UCSC.
How do I install this package?
As usual, just install it using pip:
pip install ucsc_genomes_downloader
Tests Coverage
Since different coverage tools sometimes report slightly different results, here are three of them:
Usage examples
Simply instantiate a new genome
from ucsc_genomes_downloader import Genome
hg19 = Genome("hg19")
Downloading selected chromosomes
from ucsc_genomes_downloader import Genome
hg19 = Genome("hg19", chromosomes=["chr1", "chr2"])
Getting gap regions
The gaps method returns a DataFrame in BED-like format containing the regions where only n or N nucleotides are present.
all_gaps = hg19.gaps() # Returns gaps (regions made only of Ns) for all chromosomes
# Returns gaps for chromosome chrM
chrM_gaps = hg19.gaps(chromosomes=["chrM"])
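To make the notion of a gap concrete, here is a minimal standard-library sketch that finds runs of n or N in a toy sequence and reports them as 0-based, half-open intervals, as in the BED format. The helper name find_gaps and the chromosome name chrToy are hypothetical; this illustrates the idea only and is not the package's implementation.

```python
import re


def find_gaps(chrom, sequence):
    """Return (chrom, start, end) tuples for every run of n/N in sequence.

    Coordinates are 0-based and half-open, as in the BED format.
    Hypothetical helper for illustration, not part of ucsc_genomes_downloader.
    """
    return [
        (chrom, match.start(), match.end())
        for match in re.finditer(r"[nN]+", sequence)
    ]


# Toy sequence: three unknown bases, four known, two unknown, three known, one unknown.
toy_gaps = find_gaps("chrToy", "NNNACGTnnACGN")
print(toy_gaps)  # [('chrToy', 0, 3), ('chrToy', 7, 9), ('chrToy', 12, 13)]
```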
Getting filled regions
The filled method returns a DataFrame in BED-like format containing the regions where no unknown nucleotide is present, that is, the complement of the regions returned by the gaps method.
all_filled = hg19.filled() # Returns filled regions for all chromosomes
# Returns filled regions for chromosome chrM
chrM_filled = hg19.filled(chromosomes=["chrM"])
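The complement relationship can be sketched with plain intervals: given the sorted gap intervals of a sequence and its total length, the filled regions are whatever lies between them. The helper name complement_intervals and the toy coordinates are hypothetical; this is a sketch of the concept, not the package's code.

```python
def complement_intervals(gaps, length):
    """Return the half-open (start, end) intervals of [0, length) not covered by gaps.

    gaps must be sorted, non-overlapping (start, end) pairs.
    Hypothetical helper for illustration, not part of ucsc_genomes_downloader.
    """
    filled = []
    cursor = 0  # first position not yet assigned to a gap or a filled region
    for start, end in gaps:
        if start > cursor:
            filled.append((cursor, start))
        cursor = end
    if cursor < length:
        filled.append((cursor, length))
    return filled


# A 13-base toy sequence with gaps at [0, 3), [7, 9) and [12, 13).
toy_filled = complement_intervals([(0, 3), (7, 9), (12, 13)], 13)
print(toy_filled)  # [(3, 7), (9, 12)]
```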
Removing genome’s cache
hg19.delete()
Utilities
Retrieving a list of the available genomes
You can get a complete list of the genomes available from the UCSC website with the following method:
from ucsc_genomes_downloader import get_available_genomes
all_genomes = get_available_genomes()
Tessellating bed files
Create a tessellation of a given BED-like pandas DataFrame, that is, split its regions into windows of a given size.
The available alignments are left, right, and center.
from ucsc_genomes_downloader.utils import tasselize_bed
import pandas as pd
my_bed = pd.read_csv("path/to/my/file.bed", sep="\t")
tasselized = tasselize_bed(
my_bed,
window_size=200,
alignment="left"
)
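The role of the alignment option can be illustrated on a single region. The sketch below assumes that only whole windows are kept and that the alignment decides where the leftover bases end up; how tasselize_bed actually treats the remainder is not stated here, so this is one plausible interpretation. The helper name tile_region is hypothetical.

```python
def tile_region(start, end, window_size, alignment="left"):
    """Split [start, end) into whole windows of window_size bases.

    Assumption: leftover bases are dropped on the right ("left"),
    on the left ("right"), or split across both sides ("center").
    Hypothetical helper, not the package's tasselize_bed.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    count = (end - start) // window_size
    leftover = (end - start) - count * window_size
    if alignment == "left":
        offset = 0
    elif alignment == "right":
        offset = leftover
    elif alignment == "center":
        offset = leftover // 2
    else:
        raise ValueError("alignment must be 'left', 'right' or 'center'")
    first = start + offset
    return [
        (first + i * window_size, first + (i + 1) * window_size)
        for i in range(count)
    ]


print(tile_region(0, 10, 4, "left"))    # [(0, 4), (4, 8)]
print(tile_region(0, 10, 4, "right"))   # [(2, 6), (6, 10)]
print(tile_region(0, 10, 4, "center"))  # [(1, 5), (5, 9)]
```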