Python package to quickly download and work with genomes from the UCSC.
How do I install this package?
As usual, just install it using pip:
pip install ucsc_genomes_downloader
Usage examples
Simply instantiate a new genome
from ucsc_genomes_downloader import Genome
hg19 = Genome("hg19")
Downloading selected chromosomes
from ucsc_genomes_downloader import Genome
hg19 = Genome("hg19", chromosomes=["chr1", "chr2"])
Getting gaps regions
The method returns a DataFrame in bed-like format that contains the regions where only n or N nucleotides are present.
all_gaps = hg19.gaps() # Returns gaps (regions formed of Ns) for all chromosomes
# Returns gaps for chromosome chrM
chrM_gaps = hg19.gaps(chromosomes=["chrM"])
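To make the notion of a gap concrete, here is a minimal sketch of how runs of n or N could be located in a plain sequence string. It is not the package's implementation: the function name `find_gaps` is hypothetical, and the half-open (start, end) convention is an assumption borrowed from the BED format.

```python
import re


def find_gaps(sequence):
    """Return half-open (start, end) intervals covering each run of n/N.

    Hypothetical helper for illustration only; not part of
    ucsc_genomes_downloader.
    """
    return [(match.start(), match.end())
            for match in re.finditer(r"[nN]+", sequence)]


# Toy sequence with three unknown stretches.
print(find_gaps("NNNacgtNNacgtn"))  # [(0, 3), (7, 9), (13, 14)]
```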
Getting filled regions
The method returns a DataFrame in bed-like format that contains the regions where no unknown nucleotide is present; it is essentially the complement of the gaps method.
all_filled = hg19.filled() # Returns filled for all chromosomes
# Returns filled for chromosome chrM
chrM_filled = hg19.filled(chromosomes=["chrM"])
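The relation between filled and gap regions can be illustrated with a small, self-contained sketch. It assumes half-open (start, end) intervals that are sorted and non-overlapping, and a known chromosome length; the function name `complement_intervals` is hypothetical and this is not how the package itself computes the result.

```python
def complement_intervals(gaps, length):
    """Return the half-open intervals of [0, length) not covered by gaps.

    Assumes gaps is a sorted list of non-overlapping (start, end) pairs.
    Hypothetical helper for illustration only.
    """
    filled = []
    cursor = 0  # First position not yet assigned to a gap or filled region.
    for start, end in gaps:
        if start > cursor:
            filled.append((cursor, start))
        cursor = end
    if cursor < length:
        filled.append((cursor, length))
    return filled


# Gaps of a toy 14-base sequence and the regions left between them.
print(complement_intervals([(0, 3), (7, 9), (13, 14)], 14))  # [(3, 7), (9, 13)]
```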
Removing genome’s cache
hg19.delete()
Utilities
Retrieving a list of the available genomes
You can get a complete list of the genomes available from the UCSC website with the following method:
from ucsc_genomes_downloader.utils import get_available_genomes
all_genomes = get_available_genomes()
Tasselizing bed files
Create a tessellation (tiling) with windows of a given size from a given bed-like pandas DataFrame.
Available alignments are left, right and center.
from ucsc_genomes_downloader.utils import tasselize_bed
import pandas as pd
my_bed = pd.read_csv("path/to/my/file.bed", sep="\t")
tasselized = tasselize_bed(
    my_bed,
    window_size=200,
    alignment="left"
)
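The exact semantics of `tasselize_bed` are not spelled out here, so the following is only one plausible reading of what tiling a single region with an alignment could mean: the region is cut into as many full windows as fit, and the alignment decides where the leftover bases are left out. The function name `tile_region` and this treatment of the leftover are assumptions, not the package's documented behaviour.

```python
def tile_region(start, end, window_size, alignment="left"):
    """Split [start, end) into full windows of window_size bases.

    Hypothetical sketch: leftover bases that do not fill a window are
    dropped on the right ("left"), on the left ("right"), or split
    between both sides ("center").
    """
    n_windows = (end - start) // window_size
    leftover = (end - start) - n_windows * window_size
    if alignment == "left":
        offset = 0
    elif alignment == "right":
        offset = leftover
    elif alignment == "center":
        offset = leftover // 2
    else:
        raise ValueError(f"Unknown alignment: {alignment!r}")
    first = start + offset
    return [(first + i * window_size, first + (i + 1) * window_size)
            for i in range(n_windows)]


# A 650-base region holds three 200-base windows with 50 bases left over.
print(tile_region(100, 750, 200, "left"))    # [(100, 300), (300, 500), (500, 700)]
print(tile_region(100, 750, 200, "right"))   # [(150, 350), (350, 550), (550, 750)]
print(tile_region(100, 750, 200, "center"))  # [(125, 325), (325, 525), (525, 725)]
```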
Expanding bed file regions
Expand the regions of a given dataframe in bed-like format using the selected alignment.
Available alignments are left, right and center.
from ucsc_genomes_downloader.utils import expand_bed_regions
import pandas as pd
my_bed = pd.read_csv("path/to/my/file.bed", sep="\t")
expanded = expand_bed_regions(
    my_bed,
    window_size=1000,
    alignment="left"
)
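As a rough illustration of how an alignment could steer the expansion of a single region, the sketch below grows a region to exactly `window_size` bases while keeping its start, its end, or its midpoint fixed. This interpretation and the function name `expand_region` are assumptions; the behaviour of `expand_bed_regions` may differ.

```python
def expand_region(start, end, window_size, alignment="left"):
    """Return a window of window_size bases anchored on [start, end).

    Hypothetical sketch: "left" keeps the start fixed, "right" keeps the
    end fixed and "center" keeps the midpoint fixed. The result is not
    clamped, so the new start can become negative near a chromosome start.
    """
    if alignment == "left":
        return start, start + window_size
    if alignment == "right":
        return end - window_size, end
    if alignment == "center":
        midpoint = (start + end) // 2
        new_start = midpoint - window_size // 2
        return new_start, new_start + window_size
    raise ValueError(f"Unknown alignment: {alignment!r}")


# Expanding a 200-base region to 1000 bases under each alignment.
print(expand_region(1000, 1200, 1000, "left"))    # (1000, 2000)
print(expand_region(1000, 1200, 1000, "right"))   # (200, 1200)
print(expand_region(1000, 1200, 1000, "center"))  # (600, 1600)
```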
Source Distribution
File details
Details for the file ucsc_genomes_downloader-1.1.5.tar.gz.
File metadata
- Download URL: ucsc_genomes_downloader-1.1.5.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | a4fc198fa5fe8067238c75c441db348b25ef60079c276b91fe25fe6dfb6bdbdb
MD5 | fed3e02c04e54142b2f03b158cb42e28
BLAKE2b-256 | a9b32b88a5bdc11ec820e7e42e02b3401ba74c10bc6b8f33433480e9d73bc9fc