Pandas utilities for tab-delimited and other genomic files

Bioframe: Operations on Genomic Interval Dataframes

Bioframe is a library for flexible and scalable operations on genomic interval dataframes in Python. Building bioframe directly on top of pandas gives immediate access to a rich set of dataframe operations. Working in Python enables rapid visualization (e.g., with matplotlib or seaborn) and iteration of genomic analyses.

The philosophy underlying bioframe is to enable flexible operations: rather than providing a function for every possible use case, we encourage users to compose functions to achieve their goals.

Bioframe implements a variety of genomic interval operations directly on dataframes. Bioframe also includes functions for loading diverse genomic data formats, and performing operations on special classes of genomic intervals, including chromosome arms and fixed size bins.

Read the docs, including the guide.

If you use bioframe in your work, please cite it via its Zenodo DOI: 10.5281/zenodo.5703622

Installation

The following are required before installing bioframe:

  • Python 3.7+
  • numpy
  • pandas>=1.3

Install bioframe from PyPI with pip:

pip install bioframe

Interval operations

Key genomic interval operations in bioframe include:

  • closest: For every interval in a dataframe, find the closest intervals in a second dataframe.
  • cluster: Group overlapping intervals in a dataframe into clusters.
  • complement: Find genomic intervals that are not covered by any interval from a dataframe.
  • overlap: Find pairs of overlapping genomic intervals between two dataframes.

Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including: coverage, expand, merge, select, and subtract.
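As a rough illustration of this kind of composition, the sketch below computes a coverage-style column (how many bases of each interval in one dataframe are covered by intervals in another) from an overlap step followed by ordinary pandas operations. It uses plain pandas only and is not bioframe's implementation: the helper name `naive_coverage`, the example intervals, the half-open `[start, end)` convention, and the assumption that the second dataframe contains no self-overlapping intervals are all choices made for this sketch.

```python
import pandas as pd

# Hypothetical example data; df2 is assumed to have no self-overlapping intervals.
df1 = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [0, 10], "end": [8, 20]})
df2 = pd.DataFrame(
    {"chrom": ["chr1", "chr1", "chr1"], "start": [2, 6, 30], "end": [4, 12, 40]}
)


def naive_coverage(a, b):
    """Count bases of each interval in `a` covered by intervals in `b`.

    Illustration only: a quadratic cross-join per chromosome, assuming
    half-open [start, end) intervals and no self-overlaps within `b`.
    """
    a = a.reset_index(drop=True)
    # Step 1 (overlap-like): pair every interval in `a` with every interval
    # in `b` on the same chromosome, remembering the row of `a` in "index".
    pairs = a.reset_index().merge(b, on="chrom", suffixes=("", "_"))
    # Step 2 (dataframe operation): length of the shared segment per pair,
    # clipped to zero for pairs that do not overlap.
    shared = (
        pairs[["end", "end_"]].min(axis=1) - pairs[["start", "start_"]].max(axis=1)
    ).clip(lower=0)
    # Step 3 (dataframe operation): sum shared lengths per interval of `a`.
    covered = shared.groupby(pairs["index"]).sum()
    return a.assign(coverage=covered.reindex(a.index, fill_value=0).to_numpy())


result = naive_coverage(df1, df2)
print(result)
```

If the second dataframe may contain overlapping intervals, merging them first avoids counting the same bases twice.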

To overlap two dataframes, call:

import bioframe as bf

bf.overlap(df1, df2)

For these two input dataframes, with intervals all on the same chromosome:

overlap will return the following interval pairs as overlaps:
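To make the semantics concrete without relying on the figures, here is a minimal plain-pandas sketch of what an overlap query computes. The intervals are made up for illustration, the helper `naive_overlap` is hypothetical, and the quadratic cross-join is not how bioframe implements overlap. The sketch assumes half-open `[start, end)` intervals, so intervals that merely touch are not reported as overlapping.

```python
import pandas as pd

# Hypothetical example intervals, all on the same chromosome.
df1 = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [1, 8], "end": [5, 10]})
df2 = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [4, 12], "end": [8, 14]})


def naive_overlap(a, b):
    """Return every pair of intervals from `a` and `b` sharing at least one base.

    Illustration only: pairs all intervals on the same chromosome, then keeps
    the pairs whose half-open [start, end) ranges intersect.
    """
    pairs = a.merge(b, on="chrom", suffixes=("", "_"))
    hits = (pairs["start"] < pairs["end_"]) & (pairs["start_"] < pairs["end"])
    return pairs[hits].reset_index(drop=True)


overlaps = naive_overlap(df1, df2)
print(overlaps)
```

Columns from the second dataframe carry a trailing underscore here so that both intervals of a pair sit side by side in one row.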

To merge all overlapping intervals in a dataframe, call:

import bioframe as bf

bf.merge(df1)

For this input dataframe, with intervals all on the same chromosome:

merge will return a new dataframe with these merged intervals:
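The same idea can be sketched in plain pandas. The example below uses made-up intervals and a hypothetical helper, `naive_merge`, that merges only strictly overlapping intervals under a half-open `[start, end)` convention. It is an illustration of the operation, not bioframe's implementation, and it does not model how bioframe treats adjacent intervals. The `n_intervals` column name is likewise a choice made for this sketch.

```python
import pandas as pd

# Hypothetical example intervals, all on the same chromosome.
df = pd.DataFrame(
    {"chrom": ["chr1"] * 4, "start": [1, 3, 8, 12], "end": [5, 6, 10, 14]}
)


def naive_merge(intervals):
    """Merge strictly overlapping intervals on each chromosome.

    Illustration only: a sort followed by a running-maximum sweep.
    """
    s = intervals.sort_values(["chrom", "start", "end"]).reset_index(drop=True)
    # Furthest end seen among earlier intervals on the same chromosome
    # (NaN for the first interval of each chromosome).
    prev_max_end = s.groupby("chrom")["end"].transform(
        lambda ends: ends.cummax().shift()
    )
    # An interval opens a new cluster unless it starts before that end.
    # Comparisons against NaN are False, so each chromosome starts a cluster.
    starts_new = ~(s["start"] < prev_max_end)
    cluster_id = starts_new.cumsum()
    merged = s.groupby(cluster_id).agg(
        chrom=("chrom", "first"),
        start=("start", "min"),
        end=("end", "max"),
        n_intervals=("start", "size"),
    )
    return merged.reset_index(drop=True)


merged = naive_merge(df)
print(merged)
```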

See the guide for visualizations of other interval operations in bioframe.

File I/O

Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is read_table, which mirrors pandas’s read_csv/read_table but provides a schema argument to populate column names for common tabular file formats.

import bioframe

jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/hg38/tsv/MA0139.1.tsv.gz'
ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)
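The idea behind a schema argument can be sketched with pandas alone: a schema name is looked up to get column names, which are then passed to `pd.read_csv` for a headerless tab-delimited file. Everything named here (`SCHEMAS`, `read_table_with_schema`, the two BED-like column lists, and the inline example text) is hypothetical and stands in for bioframe's own schema definitions, which may differ.

```python
import io

import pandas as pd

# Hypothetical schema registry for this sketch; bioframe ships its own.
SCHEMAS = {
    "bed3": ["chrom", "start", "end"],
    "bed4": ["chrom", "start", "end", "name"],
}


def read_table_with_schema(filepath_or_buffer, schema, **kwargs):
    """Read a headerless tab-delimited file, naming columns from a schema.

    Extra keyword arguments are forwarded to pandas.read_csv.
    """
    kwargs.setdefault("sep", "\t")
    kwargs.setdefault("header", None)
    return pd.read_csv(filepath_or_buffer, names=SCHEMAS[schema], **kwargs)


# Made-up BED-like content, read from memory instead of a file.
bed_text = "chr1\t0\t100\tpeak_a\nchr1\t150\t300\tpeak_b\n"
peaks = read_table_with_schema(io.StringIO(bed_text), schema="bed4")
print(peaks)
```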

Tutorials

See this Jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

Projects currently using bioframe:
