Pandas utilities for tab-delimited and other genomic files

Bioframe: Operations on Genomic Interval Dataframes

Bioframe is a library to enable flexible and scalable operations on genomic interval dataframes in Python. Because bioframe is built directly on top of pandas, it gives immediate access to a rich set of dataframe operations. Working in Python also enables rapid visualization (e.g. with matplotlib or seaborn) and iteration of genomic analyses.

The philosophy underlying bioframe is to enable flexible operations: instead of creating a function for every possible use case, we encourage users to compose functions to achieve their goals. As a rough rule of thumb, if a function requires three steps and is crucial for genomic interval arithmetic, we have included it; conversely, if it can be performed in a single line by composing two of the core functions, we have not.

Core functions

  • closest: For every interval in a dataframe, find the closest intervals in a second dataframe.
  • cluster: Group overlapping intervals in a dataframe into clusters.
  • complement: Find genomic intervals that are not covered by any interval from a dataframe.
  • overlap: Find pairs of overlapping genomic intervals between two dataframes.

Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including: coverage, expand, merge, select, and subtract.

Bioframe also has functions for loading diverse genomic data formats and for performing operations on special classes of genomic intervals, including chromosome arms and fixed-size bins.
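To make "fixed-size bins" concrete, here is a minimal sketch in plain pandas and NumPy. It is not bioframe's own implementation: the helper name `fixed_size_bins`, the `chrom`/`start`/`end` column names, and the toy chromosome sizes are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def fixed_size_bins(chromsizes, binsize):
    """Tile each chromosome with half-open bins [start, end) of width `binsize`.

    Hypothetical helper for illustration only; the last bin on each
    chromosome is truncated at the chromosome end.
    """
    frames = []
    for chrom, size in chromsizes.items():
        starts = np.arange(0, size, binsize)
        ends = np.minimum(starts + binsize, size)
        frames.append(pd.DataFrame({"chrom": chrom, "start": starts, "end": ends}))
    return pd.concat(frames, ignore_index=True)


# Toy chromosome sizes, invented for this example.
bins = fixed_size_bins({"chrA": 25, "chrB": 10}, binsize=10)
print(bins)
```

The result is an ordinary dataframe, so it can be passed to any interval operation that accepts one.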

Read the docs and explore the Jupyter notebooks.

Genomic interval operations

To overlap two dataframes, call:

import bioframe as bf

bf.overlap(df1, df2)

For these two input dataframes, with intervals all on the same chromosome:

overlap will return the following interval pairs as overlaps:
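As a rough illustration of what an overlap query computes, the sketch below finds overlapping pairs using only pandas. It is not how bioframe implements `overlap`, and it does not reproduce bioframe's output format. The `chrom`/`start`/`end` column names, the half-open interval convention, the helper name `overlap_pairs`, and the toy intervals are assumptions.

```python
import pandas as pd


def overlap_pairs(df1, df2):
    """Return every pair of rows, one from each frame, whose intervals overlap.

    Hypothetical helper for illustration; intervals are treated as
    half-open [start, end).
    """
    # Pair every interval in df1 with every interval in df2 on the same chromosome.
    pairs = df1.merge(df2, on="chrom", suffixes=("_1", "_2"))
    # Two half-open intervals overlap when each starts before the other ends.
    hit = (pairs["start_1"] < pairs["end_2"]) & (pairs["start_2"] < pairs["end_1"])
    return pairs[hit].reset_index(drop=True)


# Toy intervals, all on the same chromosome.
df1 = pd.DataFrame({"chrom": ["chr1"] * 3, "start": [1, 4, 8], "end": [5, 8, 10]})
df2 = pd.DataFrame({"chrom": ["chr1"] * 2, "start": [4, 10], "end": [7, 12]})

result = overlap_pairs(df1, df2)
print(result)
```

This brute-force join compares every pair of intervals per chromosome, so it is only suitable for small inputs; it is meant to show the overlap condition, not to be efficient.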

To merge all overlapping intervals in a dataframe, call:

import bioframe as bf

bf.merge(df1)

For this input dataframe, with intervals all on the same chromosome:

merge will return a new dataframe with these merged intervals:
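The merging logic can be sketched in plain pandas as follows. This is an illustrative stand-in, not bioframe's implementation of `merge`. The `chrom`/`start`/`end` column names, the helper name `merge_intervals`, and the choice to also merge intervals that merely touch are assumptions of this sketch.

```python
import pandas as pd


def merge_intervals(df):
    """Collapse overlapping (or touching) intervals into single intervals.

    Hypothetical helper for illustration only.
    """
    df = df.sort_values(["chrom", "start", "end"]).reset_index(drop=True)
    # Furthest end among all earlier intervals on the same chromosome
    # (NaN for the first interval of each chromosome).
    prev_max_end = df.groupby("chrom")["end"].transform(lambda s: s.cummax().shift())
    # A new group starts at the first interval of a chromosome, or when an
    # interval begins after everything before it has ended.
    new_group = prev_max_end.isna() | (df["start"] > prev_max_end)
    group_id = new_group.cumsum()
    return (
        df.groupby(group_id)
        .agg(chrom=("chrom", "first"), start=("start", "min"), end=("end", "max"))
        .reset_index(drop=True)
    )


# Toy intervals, all on the same chromosome.
df1 = pd.DataFrame(
    {"chrom": ["chr1"] * 4, "start": [1, 3, 10, 11], "end": [5, 8, 12, 15]}
)

merged = merge_intervals(df1)
print(merged)
```

Sorting first means each interval only has to be compared against the running maximum end, which avoids comparing all pairs.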

See this Jupyter notebook for visualizations of other core bioframe functions.

See this Jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

Requirements

The following are required before installing bioframe:

  • Python 3.6+
  • numpy
  • pandas>=1.0.3

Installation

pip install bioframe

Projects currently using bioframe:

Download files

Source Distribution

bioframe-0.1.0.tar.gz (41.1 kB)

Uploaded Source

Built Distribution

bioframe-0.1.0-py2.py3-none-any.whl (41.9 kB)

Uploaded Python 2 Python 3
