Sparse binary format for Hi-C genomic contact heatmaps

# Cooler

[![Build Status](https://travis-ci.org/mirnylab/cooler.svg?branch=master)](https://travis-ci.org/mirnylab/cooler)
[![Documentation Status](https://readthedocs.org/projects/cooler/badge/?version=latest)](http://cooler.readthedocs.org/en/latest/)

## A cool place to store your Hi-C

Cooler is a **sparse, compressed, binary** persistent storage format for Hi-C contact maps based on [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format).

- Documentation is available [here](http://cooler.readthedocs.org/en/latest/).
- See example [Jupyter notebook](https://github.com/mirnylab/cooler-binder/blob/master/cooler_quickstart.ipynb) or [try it live](http://mybinder.org/repo/mirnylab/cooler-binder).
- Some published data sets are available at `ftp://cooler.csail.mit.edu/coolers`.

As published Hi-C datasets increase in sequencing depth and resolution, a simple sparse representation lends itself better not only to storage but also to streaming and [out-of-core](https://en.wikipedia.org/wiki/Out-of-core_algorithm) algorithms for analysis. The cooler [format](http://cooler.readthedocs.io/en/latest/intro.html#data-model) implements a simple schema and data model that stores a high resolution contact matrix in a sparse representation along with important auxiliary data such as scaffold information, genomic bin annotations, and basic metadata. Data tables are stored in a **columnar** representation as HDF5 Groups of 1D array datasets of equal length. The contact matrix itself is stored as a single table containing only the **nonzero upper triangle** pixels.
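To make the storage idea concrete, here is a minimal sketch in plain Python of how a symmetric contact matrix can be reduced to a columnar table of nonzero upper-triangle pixels. The toy matrix, the helper names `to_pixel_table` and `to_dense`, and the column names are assumptions chosen for illustration; the sketch does not read or write HDF5 and is not the library's implementation.

```python
# Hypothetical symmetric contact matrix over 4 bins (illustrative data only).
dense = [
    [5, 2, 0, 1],
    [2, 0, 3, 0],
    [0, 3, 4, 0],
    [1, 0, 0, 6],
]


def to_pixel_table(matrix):
    """Collect nonzero upper-triangle entries into equal-length columns."""
    table = {"bin1_id": [], "bin2_id": [], "count": []}
    n = len(matrix)
    for i in range(n):
        for j in range(i, n):  # j >= i keeps only the upper triangle
            value = matrix[i][j]
            if value != 0:
                table["bin1_id"].append(i)
                table["bin2_id"].append(j)
                table["count"].append(value)
    return table


def to_dense(table, n):
    """Rebuild the full symmetric matrix by mirroring each stored pixel."""
    matrix = [[0] * n for _ in range(n)]
    for i, j, value in zip(table["bin1_id"], table["bin2_id"], table["count"]):
        matrix[i][j] = value
        matrix[j][i] = value
    return matrix


pixels = to_pixel_table(dense)
```

Each column is a 1D sequence of the same length, so the 16-entry matrix is represented by 6 pixels, and `to_dense(pixels, 4)` recovers the original matrix.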

The `cooler` [library](https://github.com/mirnylab/cooler) provides a thin wrapper over the excellent [h5py](http://docs.h5py.org/en/latest/) Python interface to HDF5. It supports creation of cooler files and the following types of **range queries** on the data:

- Tabular selections are retrieved as Pandas DataFrames and Series.
- Matrix selections are retrieved as SciPy sparse matrices.
- Metadata is retrieved as a JSON-serializable Python dictionary.
- Range queries can be supplied using either integer bin indexes or genomic coordinate intervals.


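```python
>>> import cooler
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> # Open an existing cooler file
>>> c = cooler.Cooler('bigDataset.cool')
>>> resolution = c.info['bin-size']
>>> # Fetch the balanced contact matrix for a genomic interval
>>> mat = c.matrix(balance=True).fetch('chr5:10,000,000-15,000,000')
>>> # Plot the contacts on a log scale
>>> plt.matshow(np.log10(mat.toarray()), cmap='YlOrRd')
```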
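A range query given as a genomic coordinate interval ultimately has to be mapped onto integer bin indexes. The sketch below shows one way that mapping could work for fixed-size bins. The helpers `parse_region` and `region_to_bins`, the chromosome sizes, and the assumed bin layout (chromosome by chromosome, each chromosome starting a new bin) are hypothetical and are not part of the `cooler` API.

```python
def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end); commas are ignored."""
    chrom, _, span = region.partition(":")
    start, _, end = span.replace(",", "").partition("-")
    return chrom, int(start), int(end)


def region_to_bins(region, chromsizes, binsize):
    """Return the half-open range [lo, hi) of global bin indexes for a region.

    Assumes bins are laid out chromosome by chromosome, in the order given
    by `chromsizes`, with every chromosome starting a new bin.
    """
    chrom, start, end = parse_region(region)
    offset = 0
    for name, length in chromsizes:
        if name == chrom:
            break
        offset += -(-length // binsize)  # ceiling division: bins in this chromosome
    else:
        raise KeyError(chrom)
    lo = offset + start // binsize
    hi = offset + -(-end // binsize)
    return lo, hi


# Hypothetical chromosome sizes and a 1 Mb bin size, for illustration only.
chromsizes = [("chr1", 2500000), ("chr5", 20000000)]
bin_range = region_to_bins("chr5:10,000,000-15,000,000", chromsizes, 1000000)
```

Here `chr1` occupies three bins (the last one partial), so the 5 Mb interval on `chr5` maps to five consecutive bins starting at global index 13.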

The `cooler` library also includes utilities for performing out-of-core contact **matrix balancing** on a cooler file of any resolution. See the [docs](http://cooler.readthedocs.org/en/latest/) for more information.
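As an illustration of what out-of-core balancing involves, the following sketch rescales a symmetric matrix so that all bins end up with equal marginals, streaming over upper-triangle pixels one chunk at a time. It uses a simple iterative scaling scheme chosen for this example; it is not taken from the `cooler` source, and the function names, the chunking helper, and the toy data are assumptions. It also assumes every bin has a nonzero marginal.

```python
def marginals(chunks, weights, n_bins):
    """Sum weighted counts per bin, streaming over upper-triangle pixel chunks."""
    marg = [0.0] * n_bins
    for chunk in chunks():
        for i, j, count in chunk:
            value = weights[i] * count * weights[j]
            marg[i] += value
            if i != j:  # mirror off-diagonal pixels into the lower triangle
                marg[j] += value
    return marg


def balance(chunks, n_bins, tol=1e-12, max_iter=200):
    """Estimate per-bin weights that equalize the balanced marginals.

    `chunks` is a callable returning a fresh iterable of pixel chunks, so
    only one chunk has to be held in memory at any time.
    """
    weights = [1.0] * n_bins
    for _ in range(max_iter):
        marg = marginals(chunks, weights, n_bins)
        mean = sum(marg) / n_bins
        if max(abs(m / mean - 1.0) for m in marg) < tol:
            break
        # Shrink the weights of over-represented bins, grow the others.
        weights = [w * mean / m for w, m in zip(weights, marg)]
    return weights


# Hypothetical (bin1, bin2, count) pixels of a 3-bin matrix, read 2 at a time.
pixels = [(0, 0, 4), (0, 1, 2), (0, 2, 1), (1, 1, 3), (1, 2, 1), (2, 2, 5)]


def chunks():
    return (pixels[k:k + 2] for k in range(0, len(pixels), 2))


weights = balance(chunks, 3)
balanced = marginals(chunks, weights, 3)
```

Because each pass only needs the current weight vector and one chunk of pixels, memory use depends on the number of bins and the chunk size rather than on the total number of pixels.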


### Installation

Requirements:

- Python 2.7/3.3+
- libhdf5 and Python packages `numpy`, `scipy`, `pandas`, `h5py`. If you don't have them installed already, we recommend you use the [conda](http://conda.pydata.org/miniconda.html) package manager to manage these dependencies instead of pip.

Install from PyPI using pip.
```sh
$ pip install cooler
```

For the latest, unstable version, install directly from the master branch of the repository.
```sh
$ pip install git+https://github.com/mirnylab/cooler.git
```

For development, clone and install in "editable" (i.e. development) mode with the `-e` option. This way you can also pull changes on the fly.
```sh
$ git clone https://github.com/mirnylab/cooler.git
$ cd cooler
$ pip install -e .
```

### Contributing

[Pull requests](https://akrabat.com/the-beginners-guide-to-contributing-to-a-github-project/) are welcome. The current requirements for testing are `nose` and `mock`.
