Sparse binary format for Hi-C genomic contact heatmaps
# Cooler
[![Build Status](https://travis-ci.org/mirnylab/cooler.svg?branch=master)](https://travis-ci.org/mirnylab/cooler)
[![Documentation Status](https://readthedocs.org/projects/cooler/badge/?version=latest)](http://cooler.readthedocs.org/en/latest/)
## A cool place to store your Hi-C
Cooler is a **sparse, compressed, binary** persistent storage format for Hi-C contact maps based on [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format).
- Documentation is available [here](http://cooler.readthedocs.org/en/latest/).
- See example [Jupyter notebook](https://github.com/mirnylab/cooler-binder/blob/master/cooler_quickstart.ipynb) or [try it live](http://mybinder.org/repo/mirnylab/cooler-binder).
- Some published data sets are available at `ftp://cooler.csail.mit.edu/coolers`.
As published Hi-C datasets increase in sequencing depth and resolution, a simple sparse representation lends itself better not only to storage but also to streaming and [out-of-core](https://en.wikipedia.org/wiki/Out-of-core_algorithm) algorithms for analysis. The cooler [format](http://cooler.readthedocs.io/en/latest/intro.html#data-model) implements a simple schema and data model that stores a high-resolution contact matrix in a sparse representation, along with important auxiliary data such as scaffold information, genomic bin annotations, and basic metadata. Data tables are stored in a **columnar** representation as HDF5 Groups of 1D array datasets of equal length. The contact matrix itself is stored as a single table containing only the **nonzero upper triangle** pixels.
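To make the storage idea concrete, here is a minimal pure-Python sketch of turning a small symmetric matrix into equal-length columns that hold only the nonzero upper-triangle pixels, and of rebuilding the full matrix from them. The toy matrix, the helper names, and the column names `bin1_id`, `bin2_id`, and `count` are chosen for illustration; see the data model documentation for the actual schema.

```python
# Toy symmetric contact matrix over 4 genomic bins (made-up values).
dense = [
    [5, 2, 0, 1],
    [2, 3, 0, 0],
    [0, 0, 0, 4],
    [1, 0, 4, 7],
]


def to_upper_triangle_columns(matrix):
    """Return equal-length columns holding the nonzero upper-triangle pixels."""
    bin1_id, bin2_id, count = [], [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i, n):  # j >= i: upper triangle, diagonal included
            if matrix[i][j] != 0:
                bin1_id.append(i)
                bin2_id.append(j)
                count.append(matrix[i][j])
    return {"bin1_id": bin1_id, "bin2_id": bin2_id, "count": count}


def to_dense(columns, n):
    """Rebuild the full symmetric matrix by mirroring each stored pixel."""
    out = [[0] * n for _ in range(n)]
    for i, j, v in zip(columns["bin1_id"], columns["bin2_id"], columns["count"]):
        out[i][j] = v
        out[j][i] = v
    return out


pixels = to_upper_triangle_columns(dense)
restored = to_dense(pixels, len(dense))
```

Only 6 of the 16 cells are stored here, and the round trip loses nothing because the matrix is symmetric.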
The `cooler` [library](https://github.com/mirnylab/cooler) provides a thin wrapper over the excellent [h5py](http://docs.h5py.org/en/latest/) Python interface to HDF5. It supports creation of cooler files and the following types of **range queries** on the data:
- Tabular selections are retrieved as Pandas DataFrames and Series.
- Matrix selections are retrieved as SciPy sparse matrices.
- Metadata is retrieved as a JSON-serializable Python dictionary.
- Range queries can be supplied using either integer bin indexes or genomic coordinate intervals.
```python
>>> import cooler
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> c = cooler.Cooler('bigDataset.cool')
>>> resolution = c.info['bin-size']  # bin size in base pairs
>>> # Fetch the balanced contact matrix for a genomic interval
>>> mat = c.matrix(balance=True).fetch('chr5:10,000,000-15,000,000')
>>> plt.matshow(np.log10(mat.toarray()), cmap='YlOrRd')
```
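Since range queries accept either integer bin indexes or genomic coordinate intervals, it can help to see how the two relate for fixed-size bins. The sketch below is not the library's implementation: `parse_region`, `region_to_bin_range`, and the `chrom_offsets` mapping (chromosome name to the index of its first bin) are hypothetical names, and the offset values are made up.

```python
def parse_region(region):
    """Split 'chrom:start-end' (commas allowed in the numbers) into its parts."""
    chrom, _, span = region.partition(":")
    start, _, end = span.replace(",", "").partition("-")
    return chrom, int(start), int(end)


def region_to_bin_range(region, chrom_offsets, bin_size):
    """Map a genomic interval to a half-open range [lo, hi) of bin indexes."""
    chrom, start, end = parse_region(region)
    offset = chrom_offsets[chrom]
    lo = offset + start // bin_size
    hi = offset + -(-end // bin_size)  # ceiling division; hi is exclusive
    return lo, hi


# Hypothetical layout: chr5's first bin sits at index 100, bins are 1 Mb wide.
chrom_offsets = {"chr4": 0, "chr5": 100}
bin_range = region_to_bin_range("chr5:10,000,000-15,000,000", chrom_offsets, 1000000)
```

With these toy values the interval covers bins 110 through 114, so `bin_range` is `(110, 115)`.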
The `cooler` library also includes utilities for performing out-of-core contact **matrix balancing** on a cooler file of any resolution. See the [docs](http://cooler.readthedocs.org/en/latest/) for more information.
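As a rough illustration of what balancing on upper-triangle pixels can look like, here is a pure-Python sketch of iterative correction, one common balancing approach. The text above does not say which algorithm the library uses, so treat this as an assumption and not as its implementation. The function names, the chunk size, and the toy data are all hypothetical, and slicing an in-memory list only stands in for reading chunks from disk. The sketch also assumes every bin has at least one nonzero pixel.

```python
def marginals(bin1_id, bin2_id, count, weights, chunk_size=2):
    """Row sums of the weighted symmetric matrix, accumulated chunk by chunk."""
    marg = [0.0] * len(weights)
    for lo in range(0, len(count), chunk_size):
        hi = lo + chunk_size
        for i, j, v in zip(bin1_id[lo:hi], bin2_id[lo:hi], count[lo:hi]):
            w = weights[i] * weights[j] * v
            marg[i] += w
            if i != j:  # mirror the off-diagonal pixel into the lower triangle
                marg[j] += w
    return marg


def balance(bin1_id, bin2_id, count, n_bins, n_iter=200, tol=1e-10):
    """Find per-bin weights that equalize the marginals (assumes no empty bins)."""
    weights = [1.0] * n_bins
    for _ in range(n_iter):
        marg = marginals(bin1_id, bin2_id, count, weights)
        mean = sum(marg) / n_bins
        ratios = [m / mean for m in marg]
        weights = [w / r for w, r in zip(weights, ratios)]
        if max(abs(r - 1.0) for r in ratios) < tol:
            break
    return weights


# Upper-triangle pixels of the toy matrix [[4, 2, 1], [2, 3, 2], [1, 2, 5]].
bin1_id = [0, 0, 0, 1, 1, 2]
bin2_id = [0, 1, 2, 1, 2, 2]
count = [4, 2, 1, 3, 2, 5]

weights = balance(bin1_id, bin2_id, count, n_bins=3)
balanced_marg = marginals(bin1_id, bin2_id, count, weights)
```

After balancing, the entries of `balanced_marg` should agree with each other to within numerical tolerance. Each pass touches the pixel table only one chunk at a time, which is what makes this style of algorithm suitable for out-of-core use.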
### Installation
Requirements:
- Python 2.7/3.3+
- libhdf5 and Python packages `numpy`, `scipy`, `pandas`, `h5py`. If you don't have them installed already, we recommend you use the [conda](http://conda.pydata.org/miniconda.html) package manager to manage these dependencies instead of pip.
Install from PyPI using pip.
```sh
$ pip install cooler
```
For the latest, unstable version, clone and install from master or install directly from the repo.
```sh
$ pip install git+https://github.com/mirnylab/cooler.git
```
For development, clone and install in "editable" (i.e. development) mode with the `-e` option. This way you can also pull changes on the fly.
```sh
$ git clone https://github.com/mirnylab/cooler.git
$ cd cooler
$ pip install -e .
```
### Contributing
[Pull requests](https://akrabat.com/the-beginners-guide-to-contributing-to-a-github-project/) are welcome. The current requirements for testing are `nose` and `mock`.