PROFILE methodology for the binarisation and normalisation of RNA-seq data
profile_binr
The PROFILE methodology for the binarisation and normalisation of RNA-seq data.
This is a Python interface to a set of normalisation and binarisation functions for RNA-seq data originally written in R.
This software package is based on the methodology developed by Jonas Béal, Arnau Montagud, Pauline Traynard, Emmanuel Barillot and Laurence Calzone in the Computational Systems Biology of Cancer team at Institut Curie (contact-sysbio@curie.fr). It generalises the original implementation, available as Rmarkdown notebooks at https://github.com/sysbio-curie/PROFILE, and offers a Python interface to it.
Installation
Using conda
The tool can be installed from the conda package profile_binr in the colomoto
channel. Note that some of its dependencies require the conda-forge
channel.
conda install -c conda-forge colomoto::profile_binr
Using pip
Requirements
- R (≥4.0)
- R packages:
  - mclust
  - diptest
  - moments
  - magrittr
  - tidyr
  - dplyr
  - tibble
  - bigmemory
  - doSNOW
  - foreach
  - glue
pip install profile_binr
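Before installing with pip, it can help to confirm that R can load the packages listed above. The sketch below is a hypothetical helper, not part of profile_binr: it assumes `Rscript` is on your PATH and asks R's `requireNamespace()` to load each package quietly. It does not check the R version.

```python
import shutil
import subprocess

# R packages from the requirements list above
REQUIRED_R_PACKAGES = (
    "mclust", "diptest", "moments", "magrittr", "tidyr",
    "dplyr", "tibble", "bigmemory", "doSNOW", "foreach", "glue",
)


def build_check_expression(packages):
    """Build an R expression printing, one per line, the packages R cannot load."""
    quoted = ", ".join(f'"{p}"' for p in packages)
    return (
        f"pkgs <- c({quoted}); "
        "ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE); "
        'cat(pkgs[!ok], sep = "\\n")'
    )


def missing_r_packages(packages=REQUIRED_R_PACKAGES, rscript="Rscript"):
    """Return the subset of `packages` that R reports as not loadable."""
    exe = shutil.which(rscript)
    if exe is None:
        raise FileNotFoundError(f"{rscript!r} not found on PATH; is R (>= 4.0) installed?")
    # Run the expression without a shell; stdout holds the missing package names
    result = subprocess.run(
        [exe, "-e", build_check_expression(packages)],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Calling `print(missing_r_packages())` lists anything still to install, which can then be done from R, for example with `install.packages()`.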
Usage
Here is a minimal example:
from profile_binr import ProfileBin
import pandas as pd
# your data is assumed to contain observations (cells) as
# rows and genes as columns, with the cell identifiers in the first column
data = pd.read_csv("path/to/your/data.csv", index_col=0)
data.head()
| cell_id | Clec1b | Kdm3a | Coro2b | 8430408G22Rik | Clec9a | Phf6 | Usp14 | Tmem167b |
|---|---|---|---|---|---|---|---|---|
| HSPC_025 | 0.0 | 4.891604 | 1.426148 | 0.0 | 0.0 | 2.599758 | 2.954035 | 6.357369 |
| HSPC_031 | 0.0 | 6.877725 | 0.000000 | 0.0 | 0.0 | 2.423483 | 1.804914 | 0.000000 |
| HSPC_037 | 0.0 | 0.000000 | 6.913384 | 0.0 | 0.0 | 2.051659 | 8.265465 | 0.000000 |
| LT-HSC_001 | 0.0 | 0.000000 | 8.178374 | 0.0 | 0.0 | 6.419817 | 3.453502 | 2.579528 |
| HSPC_001 | 0.0 | 0.000000 | 9.475577 | 0.0 | 0.0 | 7.733370 | 1.478900 | 0.000000 |
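ProfileBin expects cells as rows and genes as columns, as in the table above, but many count tables are stored the other way round. The hypothetical helper below (not part of profile_binr) transposes such a table when asked, checks that cell and gene identifiers are unique and casts the values to float. Whether ProfileBin itself requires unique labels or float values is an assumption.

```python
import pandas as pd


def to_cells_by_genes(df: pd.DataFrame, genes_as_rows: bool = False) -> pd.DataFrame:
    """Return a float DataFrame with cells as rows and genes as columns."""
    # Transpose gene x cell tables into the cell x gene layout
    out = df.T if genes_as_rows else df
    if out.index.has_duplicates:
        raise ValueError("cell identifiers in the index must be unique")
    if out.columns.has_duplicates:
        raise ValueError("gene names in the columns must be unique")
    # Assumes a purely numeric expression matrix; non-numeric values raise here
    return out.astype(float)
```

For a genes x cells file, `data = to_cells_by_genes(pd.read_csv(path, index_col=0), genes_as_rows=True)` would give the expected layout.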
# create the binarisation instance using the dataframe
# with the index containing the cell identifier
# and the columns being the gene names
probin = ProfileBin(data)
# compute the criteria used to binarise/normalise the data.
# This method uses a parallel implementation; you can specify the
# number of workers with an integer
probin.fit(8)  # fit using 8 parallel workers
# Look at the computed criteria
probin.criteria.head(8)
|  | Dip | BI | Kurtosis | DropOutRate | MeanNZ | DenPeak | Amplitude | Category |
|---|---|---|---|---|---|---|---|---|
| Clec1b | 0.358107 | 1.635698 | 54.017736 | 0.876208 | 1.520978 | -0.007249 | 8.852181 | ZeroInf |
| Kdm3a | 0.000000 | 2.407548 | -0.784019 | 0.326087 | 3.847940 | 0.209239 | 10.126676 | Bimodal |
| Coro2b | 0.000000 | 2.320060 | 7.061604 | 0.658213 | 2.383819 | 0.004597 | 9.475577 | ZeroInf |
| 8430408G22Rik | 0.684454 | 3.121069 | 21.729044 | 0.884058 | 2.983472 | 0.005663 | 9.067857 | ZeroInf |
| Clec9a | 1.000000 | 2.081717 | 140.089285 | 0.965580 | 2.280293 | -0.009361 | 9.614233 | Discarded |
| Phf6 | 0.000000 | 1.988667 | -1.389024 | 0.035628 | 5.025501 | 2.017547 | 10.135226 | Bimodal |
| Usp14 | 0.000000 | 2.208080 | -1.224987 | 0.007850 | 6.109964 | 8.245570 | 11.088750 | Bimodal |
| Tmem167b | 0.000000 | 2.430813 | 0.093023 | 0.393720 | 3.448331 | 0.072982 | 9.486826 | Bimodal |
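Some of the criteria columns can be read off their names. The sketch below recomputes three of them with pandas, assuming DropOutRate is the fraction of cells with a zero value, MeanNZ the mean of the non-zero values and Amplitude the range (max minus min). These definitions are our reading of the column names, not confirmed against the package code. The remaining columns (Dip, BI, Kurtosis, DenPeak) presumably rely on the statistical R packages listed in the requirements and are not reproduced here.

```python
import pandas as pd


def simple_criteria(data: pd.DataFrame) -> pd.DataFrame:
    """Per-gene summaries of a cells x genes matrix, under ASSUMED definitions:

    DropOutRate: fraction of cells with a zero value
    MeanNZ:      mean over the non-zero values (NaN if the gene is all zeros)
    Amplitude:   max minus min
    """
    is_zero = data.eq(0)
    return pd.DataFrame({
        "DropOutRate": is_zero.mean(axis=0),
        # mask() turns zeros into NaN, which mean() then skips
        "MeanNZ": data.mask(is_zero).mean(axis=0),
        "Amplitude": data.max(axis=0) - data.min(axis=0),
    })
```

Run on the five-row excerpt shown earlier, the numbers will differ from the table above, which was computed on the full dataset. To see how genes were categorised, `probin.criteria["Category"].value_counts()` gives a quick tally.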
# get binarised data (alternatively .binarise()):
my_bin = probin.binarize()
my_bin.head()
|  | Clec1b | Kdm3a | Coro2b | 8430408G22Rik | Clec9a | Phf6 | Usp14 | Tmem167b |
|---|---|---|---|---|---|---|---|---|
| HSPC_025 | NaN | 1.0 | NaN | NaN | NaN | 0.0 | 0.0 | 1.0 |
| HSPC_031 | NaN | 1.0 | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 |
| HSPC_037 | NaN | 0.0 | 1.0 | NaN | NaN | 0.0 | 1.0 | 0.0 |
| LT-HSC_001 | NaN | 0.0 | 1.0 | NaN | NaN | 1.0 | 0.0 | 0.0 |
| HSPC_001 | NaN | 0.0 | 1.0 | NaN | NaN | 1.0 | 0.0 | 0.0 |
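The requirements include the R package mclust, which suggests that genes in the Bimodal category are split with a Gaussian mixture model. As a rough illustration of that idea only, the hypothetical NumPy function below fits a one-dimensional, two-component Gaussian mixture by expectation-maximisation and labels a cell 1 when its posterior probability of belonging to the higher-mean component exceeds 0.5. It is not the package's implementation and does not reproduce its handling of ZeroInf or Discarded genes (the NaN entries in the table above).

```python
import numpy as np


def _e_step(x, w, mu, var):
    """Responsibilities (n x 2) and log-likelihood, computed in log space."""
    log_d = np.log(w) - 0.5 * np.log(2 * np.pi * var) - (x[:, None] - mu) ** 2 / (2 * var)
    log_total = np.logaddexp(log_d[:, 0], log_d[:, 1])
    return np.exp(log_d - log_total[:, None]), log_total.sum()


def _m_step(x, resp):
    """Re-estimate weights, means and variances from the responsibilities."""
    nk = np.maximum(resp.sum(axis=0), 1e-12)
    w = nk / nk.sum()
    mu = (resp * x[:, None]).sum(axis=0) / nk
    # Variance floor keeps a component from collapsing onto a single point
    var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    return w, mu, var


def two_gaussian_binarise(x, n_iter=500, tol=1e-9):
    """Toy mixture-based binarisation of one gene's expression values.

    Returns (posterior probability of the higher-mean component, 0/1 labels).
    """
    x = np.asarray(x, dtype=float)
    # Start the two components at the extremes of the data
    mu = np.array([x.min(), x.max()])
    var = np.full(2, max(x.var(), 1e-6))
    w = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        resp, ll = _e_step(x, w, mu, var)
        w, mu, var = _m_step(x, resp)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    resp, _ = _e_step(x, w, mu, var)
    p_high = resp[:, int(np.argmax(mu))]
    return p_high, (p_high > 0.5).astype(int)
```

For example, `two_gaussian_binarise(data["Kdm3a"])` would return per-cell probabilities and labels for one gene of the `data` frame loaded earlier.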
# likewise, get normalised data:
my_norm = probin.normalize()
my_norm.head()
|  | Clec1b | Kdm3a | Coro2b | 8430408G22Rik | Clec9a | Phf6 | Usp14 | Tmem167b |
|---|---|---|---|---|---|---|---|---|
| HSPC_025 | 0.0 | 9.786196e-01 | 0.184102 | 0.0 | NaN | 0.000801 | 8.318176e-05 | 9.999970e-01 |
| HSPC_031 | 0.0 | 9.999981e-01 | 0.000000 | 0.0 | NaN | 0.000462 | 8.084114e-07 | 6.874397e-11 |
| HSPC_037 | 0.0 | 4.408417e-09 | 0.892449 | 0.0 | NaN | 0.000145 | 9.999940e-01 | 6.874397e-11 |
| LT-HSC_001 | 0.0 | 4.408417e-09 | 1.000000 | 0.0 | NaN | 0.991865 | 6.230178e-04 | 1.599753e-04 |
| HSPC_001 | 0.0 | 4.408417e-09 | 1.000000 | 0.0 | NaN | 0.999865 | 2.171153e-07 | 6.874397e-11 |
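In the five rows shown, every 1 in the binarised table corresponds to a normalised value above 0.5 and every 0 to a value below it. The hypothetical helper below (not part of profile_binr) measures, per gene, how often this holds across a whole dataset. The 0.5 threshold is an assumption for a rough sanity check, not a documented property of the method.

```python
import pandas as pd


def binary_agreement(binarised: pd.DataFrame, normalised: pd.DataFrame,
                     threshold: float = 0.5) -> pd.Series:
    """Per-gene fraction of cells where thresholding `normalised` reproduces `binarised`."""
    norm = normalised.reindex_like(binarised)
    # Only compare cells where both outputs have a value
    comparable = binarised.notna() & norm.notna()
    agree = (norm > threshold).astype(float).eq(binarised) & comparable
    n = comparable.sum(axis=0)
    # Genes with no comparable cells get NaN instead of a division by zero
    return agree.sum(axis=0) / n.where(n > 0)
```

Calling `binary_agreement(my_bin, my_norm)` returns one value per gene, NaN for genes such as Clec9a that have no comparable entries.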
References
- Béal J, Montagud A, Traynard P, Barillot E and Calzone L (2019) Personalization of Logical Models With Multi-Omics Data Allows Clinical Stratification of Patients. Front. Physiol. 9:1965. doi:10.3389/fphys.2018.01965
File details
Details for the file profile_binr-0.1.1.tar.gz.
File metadata
- Download URL: profile_binr-0.1.1.tar.gz
- Size: 11.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.2.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | b3c3e7cbf4cb20d8603e27870caf2ea524becda019a0e4c8f9a1a555efaf777b |
| MD5 | 4962bb486c7f75eb1f31cd116cfd52da |
| BLAKE2b-256 | 1135b97b4266cdd17f26b32842b452d42a3e2b91ff1c83b4e93416cdf5bfc395 |
File details
Details for the file profile_binr-0.1.1-py3-none-any.whl.
File metadata
- Download URL: profile_binr-0.1.1-py3-none-any.whl
- Size: 9.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.2.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | c483681231aae8ef3e32de12b321e7669cb7b510c16425a0a2ab414bed7c533c |
| MD5 | 70bc188249b67986a1974b201bd7d2d0 |
| BLAKE2b-256 | b7a49c1dcf1d9178bdae02cdf6e95a568a03368d740829478cf4948a8867764e |
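To check that a downloaded distribution file matches the SHA256 digests listed above, you can hash it locally. This is a minimal standard-library sketch. The helper names `sha256_of` and `verify` are hypothetical, and the digests are copied from the tables above for release 0.1.1.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for release 0.1.1
EXPECTED_SHA256 = {
    "profile_binr-0.1.1.tar.gz":
        "b3c3e7cbf4cb20d8603e27870caf2ea524becda019a0e4c8f9a1a555efaf777b",
    "profile_binr-0.1.1-py3-none-any.whl":
        "c483681231aae8ef3e32de12b321e7669cb7b510c16425a0a2ab414bed7c533c",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True if the file's SHA256 matches the published digest for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

`verify("profile_binr-0.1.1.tar.gz")` returns True when the downloaded file is intact. Alternatively, pip's hash-checking mode can enforce the digest at install time through a `--hash=sha256:` option, followed by the digest, on the requirement line of a requirements file.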