PROFILE methodology for the binarisation and normalisation of RNA-seq data

profile_binr

The PROFILE methodology for the binarisation and normalisation of RNA-seq data.

This is a Python interface to a set of normalisation and binarisation functions for RNA-seq data originally written in R.

This software package is based on the methodology developed by Béal, Jonas; Montagud, Arnau; Traynard, Pauline; Barillot, Emmanuel; and Calzone, Laurence of the Computational Systems Biology of Cancer team at Institut Curie (contact-sysbio@curie.fr). It generalizes the original implementation, available as R Markdown notebooks at https://github.com/sysbio-curie/PROFILE, and provides a Python interface to it.

Installation

Using conda

The tool can be installed using the Conda package profile_binr in the colomoto channel. Note that some of its dependencies require the conda-forge channel.

conda install -c conda-forge colomoto::profile_binr

Using pip

Requirements

R (≥4.0)
R packages:
- mclust
- diptest
- moments
- magrittr
- tidyr
- dplyr
- tibble
- bigmemory
- doSNOW
- foreach
- glue

pip install profile_binr
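
The R prerequisites listed above can be checked from Python before installing with pip. The following is a minimal sketch, not part of profile_binr: the helper names `build_r_check_expression` and `missing_r_packages` are hypothetical, and the sketch assumes that the `Rscript` executable is on the PATH.

```python
import shutil
import subprocess

# R packages listed in the requirements above.
REQUIRED_R_PACKAGES = [
    "mclust", "diptest", "moments", "magrittr", "tidyr", "dplyr",
    "tibble", "bigmemory", "doSNOW", "foreach", "glue",
]


def build_r_check_expression(packages):
    """Return an R expression that prints the packages that are not installed."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    return (
        f"pkgs <- c({quoted}); "
        "cat(setdiff(pkgs, rownames(installed.packages())), sep='\\n')"
    )


def missing_r_packages(packages=REQUIRED_R_PACKAGES):
    """Return the missing R packages, or None if Rscript is not on the PATH."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", build_r_check_expression(packages)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

Calling `missing_r_packages()` returns an empty list when every package is available, the names of the missing ones otherwise, and `None` when no R installation is found.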

Usage

This is a minimal example:

from profile_binr import ProfileBin
import pandas as pd

# your data is assumed to contain observations as
# rows and genes as columns, with the cell identifier
# in the first column of the CSV file
data = pd.read_csv("path/to/your/data.csv", index_col=0)
data.head()

	Clec1b	Kdm3a	Coro2b	8430408G22Rik	Clec9a	Phf6	Usp14	Tmem167b
cell_id
HSPC_025	0.0	4.891604	1.426148	0.0	0.0	2.599758	2.954035	6.357369
HSPC_031	0.0	6.877725	0.000000	0.0	0.0	2.423483	1.804914	0.000000
HSPC_037	0.0	0.000000	6.913384	0.0	0.0	2.051659	8.265465	0.000000
LT-HSC_001	0.0	0.000000	8.178374	0.0	0.0	6.419817	3.453502	2.579528
HSPC_001	0.0	0.000000	9.475577	0.0	0.0	7.733370	1.478900	0.000000
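
The expected layout (cells as rows, genes as columns, cell identifiers in the index) can be verified before the data is handed to ProfileBin. The sketch below is illustrative only: `check_layout` is a hypothetical helper, not a profile_binr function, and the inline CSV reuses a few values from the table above.

```python
import io

import pandas as pd

# Small CSV with the cell identifier in the first column.
csv_text = """cell_id,Kdm3a,Coro2b,Phf6
HSPC_025,4.891604,1.426148,2.599758
HSPC_031,6.877725,0.000000,2.423483
"""

# index_col=0 moves the cell identifier into the index.
data = pd.read_csv(io.StringIO(csv_text), index_col=0)


def check_layout(frame):
    """Raise ValueError unless row labels are unique and all columns are numeric."""
    if not frame.index.is_unique:
        raise ValueError("cell identifiers must be unique")
    non_numeric = [
        col for col in frame.columns
        if not pd.api.types.is_numeric_dtype(frame[col])
    ]
    if non_numeric:
        raise ValueError(f"non-numeric columns: {non_numeric}")
    return frame


check_layout(data)

# A file that stores genes as rows has to be transposed first.
genes_as_rows = data.T
cells_as_rows = genes_as_rows.T
```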

# create the binarisation instance using the dataframe
# with the index containing the cell identifier
# and the columns being the gene names
probin = ProfileBin(data)

# compute the criteria used to binarise/normalise the data.
# This method runs in parallel; you can specify the
# number of workers with an integer
probin.fit(8)  # fit using 8 workers

# Look at the computed criteria
probin.criteria.head(8)

	Dip	BI	Kurtosis	DropOutRate	MeanNZ	DenPeak	Amplitude	Category
Clec1b	0.358107	1.635698	54.017736	0.876208	1.520978	-0.007249	8.852181	ZeroInf
Kdm3a	0.000000	2.407548	-0.784019	0.326087	3.847940	0.209239	10.126676	Bimodal
Coro2b	0.000000	2.320060	7.061604	0.658213	2.383819	0.004597	9.475577	ZeroInf
8430408G22Rik	0.684454	3.121069	21.729044	0.884058	2.983472	0.005663	9.067857	ZeroInf
Clec9a	1.000000	2.081717	140.089285	0.965580	2.280293	-0.009361	9.614233	Discarded
Phf6	0.000000	1.988667	-1.389024	0.035628	5.025501	2.017547	10.135226	Bimodal
Usp14	0.000000	2.208080	-1.224987	0.007850	6.109964	8.245570	11.088750	Bimodal
Tmem167b	0.000000	2.430813	0.093023	0.393720	3.448331	0.072982	9.486826	Bimodal
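
Three of the column names above suggest simple per-gene summaries. The sketch below computes them for a single gene under assumed definitions inferred from the names only, not taken from the package source: DropOutRate as the fraction of zero values, MeanNZ as the mean of the non-zero values, and Amplitude as the maximum minus the minimum. The function `simple_criteria` is hypothetical.

```python
from statistics import mean


def simple_criteria(values):
    """Summarise one gene's expression values under the assumed definitions."""
    zeros = sum(1 for v in values if v == 0)
    non_zero = [v for v in values if v != 0]
    return {
        "DropOutRate": zeros / len(values),
        "MeanNZ": mean(non_zero) if non_zero else float("nan"),
        "Amplitude": max(values) - min(values),
    }


# Coro2b values for the five cells shown in data.head()
coro2b = [1.426148, 0.0, 6.913384, 8.178374, 9.475577]
print(simple_criteria(coro2b))
```

The criteria table is computed over the whole dataset, so values obtained from this five-cell excerpt are not expected to match it.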

# get binarised data (alternatively .binarise()):
my_bin = probin.binarize()
my_bin.head()

	Clec1b	Kdm3a	Coro2b	8430408G22Rik	Clec9a	Phf6	Usp14	Tmem167b
HSPC_025	NaN	1.0	NaN	NaN	NaN	0.0	0.0	1.0
HSPC_031	NaN	1.0	NaN	NaN	NaN	0.0	0.0	0.0
HSPC_037	NaN	0.0	1.0	NaN	NaN	0.0	1.0	0.0
LT-HSC_001	NaN	0.0	1.0	NaN	NaN	1.0	0.0	0.0
HSPC_001	NaN	0.0	1.0	NaN	NaN	1.0	0.0	0.0
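
The binarised frame contains NaN wherever no binary value was assigned, so downstream code usually has to handle missing entries explicitly. The following sketch uses only standard pandas calls on a toy frame that mimics the layout above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the layout of the binarised output shown above.
my_bin = pd.DataFrame(
    {
        "Clec1b": [np.nan, np.nan, np.nan],
        "Kdm3a": [1.0, 1.0, 0.0],
        "Coro2b": [np.nan, np.nan, 1.0],
    },
    index=["HSPC_025", "HSPC_031", "HSPC_037"],
)

# Drop genes that have no binary value in any cell.
informative = my_bin.dropna(axis="columns", how="all")

# Fraction of cells left undetermined for each remaining gene.
undetermined = informative.isna().mean()

# The nullable integer dtype keeps 0/1 as integers and NaN as <NA>.
as_int = informative.astype("Int64")
```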

# likewise for normalised data:
my_norm = probin.normalize()
my_norm.head()

	Kdm3a	Coro2b	Clec9a	Phf6	Usp14	Tmem167b
HSPC_025	9.786196e-01	0.184102	NaN	0.000801	8.318176e-05	9.999970e-01
HSPC_031	9.999981e-01	0.000000	NaN	0.000462	8.084114e-07	6.874397e-11
HSPC_037	4.408417e-09	0.892449	NaN	0.000145	9.999940e-01	6.874397e-11
LT-HSC_001	4.408417e-09	1.000000	NaN	0.991865	6.230178e-04	1.599753e-04
HSPC_001	4.408417e-09	1.000000	NaN	0.999865	2.171153e-07	6.874397e-11

References

Béal J, Montagud A, Traynard P, Barillot E and Calzone L (2019) Personalization of Logical Models With Multi-Omics Data Allows Clinical Stratification of Patients. Front. Physiol. 9:1965. doi:10.3389/fphys.2018.01965
