profile_binr

The PROFILE methodology for the binarisation and normalisation of RNA-seq data.

This is a Python interface to a set of normalisation and binarisation functions for RNA-seq data originally written in R.

This software package is based on the methodology developed by Jonas Béal, Arnau Montagud, Pauline Traynard, Emmanuel Barillot, and Laurence Calzone in the Computational Systems Biology of Cancer team at Institut Curie (contact-sysbio@curie.fr). It generalises the original implementation, available as R Markdown notebooks at https://github.com/sysbio-curie/PROFILE, and offers a Python interface to it.

Installation

Using conda

The tool can be installed with conda from the profile_binr package in the colomoto channel. Note that some of its dependencies require the conda-forge channel.

conda install -c conda-forge colomoto::profile_binr

Using pip

Requirements

  • R (≥4.0)
  • R packages:
    • mclust
    • diptest
    • moments
    • magrittr
    • tidyr
    • dplyr
    • tibble
    • bigmemory
    • doSNOW
    • foreach
    • glue

pip install profile_binr
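
The R requirements listed above can be checked from Python before using the package. The following is a minimal sketch, not part of profile_binr: the helper names build_check_script and missing_r_packages are hypothetical, and it assumes that the Rscript executable is on your PATH. It asks R which of the listed packages are absent from the installed library.

```python
import shutil
import subprocess

# R packages listed in the Requirements section above
R_PACKAGES = [
    "mclust", "diptest", "moments", "magrittr", "tidyr", "dplyr",
    "tibble", "bigmemory", "doSNOW", "foreach", "glue",
]


def build_check_script(packages):
    """Return R code that prints, one per line, the packages that are not installed."""
    quoted = ", ".join('"{}"'.format(name) for name in packages)
    return (
        "pkgs <- c(" + quoted + "); "
        'cat(setdiff(pkgs, rownames(installed.packages())), sep = "\\n")'
    )


def missing_r_packages(packages=R_PACKAGES):
    """Run the check with Rscript and return the list of missing package names.

    Hypothetical helper: assumes Rscript is available on PATH.
    """
    rscript = shutil.which("Rscript")
    if rscript is None:
        raise RuntimeError("Rscript not found on PATH; install R (>= 4.0) first")
    result = subprocess.run(
        [rscript, "-e", build_check_script(packages)],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Calling missing_r_packages() returns an empty list when every requirement is present. Any names it returns can be installed from an R session with install.packages().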

Usage

The following is a minimal example:

from profile_binr import ProfileBin
import pandas as pd

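# your data is assumed to contain observations (cells) as
# rows and genes as columns; the first column is read as the
# index so that it holds the cell identifiers
data = pd.read_csv("path/to/your/data.csv", index_col=0)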
data.head()
            Clec1b     Kdm3a    Coro2b  8430408G22Rik  Clec9a      Phf6     Usp14  Tmem167b
cell_id
HSPC_025       0.0  4.891604  1.426148            0.0     0.0  2.599758  2.954035  6.357369
HSPC_031       0.0  6.877725  0.000000            0.0     0.0  2.423483  1.804914  0.000000
HSPC_037       0.0  0.000000  6.913384            0.0     0.0  2.051659  8.265465  0.000000
LT-HSC_001     0.0  0.000000  8.178374            0.0     0.0  6.419817  3.453502  2.579528
HSPC_001       0.0  0.000000  9.475577            0.0     0.0  7.733370  1.478900  0.000000
# create the binarisation instance using the dataframe
# with the index containing the cell identifier
# and the columns being the gene names
probin = ProfileBin(data)

# compute the criteria used to binarise/normalise the data.
# This method runs in parallel; pass an integer to set the
# number of workers
probin.fit(8) # train using 8 threads

# Look at the computed criteria
probin.criteria.head(8)
                    Dip        BI    Kurtosis  DropOutRate    MeanNZ    DenPeak  Amplitude   Category
Clec1b         0.358107  1.635698   54.017736     0.876208  1.520978  -0.007249   8.852181    ZeroInf
Kdm3a          0.000000  2.407548   -0.784019     0.326087  3.847940   0.209239  10.126676    Bimodal
Coro2b         0.000000  2.320060    7.061604     0.658213  2.383819   0.004597   9.475577    ZeroInf
8430408G22Rik  0.684454  3.121069   21.729044     0.884058  2.983472   0.005663   9.067857    ZeroInf
Clec9a         1.000000  2.081717  140.089285     0.965580  2.280293  -0.009361   9.614233  Discarded
Phf6           0.000000  1.988667   -1.389024     0.035628  5.025501   2.017547  10.135226    Bimodal
Usp14          0.000000  2.208080   -1.224987     0.007850  6.109964   8.245570  11.088750    Bimodal
Tmem167b       0.000000  2.430813    0.093023     0.393720  3.448331   0.072982   9.486826    Bimodal
# get binarised data (alternatively .binarise()):
my_bin = probin.binarize()
my_bin.head()
            Clec1b  Kdm3a  Coro2b  8430408G22Rik  Clec9a  Phf6  Usp14  Tmem167b
HSPC_025       NaN    1.0     NaN            NaN     NaN   0.0    0.0       1.0
HSPC_031       NaN    1.0     NaN            NaN     NaN   0.0    0.0       0.0
HSPC_037       NaN    0.0     1.0            NaN     NaN   0.0    1.0       0.0
LT-HSC_001     NaN    0.0     1.0            NaN     NaN   1.0    0.0       0.0
HSPC_001       NaN    0.0     1.0            NaN     NaN   1.0    0.0       0.0
# likewise for normalised data:
my_norm = probin.normalize()
my_norm.head()
            Clec1b         Kdm3a    Coro2b  8430408G22Rik  Clec9a      Phf6         Usp14      Tmem167b
HSPC_025       0.0  9.786196e-01  0.184102            0.0     NaN  0.000801  8.318176e-05  9.999970e-01
HSPC_031       0.0  9.999981e-01  0.000000            0.0     NaN  0.000462  8.084114e-07  6.874397e-11
HSPC_037       0.0  4.408417e-09  0.892449            0.0     NaN  0.000145  9.999940e-01  6.874397e-11
LT-HSC_001     0.0  4.408417e-09  1.000000            0.0     NaN  0.991865  6.230178e-04  1.599753e-04
HSPC_001       0.0  4.408417e-09  1.000000            0.0     NaN  0.999865  2.171153e-07  6.874397e-11
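
The outputs above are ordinary pandas DataFrames, so they can be inspected with standard pandas calls. The sketch below does not call profile_binr. It rebuilds the Category column and the binarised matrix by hand from the values printed above (the variable names are only illustrative), then summarises the gene categories and shows where the binarised matrix contains undetermined (NaN) entries.

```python
import numpy as np
import pandas as pd

genes = ["Clec1b", "Kdm3a", "Coro2b", "8430408G22Rik",
         "Clec9a", "Phf6", "Usp14", "Tmem167b"]
cells = ["HSPC_025", "HSPC_031", "HSPC_037", "LT-HSC_001", "HSPC_001"]

# Category column copied from the criteria table printed above
criteria = pd.DataFrame(
    {"Category": ["ZeroInf", "Bimodal", "ZeroInf", "ZeroInf",
                  "Discarded", "Bimodal", "Bimodal", "Bimodal"]},
    index=genes,
)

# Binarised values copied from the my_bin.head() output printed above
nan = np.nan
my_bin = pd.DataFrame(
    [
        [nan, 1.0, nan, nan, nan, 0.0, 0.0, 1.0],
        [nan, 1.0, nan, nan, nan, 0.0, 0.0, 0.0],
        [nan, 0.0, 1.0, nan, nan, 0.0, 1.0, 0.0],
        [nan, 0.0, 1.0, nan, nan, 1.0, 0.0, 0.0],
        [nan, 0.0, 1.0, nan, nan, 1.0, 0.0, 0.0],
    ],
    index=cells,
    columns=genes,
)

# How many genes fall into each category
category_counts = criteria["Category"].value_counts()

# Names of the genes classified as bimodal
bimodal_genes = criteria.index[criteria["Category"] == "Bimodal"].tolist()

# Fraction of cells left undetermined (NaN) for each gene
undetermined_fraction = my_bin.isna().mean()

# Keep only the genes that received a 0/1 value in every cell
fully_binarised = my_bin.dropna(axis="columns", how="any")
```

In this five-cell excerpt, the genes that survive dropna are exactly the ones labelled Bimodal. That is a property of these sample rows and should not be read as a general rule of the method.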

References

  • Béal J, Montagud A, Traynard P, Barillot E and Calzone L (2019) Personalization of Logical Models With Multi-Omics Data Allows Clinical Stratification of Patients. Front. Physiol. 9:1965. doi:10.3389/fphys.2018.01965
