PROFILE methodology for the binarisation and normalisation of RNA-seq data

profile_binr

The PROFILE methodology for the binarisation and normalisation of RNA-seq data.

This is a Python interface to a set of normalisation and binarisation functions for RNA-seq data originally written in R.

This software package is based on the methodology developed by Béal, Jonas; Montagud, Arnau; Traynard, Pauline; Barillot, Emmanuel; and Calzone, Laurence in the Computational Systems Biology of Cancer team at Institut Curie (contact-sysbio@curie.fr). It generalises the original implementation, written as Rmarkdown notebooks and available at https://github.com/sysbio-curie/PROFILE, and exposes it through a Python interface.

Installation

Using conda

The tool can be installed from the Conda package profile_binr in the colomoto channel. Note that some of its dependencies require the conda-forge channel.

conda install -c conda-forge colomoto::profile_binr

Using pip

Requirements

  • R (≥4.0)
  • R packages:
    • mclust
    • diptest
    • moments
    • magrittr
    • tidyr
    • dplyr
    • tibble
    • bigmemory
    • doSNOW
    • foreach
    • glue

Once R and the packages above are available, install the Python package with pip:

pip install profile_binr
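
Since pip only installs the Python side, it can help to confirm that R is reachable before going further. The sketch below only checks whether an `Rscript` executable is on the `PATH`. How profile_binr itself locates R is not stated here, so treat this as a rough sanity check. The helper name `find_r` is hypothetical.

```python
import shutil


def find_r(executable="Rscript"):
    """Return the full path of an R executable found on PATH, or None."""
    # shutil.which mirrors the shell's lookup of a command name on PATH
    return shutil.which(executable)


r_path = find_r()
if r_path is None:
    print("Rscript not found on PATH; install R (>= 4.0) first")
else:
    print(f"Found R at {r_path}")
```

This does not verify the R version or the presence of the R packages listed above.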

Usage

A minimal example:

from profile_binr import ProfileBin
import pandas as pd

# your data is assumed to contain observations (cells) as
# rows and genes as columns; the first CSV column is used
# as the index holding the cell identifiers
data = pd.read_csv("path/to/your/data.csv", index_col=0)
data.head()
            Clec1b     Kdm3a    Coro2b  8430408G22Rik  Clec9a      Phf6     Usp14  Tmem167b
cell_id
HSPC_025       0.0  4.891604  1.426148            0.0     0.0  2.599758  2.954035  6.357369
HSPC_031       0.0  6.877725  0.000000            0.0     0.0  2.423483  1.804914  0.000000
HSPC_037       0.0  0.000000  6.913384            0.0     0.0  2.051659  8.265465  0.000000
LT-HSC_001     0.0  0.000000  8.178374            0.0     0.0  6.419817  3.453502  2.579528
HSPC_001       0.0  0.000000  9.475577            0.0     0.0  7.733370  1.478900  0.000000
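
Expression matrices are often stored the other way round, with genes as rows and cells as columns. The sketch below shows one way to bring such a table into the orientation described above. The tiny `genes_by_cells` frame is made up for illustration and reuses two genes and two cells from the preview.

```python
import pandas as pd

# Hypothetical input stored as genes (rows) x cells (columns)
genes_by_cells = pd.DataFrame(
    {"HSPC_025": [0.0, 4.891604], "HSPC_031": [0.0, 6.877725]},
    index=pd.Index(["Clec1b", "Kdm3a"], name="gene"),
)

# Transpose so that cells become rows and genes become columns
cells_by_genes = genes_by_cells.T
cells_by_genes.index.name = "cell_id"

print(cells_by_genes.shape)
```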
# create the binarisation instance using the dataframe
# with the index containing the cell identifier
# and the columns being the gene names
probin = ProfileBin(data)

# compute the criteria used to binarise/normalise the data.
# This method runs in parallel; pass an integer to set the
# number of workers
probin.fit(8) # fit using 8 workers

# Look at the computed criteria
probin.criteria.head(8)
                    Dip        BI    Kurtosis  DropOutRate    MeanNZ    DenPeak  Amplitude   Category
Clec1b         0.358107  1.635698   54.017736     0.876208  1.520978  -0.007249   8.852181    ZeroInf
Kdm3a          0.000000  2.407548   -0.784019     0.326087  3.847940   0.209239  10.126676    Bimodal
Coro2b         0.000000  2.320060    7.061604     0.658213  2.383819   0.004597   9.475577    ZeroInf
8430408G22Rik  0.684454  3.121069   21.729044     0.884058  2.983472   0.005663   9.067857    ZeroInf
Clec9a         1.000000  2.081717  140.089285     0.965580  2.280293  -0.009361   9.614233  Discarded
Phf6           0.000000  1.988667   -1.389024     0.035628  5.025501   2.017547  10.135226    Bimodal
Usp14          0.000000  2.208080   -1.224987     0.007850  6.109964   8.245570  11.088750    Bimodal
Tmem167b       0.000000  2.430813    0.093023     0.393720  3.448331   0.072982   9.486826    Bimodal
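
To give a feel for what some of these columns might measure, here is a rough pandas sketch of three of them. It assumes that DropOutRate is the fraction of zero entries per gene, MeanNZ the mean of the non-zero entries, and Amplitude the range (max minus min). These definitions are assumptions read off the column names, not the package's implementation, and the helper `simple_criteria` is hypothetical. Dip, BI, Kurtosis, DenPeak and Category are not reproduced.

```python
import pandas as pd


def simple_criteria(expr):
    """Per-gene summaries for a cells x genes expression frame (assumed definitions)."""
    nonzero = expr.where(expr != 0)  # zeros become NaN so they are skipped by mean()
    return pd.DataFrame(
        {
            "DropOutRate": (expr == 0).mean(axis=0),
            "MeanNZ": nonzero.mean(axis=0),
            "Amplitude": expr.max(axis=0) - expr.min(axis=0),
        }
    )


# Five cells and two genes copied from the data preview above
expr = pd.DataFrame(
    {
        "Clec1b": [0.0, 0.0, 0.0, 0.0, 0.0],
        "Kdm3a": [4.891604, 6.877725, 0.0, 0.0, 0.0],
    },
    index=["HSPC_025", "HSPC_031", "HSPC_037", "LT-HSC_001", "HSPC_001"],
)
crit = simple_criteria(expr)
print(crit)
```

Values computed on these five cells will not match the table above, which was computed on the full dataset.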
# get binarised data (alternatively .binarise()):
my_bin = probin.binarize()
my_bin.head()
            Clec1b  Kdm3a  Coro2b  8430408G22Rik  Clec9a  Phf6  Usp14  Tmem167b
HSPC_025       NaN    1.0     NaN            NaN     NaN   0.0    0.0       1.0
HSPC_031       NaN    1.0     NaN            NaN     NaN   0.0    0.0       0.0
HSPC_037       NaN    0.0     1.0            NaN     NaN   0.0    1.0       0.0
LT-HSC_001     NaN    0.0     1.0            NaN     NaN   1.0    0.0       0.0
HSPC_001       NaN    0.0     1.0            NaN     NaN   1.0    0.0       0.0
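
The binarised frame contains NaN wherever no 0/1 value was assigned, so downstream code usually has to decide what to do with those entries. One possible approach, sketched with plain pandas on a hand-copied subset of the output above, is to drop genes that are NaN in every cell and to report how many cells received a value for each remaining gene. The name `bin_example` is illustrative. With real data you would use the frame returned by `probin.binarize()`.

```python
import pandas as pd

nan = float("nan")

# Hand-copied subset of the binarised output shown above
bin_example = pd.DataFrame(
    {
        "Clec1b": [nan, nan, nan, nan, nan],
        "Kdm3a": [1.0, 1.0, 0.0, 0.0, 0.0],
        "Coro2b": [nan, nan, 1.0, 1.0, 1.0],
        "Phf6": [0.0, 0.0, 0.0, 1.0, 1.0],
    },
    index=["HSPC_025", "HSPC_031", "HSPC_037", "LT-HSC_001", "HSPC_001"],
)

# Drop genes that have no binarised value in any cell
informative = bin_example.dropna(axis="columns", how="all")

# Fraction of cells with a 0/1 value, per remaining gene
coverage = informative.notna().mean()
print(coverage)
```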
# likewise for normalised data:
my_norm = probin.normalize()
my_norm.head()
            Clec1b         Kdm3a    Coro2b  8430408G22Rik  Clec9a      Phf6         Usp14      Tmem167b
HSPC_025       0.0  9.786196e-01  0.184102            0.0     NaN  0.000801  8.318176e-05  9.999970e-01
HSPC_031       0.0  9.999981e-01  0.000000            0.0     NaN  0.000462  8.084114e-07  6.874397e-11
HSPC_037       0.0  4.408417e-09  0.892449            0.0     NaN  0.000145  9.999940e-01  6.874397e-11
LT-HSC_001     0.0  4.408417e-09  1.000000            0.0     NaN  0.991865  6.230178e-04  1.599753e-04
HSPC_001       0.0  4.408417e-09  1.000000            0.0     NaN  0.999865  2.171153e-07  6.874397e-11

References

  • Béal J, Montagud A, Traynard P, Barillot E and Calzone L (2019) Personalization of Logical Models With Multi-Omics Data Allows Clinical Stratification of Patients. Front. Physiol. 9:1965. doi:10.3389/fphys.2018.01965
