
Compositional Perturbation Autoencoder (CPA)


CPA - Compositional Perturbation Autoencoder

What is CPA?


CPA is a framework for learning the effects of perturbations at the single-cell level. It encodes and learns phenotypic drug responses across different cell types, doses, and combinations. CPA allows you to:

  • Make out-of-distribution predictions for unseen drug and gene combinations at various doses and across different cell types.
  • Learn interpretable drug and cell-type latent spaces.
  • Estimate the dose-response curve for each perturbation and for combinations of perturbations.
  • Transfer perturbation effects from one cell type to an unseen cell type.

Installation

Installing CPA

You can install CPA using pip:

pip install cpa-tools
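
After installing, one quick way to confirm that pip registered the package is to query its metadata from Python. The following is a minimal sketch using only the standard library; the helper name installed_version is illustrative and not part of CPA.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "cpa-tools" is the distribution name used in the pip command above.
    found = installed_version("cpa-tools")
    print(f"cpa-tools: {found or 'not installed'}")
```

If this prints "not installed", the pip command most likely ran in a different Python environment than the one you are using now.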

See detailed instructions here.

How to use CPA

Several tutorials are available here to get you started with CPA. The following table contains the list of tutorials:

Dataset | Year | Description | Link
Lotfollahi et al. | 2023 | Predicting combinatorial drug perturbations | Open In Colab - Open In Documentation
Lotfollahi et al. | 2023 | Predicting unseen perturbations using external embeddings, enabling the model to predict responses to unseen drugs | Open In Colab - Open In Documentation
Norman et al. | 2019 | Predicting combinatorial CRISPR perturbations | Open In Colab - Open In Documentation
Kang et al. | 2018 | Context transfer demo on the IFN-β scRNA-seq perturbation dataset (i.e. predicting the effect of a perturbation, e.g. disease, on unseen cell types, or transferring perturbation effects from one context to another) | Open In Colab - Open In Documentation

How to optimize CPA hyperparameters for your data

We provide a tutorial on how to optimize CPA hyperparameters for your data.

Datasets and Pre-trained models

Datasets and pre-trained models are available here.

Recipe for pre-processing a custom scRNA-seq perturbation dataset

If you have access to your raw data, you can follow the steps below to pre-process your dataset. A raw dataset should be an AnnData object (the format scanpy works with) containing raw counts and the required metadata (e.g. perturbation, dosage).
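
Before starting, it can help to check that the required metadata columns are present. The following is a minimal sketch in plain Python; the column names "perturbation" and "dosage" and the helper missing_obs_columns are assumptions for illustration, so substitute whatever names your dataset uses.

```python
from typing import Iterable, List

# Assumed column names for illustration; use the names from your own dataset.
REQUIRED_OBS_COLUMNS = ("perturbation", "dosage")


def missing_obs_columns(
    obs_columns: Iterable[str],
    required: Iterable[str] = REQUIRED_OBS_COLUMNS,
) -> List[str]:
    """Return the required metadata columns that are absent, in the order given."""
    present = set(obs_columns)
    return [name for name in required if name not in present]


# With a real dataset you would pass `adata.obs.columns` as the first argument.
example_columns = ["perturbation", "cell_type", "n_counts"]
print(missing_obs_columns(example_columns))  # ['dosage']
```

An empty list means every required column was found.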

Pre-processing steps

  1. Check for required information in cell metadata:

    a) Perturbation information should be in adata.obs.

    b) Dosage information should be in adata.obs. For perturbations without a real dose (CRISPR gene knockouts, disease states, time perturbations, etc.), you can create and add a dummy dosage to adata.obs. For example:

        adata.obs['dosage'] = adata.obs['perturbation'].astype(str).apply(lambda x: '+'.join(['1.0' for _ in x.split('+')])).values
    

    c) [If available] Cell type information should be in adata.obs.

    d) [Multi-batch integration] Batch information should be in adata.obs.

  2. Filter out cells with a low number of counts (sc.pp.filter_cells). For example:

    import scanpy as sc

    sc.pp.filter_cells(adata, min_counts=100)
    

    [Optional] Filter out genes with a low number of counts (sc.pp.filter_genes):

    sc.pp.filter_genes(adata, min_counts=5)
    
  3. Save the raw counts in adata.layers['counts'].

    adata.layers['counts'] = adata.X.copy()
    
  4. Normalize the counts (sc.pp.normalize_total).

    sc.pp.normalize_total(adata, target_sum=1e4, exclude_highly_expressed=True)
    
  5. Log transform the normalized counts (sc.pp.log1p).

    sc.pp.log1p(adata)
    
  6. Highly variable gene selection. There are two options:

    1. Use the sc.pp.highly_variable_genes function to select highly variable genes.

        sc.pp.highly_variable_genes(adata, n_top_genes=5000, subset=True)

    2. (Highly recommended, especially for multi-batch integration scenarios) Use scIB's highly variable gene selection function. This function is more robust to batch effects and can be used to select highly variable genes across multiple datasets.

        import scIB
        adata_hvg = scIB.pp.hvg_batch(adata, batch_key='batch', n_top_genes=5000, copy=True)
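
Two of the steps above reduce to simple string handling and arithmetic, which can be sketched in plain Python without scanpy. The helper names dummy_dosage and normalize_and_log and the perturbation labels are hypothetical. The normalization sketch works on a single cell and assumes plain total-count scaling: it does not reproduce exclude_highly_expressed=True, which in scanpy leaves very highly expressed genes out of each cell's normalization factor.

```python
import math
from typing import List


def dummy_dosage(perturbation: str, dose: str = "1.0", sep: str = "+") -> str:
    """One placeholder dose per component of a combined perturbation (step 1b)."""
    return sep.join(dose for _ in perturbation.split(sep))


def normalize_and_log(counts: List[float], target_sum: float = 1e4) -> List[float]:
    """Scale one cell's counts to `target_sum`, then apply log(1 + x) (steps 4 and 5)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # a cell with no counts stays all-zero
    return [math.log1p(c * target_sum / total) for c in counts]


# Hypothetical perturbation labels, for illustration only.
print(dummy_dosage("drugA+drugB"))  # 1.0+1.0
print(dummy_dosage("ctrl"))         # 1.0

# A toy cell with four genes; a small target_sum keeps the numbers readable.
print(normalize_and_log([1, 3, 0, 6], target_sum=10))
```

The dummy dosage has one entry per perturbation component, so a combination of two perturbations gets two doses joined by the same separator.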

Congrats! Your dataset is now ready to be used with CPA. Don't forget to save your pre-processed dataset using the adata.write_h5ad function.

Support and contribute

If you have a question, or a new architecture or model that could be integrated into our pipeline, you can post an issue.

Reference

If CPA is helpful in your research, please consider citing Lotfollahi et al. 2023:

@article{lotfollahi2023predicting,
    title={Predicting cellular responses to complex perturbations in high-throughput screens},
    author={Lotfollahi, Mohammad and Klimovskaia Susmelj, Anna and De Donno, Carlo and Hetzel, Leon and Ji, Yuge and Ibarra, Ignacio L and Srivatsan, Sanjay R and Naghipourfar, Mohsen and Daza, Riza M and Martin, Beth and others},
    journal={Molecular Systems Biology},
    pages={e11517},
    year={2023}
}

