Compositional Perturbation Autoencoder (CPA)
Project description
CPA - Compositional Perturbation Autoencoder
What is CPA?
CPA
is a framework to learn the effects of perturbations at the single-cell level. CPA encodes and learns phenotypic drug responses across different cell types, doses, and combinations. CPA allows:
- Out-of-distribution predictions of unseen drug and gene combinations at various doses and among different cell types.
- Learn interpretable drug and cell-type latent spaces.
- Estimate the dose-response curve for each perturbation and their combinations.
- Transfer pertubration effects from on cell-type to an unseen cell-type.
- Enable batch effect removal on a latent space and also gene expression space.
Installation
Installing CPA
You can install CPA using pip and also directly from the github to access latest development version. See detailed instructions here.
How to use CPA
Several tutorials are available here to get you started with CPA. The following table contains the list of tutorials:
How to optmize CPA hyperparamters for your data
We provide a tutorial on how to optimize CPA hyperparameters for your data.
Datasets and Pre-trained models
Datasets and pre-trained models are available here.
Recepie for Pre-processing a custom scRNAseq perturbation dataset
If you have access to you raw data, you can do the following steps to pre-process your dataset. A raw dataset should be a scanpy object containing raw counts and available required metadata (i.e. perturbation, dosage, etc.).
Pre-processing steps
-
Check for required information in cell metadata: a) Perturbation information should be in
adata.obs
. b) Dosage information should be inadata.obs
. In cases like CRISPR gene knockouts, disease states, time perturbations, etc, you can create & add a dummy dosage in youradata.obs
. For example:adata.obs['dosage'] = adata.obs['perturbation'].astype(str).apply(lambda x: '+'.join(['1.0' for _ in x.split('+')])).values
c) [If available] Cell type information should be in
adata.obs
. d) [Multi-batch integration] Batch information should be inadata.obs
. -
Filter out cells with low number of counts (
sc.pp.filter_cells
). For example:sc.pp.filter_cells(adata, min_counts=100)
[optional]
sc.pp.filter_genes(adata, min_counts=5)
-
Save the raw counts in
adata.layers['counts']
.adata.layers['counts'] = adata.X.copy()
-
Normalize the counts (
sc.pp.normalize_total
).sc.pp.normalize_total(adata, target_sum=1e4, exclude_highly_expressed=True)
-
Log transform the normalized counts (
sc.pp.log1p
).sc.pp.log1p(adata)
-
Highly variable genes selection: There are two options: 1. Use the
sc.pp.highly_variable_genes
function to select highly variable genes.python sc.pp.highly_variable_genes(adata, n_top_genes=5000, subset=True)
2. (Highly Recommended specially for Multi-batch integration scenarios) Use scIB's highly variable genes selection function to select highly variable genes. This function is more robust to batch effects and can be used to select highly variable genes across multiple datasets.python import scIB adata_hvg = scIB.pp.hvg_batch(adata, batch_key='batch', n_top_genes=5000, copy=True)
Congrats! Now you're dataset is ready to be used with CPA. Don't forget to save your pre-processed dataset using adata.write_h5ad
function.
Support and contribute
If you have a question or new architecture or a model that could be integrated into our pipeline, you can post an issue
Reference
If CPA is helpful in your research, please consider citing the Lotfollahi et al. 2023
@article{lotfollahi2023predicting,
title={Predicting cellular responses to complex perturbations in high-throughput screens},
author={Lotfollahi, Mohammad and Klimovskaia Susmelj, Anna and De Donno, Carlo and Hetzel, Leon and Ji, Yuge and Ibarra, Ignacio L and Srivatsan, Sanjay R and Naghipourfar, Mohsen and Daza, Riza M and
Martin, Beth and others},
journal={Molecular Systems Biology},
pages={e11517},
year={2023}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file cpa_tools-0.8.4.tar.gz
.
File metadata
- Download URL: cpa_tools-0.8.4.tar.gz
- Upload date:
- Size: 45.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.6.1 CPython/3.9.18 Linux/6.2.0-35-generic
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | bbc6cc27b833fa1273f9703b710f7abb9b241818f4f09298510216ba41f30d5b |
|
MD5 | 4f7e60a2ecc8b44383549f2345fe14de |
|
BLAKE2b-256 | 08a05964e3811fb76cf54f4067f8d9544a5a17ae6941f4155ba1ff8f5f52d6f7 |
File details
Details for the file cpa_tools-0.8.4-py3-none-any.whl
.
File metadata
- Download URL: cpa_tools-0.8.4-py3-none-any.whl
- Upload date:
- Size: 45.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.6.1 CPython/3.9.18 Linux/6.2.0-35-generic
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4ab3ec22eb7a993f7f2c849789d422b1f37643b768bbad42807b834332720b22 |
|
MD5 | 64db27c5030db2cc3c06503de211bb0a |
|
BLAKE2b-256 | 3b89b7ec9d8d972a7924a6e9bdafa5930eb3fd6d4bdb919371ccdc04c908db56 |