An image feature extractor with self-supervised learning
cytoself
Self-supervised deep learning encodes high-resolution features of protein subcellular localization
cytoself is a self-supervised model that we developed for learning features of protein subcellular localization from microscopy images. The model is described in detail in our paper [1]. The image representations derived from cytoself capture highly specific features, from which functional insights about proteins can be drawn on the sole basis of their localization.
Applying cytoself to images of endogenously labeled proteins from the recently released OpenCell database creates a highly resolved protein localization atlas [2].
[1] Kobayashi, Hirofumi, et al. "Self-Supervised Deep-Learning Encodes High-Resolution Features of Protein Subcellular Localization." Nature Methods (2022). https://www.nature.com/articles/s41592-022-01541-z
[2] Cho, Nathan H., et al. "OpenCell: Endogenous tagging for the cartography of human cellular organization." Science 375.6585 (2022): eabi6983. https://www.science.org/doi/10.1126/science.abi6983
How cytoself works
cytoself uses images together with associated identity information (ID) as a label to learn the localization patterns of proteins. When applying it to OpenCell, we used cell images in which a single protein is endogenously tagged per image. For each image we know which protein is tagged, and that protein is the ID. Our model implicitly learns to ignore differences between images that are associated with the same ID, and tries its best to tell images apart if they are associated with different IDs. In practice, cytoself can resolve very fine textural differences between image classes while ignoring very complex sources of image variability such as cell shape and cell state.
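The training signal described above can be illustrated with a small NumPy sketch. It assumes, as the "pretext task" wording later in this README suggests, that the ID enters training as a classification target alongside an image reconstruction term. The function names and the `id_weight` parameter are hypothetical, and the quantization steps of the real model are left out, so this is not the cytoself implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between class logits and integer protein IDs."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def combined_loss(images, reconstructions, logits, labels, id_weight=1.0):
    """Reconstruction error plus an ID-classification penalty (hypothetical weighting)."""
    recon = float(np.mean((images - reconstructions) ** 2))
    return recon + id_weight * softmax_cross_entropy(logits, labels)

rng = np.random.default_rng(0)
images = rng.random((4, 100, 100, 2))        # toy batch of two-channel images
labels = np.array([0, 1, 2, 1])              # toy protein IDs
perfect_logits = 20.0 * np.eye(3)[labels]    # confident and correct ID predictions
uniform_logits = np.zeros((4, 3))            # no information about the ID

loss_good = combined_loss(images, images, perfect_logits, labels)
loss_bad = combined_loss(images, images, uniform_logits, labels)
```

With identical reconstructions in both cases, the loss is lower only when the predicted ID is correct, which is the pressure that pushes images sharing an ID toward similar representations.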
What's in this repository
This repository offers three main components: DataManager, cytoself.models, and Analytics.
DataManager is a simple module for handling training, validation, and test data. You may want to modify it to fit your own data structure. This module is in cytoself.data_loader.data_manager.
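If you adapt the data handling to your own data, the core operation is a reproducible split into training, validation, and test subsets. The sketch below shows one way to do that with NumPy; `split_indices` and its parameters are hypothetical and are not part of the DataManager API.

```python
import numpy as np

def split_indices(n_items, val_frac=0.1, test_frac=0.1, seed=0):
    """Return shuffled, non-overlapping index arrays for train, validation and test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_test = int(round(n_items * test_frac))
    n_val = int(round(n_items * val_frac))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
```

The index arrays can then be used to slice image and label arrays consistently, for example `images[train_idx]` and `labels[train_idx]`.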
cytoself.models contains modules for three variants of the cytoself model: a model without split-quantization, a model without the pretext task, and the 'full' model (refer to our preprint for details about these variants). Each variant has its own submodule, which provides methods for constructing, compiling, and training the model (the models are built with TensorFlow).
Analytics is a simple module for analysis steps such as dimension reduction and plotting. You may want to modify it as well to perform your own analysis. This module is in cytoself.analysis.analytics.
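As an illustration of what dimension reduction on image embeddings involves, here is a minimal sketch that projects vectors to two dimensions. PCA via SVD is used only as a dependency-free stand-in; it is not necessarily the method the Analytics module applies, and `pca_2d` is a hypothetical helper.

```python
import numpy as np

def pca_2d(embeddings):
    """Project row vectors onto their first two principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(50, 16))  # toy embeddings, group A
cluster_b = rng.normal(loc=3.0, scale=0.1, size=(50, 16))  # toy embeddings, group B
coords = pca_2d(np.vstack([cluster_a, cluster_b]))
```

The resulting `coords` array has one (x, y) pair per input row and can be passed to any scatter-plot routine.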
Pre-trained model weights are included in the example script.
Note: cytoself will migrate to a PyTorch implementation in the near future.
Installation
Recommended: create a new environment and install cytoself into it from PyPI.
conda create -y -n cytoself python=3.7
conda activate cytoself
pip install cytoself
(Optional) Install TensorFlow GPU
If your computer has GPUs that support TensorFlow 1.15, you can install tensorflow-gpu to make use of them. Either install the following packages before installing cytoself, or uninstall the existing CPU versions and reinstall the GPU versions with conda.
conda install -y h5py=2.10.0 tensorflow-gpu=1.15
For developers
You can also install cytoself from this GitHub repository.
git clone https://github.com/royerlab/cytoself.git
cd cytoself
pip install .
Troubleshooting
If you get errors during installation, run the following command inside the cytoself folder to install the dependencies manually.
pip install -r requirements.txt
For reference, a snapshot of a complete working environment, including all dependencies, can be found in environment.yml.
Example script (How to use cytoself)
A minimal example script is in example/simple_training.py.
Learn how to use cytoself through
To test whether the package runs on your computer, use the command
python examples/simple_example.py
Computational resources
It is highly recommended to use a GPU to run cytoself. For example, a full model with image shape of (100, 100, 2) and batch size 64 can take ~9GB of GPU memory.
Tested Environment
Google Colab (CPU/GPU/TPU)
macOS 10.14.6, RAM 32GB (CPU)
Windows 10 Pro 64-bit, GTX 1080Ti, CUDA 11.6 (CPU/GPU)
Ubuntu 18.04.6 LTS, RTX 2080Ti, CUDA 11.2 (CPU/GPU)
Data Availability
Pretrained model
The pre-trained models used in the paper are listed below. Please follow the example script to learn how to use a pre-trained model.
model_protein_nucleardistance.h5: The model trained on target protein and nuclear distance.
model_protein.h5: The model trained on target protein alone.
model_protein_nucleus.h5: The model trained on target protein and nucleus.
The full image and protein label data used in this work can be found here.
The image data have the shape [batch, 100, 100, 4], in which the last (channel) dimension corresponds to [target protein, nucleus, nuclear distance, nuclear segmentation].
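Given this channel order, the inputs for a particular model variant can be picked out by index. The sketch below assumes that a model trained on, say, target protein and nuclear distance expects those two channels stacked in that order, which is consistent with the (100, 100, 2) image shape quoted under Computational resources. The `CHANNELS` mapping and `select_channels` helper are hypothetical.

```python
import numpy as np

# Channel order as documented: [target protein, nucleus, nuclear distance, nuclear segmentation]
CHANNELS = {
    "protein": 0,
    "nucleus": 1,
    "nuclear_distance": 2,
    "nuclear_segmentation": 3,
}

def select_channels(images, names):
    """Return a copy of `images` keeping only the named channels, in the given order."""
    idx = [CHANNELS[name] for name in names]
    return images[..., idx]

# Toy batch in which every channel is filled with its own index, so the result is easy to check.
batch = np.zeros((8, 100, 100, 4), dtype=np.float32)
for name, i in CHANNELS.items():
    batch[..., i] = i

protein_nucdist = select_channels(batch, ["protein", "nuclear_distance"])
```

Here `protein_nucdist` has shape (8, 100, 100, 2), with the target protein in channel 0 and the nuclear distance in channel 1.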
Embeddings
The embedding vectors of global representations and their labels are available from the following links. Due to their large size, only embeddings extracted from test data are provided.
Global_representation.npy In the shape of 114,806 images x 9,216 latent dimensions. (3.9 GB)
label.csv 114,806 rows x 7 columns. (6.2 MB)
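Because the embedding file is large, it can help to check the expected size and to read only the rows you need. The sketch below assumes the array is stored as 32-bit floats, which matches the listed size when read as gibibytes, and uses NumPy memory mapping on a small stand-in file. `load_rows` and the demo file name are hypothetical.

```python
import os
import tempfile
import numpy as np

# Expected size of a 114,806 x 9,216 array, assuming float32 storage.
n_images, n_dims = 114_806, 9_216
size_bytes = n_images * n_dims * np.dtype(np.float32).itemsize
size_gib = size_bytes / 2**30

def load_rows(path, start, stop):
    """Read rows [start, stop) from a .npy file without loading the whole array."""
    arr = np.load(path, mmap_mode="r")   # memory-mapped, nothing is read yet
    return np.array(arr[start:stop])     # copy only the requested rows into memory

# Demonstration on a small stand-in file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "Global_representation_demo.npy")
    np.save(demo_path, np.arange(60, dtype=np.float32).reshape(10, 6))
    rows = load_rows(demo_path, 2, 4)
```

The same `load_rows` call can be pointed at the downloaded Global_representation.npy, using row ranges that line up with the rows of label.csv.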
Image and label data
Because of its large size, the whole dataset is split into 10 files. The files are intended to be concatenated to form one large numpy file or one large csv file.
Image_data00.npy
Image_data01.npy
Image_data02.npy
Image_data03.npy
Image_data04.npy
Image_data05.npy
Image_data06.npy
Image_data07.npy
Image_data08.npy
Image_data09.npy
Label_data00.csv
Label_data01.csv
Label_data02.csv
Label_data03.csv
Label_data04.csv
Label_data05.csv
Label_data06.csv
Label_data07.csv
Label_data08.csv
Label_data09.csv
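Concatenating the parts could look like the sketch below. It assumes the parts are joined along the first (batch) axis in file-name order and that every csv part begins with the same header row; neither is confirmed here, so check the downloaded files first. `concat_npy` and `concat_csv` are hypothetical helpers, demonstrated on tiny stand-in files.

```python
import csv
import os
import tempfile
import numpy as np

def concat_npy(paths):
    """Load each .npy part and join them along the first (batch) axis."""
    return np.concatenate([np.load(p) for p in paths], axis=0)

def concat_csv(paths):
    """Join csv parts, keeping the header of the first file only (assumes a header row)."""
    header, rows = None, []
    for p in paths:
        with open(p, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)
            if header is None:
                header = file_header
            rows.extend(reader)
    return header, rows

# Demonstration with three tiny stand-in parts instead of the real ten files.
with tempfile.TemporaryDirectory() as tmp:
    npy_paths, csv_paths = [], []
    for i in range(3):
        npy_path = os.path.join(tmp, f"Image_data{i:02d}.npy")
        np.save(npy_path, np.full((2, 4, 4, 4), i, dtype=np.float32))
        npy_paths.append(npy_path)
        csv_path = os.path.join(tmp, f"Label_data{i:02d}.csv")
        with open(csv_path, "w", newline="") as f:
            csv.writer(f).writerows([["name", "part"], [f"p{i}a", i], [f"p{i}b", i]])
        csv_paths.append(csv_path)
    images = concat_npy(sorted(npy_paths))
    header, labels = concat_csv(sorted(csv_paths))
```

Note that `concat_npy` holds all parts in memory at once, so the full dataset needs a machine with enough RAM.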