
DeepLC: Retention time prediction for (modified) peptides using Deep Learning.




Introduction

DeepLC is a retention time predictor for (modified) peptides that employs Deep Learning. Its strength is that it can accurately predict retention times for modified peptides, even for modifications it has not seen during training.

DeepLC can be run with a graphical user interface (GUI) or as a Python package. As a Python package, DeepLC can be used from the command line or imported as a Python module.

Graphical user interface

Installation


  1. Download deeplc_gui.zip from the latest release and unzip.
  2. Install DeepLC GUI with install_gui_windows.bat or install_gui_linux.sh, depending on your operating system.
  3. Start DeepLC GUI by running deeplc_gui.jar.

Python package

Installation


Install with conda, using the bioconda and conda-forge channels:
conda install -c bioconda -c conda-forge deeplc

Or install with pip:
pip install deeplc

Command line interface

To use the DeepLC CLI, run:

deeplc --file_pred <path/to/peptide_file.csv>

We highly recommend adding a peptide file with known retention times for calibration:

deeplc --file_pred <path/to/peptide_file.csv> --file_cal <path/to/peptide_file_with_tr.csv>

For an overview of all CLI arguments, run deeplc --help.
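To call the CLI from a Python script instead of a shell, one option is the standard library's subprocess module. This is a minimal sketch that assumes the deeplc executable is on your PATH and uses only the --file_pred and --file_cal arguments shown above; the helper names build_deeplc_command and run_deeplc are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_deeplc_command(file_pred: str, file_cal: Optional[str] = None) -> List[str]:
    """Assemble a DeepLC CLI call as an argument list (hypothetical helper)."""
    cmd = ["deeplc", "--file_pred", file_pred]
    if file_cal is not None:
        # Calibration with known retention times is highly recommended
        cmd += ["--file_cal", file_cal]
    return cmd


def run_deeplc(file_pred: str, file_cal: Optional[str] = None) -> None:
    """Run DeepLC; check=True raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_deeplc_command(file_pred, file_cal), check=True)
```

Passing a list rather than one command string avoids shell quoting problems with file paths.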

Python module

Minimal example:

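import pandas as pd
from deeplc import DeepLC

# Peptides to predict, and peptides with known retention times for calibration
peptide_file = "datasets/test_pred.csv"
calibration_file = "datasets/test_train.csv"

# Unmodified peptides have an empty "modifications" field, which pandas
# reads as NaN; replace it with an empty string
pep_df = pd.read_csv(peptide_file, sep=",")
pep_df['modifications'] = pep_df['modifications'].fillna("")

cal_df = pd.read_csv(calibration_file, sep=",")
cal_df['modifications'] = cal_df['modifications'].fillna("")

# Calibrate on the peptides with known retention times, then predict
dlc = DeepLC()
dlc.calibrate_preds(seq_df=cal_df)
preds = dlc.make_preds(seq_df=pep_df)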

For a more elaborate example, see examples/deeplc_example.py.

Input files

DeepLC expects comma-separated values (CSV) with the following columns:

  • seq: unmodified peptide sequences
  • modifications: MS2PIP-style formatted modifications. Each modification is written as location|name, and multiple modifications are joined with the same pipe (|) character. location is an integer counted from 1 for the first amino acid; 0 is reserved for N-terminal modifications and -1 for C-terminal modifications. name must correspond to a Unimod (PSI-MS) name. Leave the field empty for unmodified peptides.
  • tr: retention time (only required for calibration)

For example:

seq,modifications,tr
AAGPSLSHTSGGTQSK,,12.1645
AAINQKLIETGER,6|Acetyl,34.095
AANDAGYFNDEMAPIEVKTK,12|Oxidation|18|Acetyl,37.3765

See examples/datasets for more examples.
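If you generate input files programmatically, a pair of small helpers can build and check the modifications field. This is a sketch of the location|name format described above; format_modifications and parse_modifications are hypothetical names, not part of DeepLC.

```python
from typing import List, Tuple


def format_modifications(mods: List[Tuple[int, str]]) -> str:
    """Build an MS2PIP-style string, e.g. [(6, "Acetyl")] -> "6|Acetyl".

    Locations: 1 = first amino acid, 0 = N-terminus, -1 = C-terminus.
    An empty list gives "", i.e. an unmodified peptide.
    """
    return "|".join(f"{loc}|{name}" for loc, name in mods)


def parse_modifications(mod_string: str) -> List[Tuple[int, str]]:
    """Split an MS2PIP-style string back into (location, name) pairs."""
    if not mod_string:
        return []
    fields = mod_string.split("|")
    if len(fields) % 2 != 0:
        raise ValueError(f"Expected location|name pairs, got: {mod_string!r}")
    # Even-indexed fields are locations, odd-indexed fields are Unimod names
    return [(int(loc), name) for loc, name in zip(fields[0::2], fields[1::2])]
```

For the third example row above, format_modifications([(12, "Oxidation"), (18, "Acetyl")]) gives 12|Oxidation|18|Acetyl.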

Prediction models

DeepLC comes with multiple CNN models trained on data from various experimental settings:

| Model filename | Experimental settings | Publication |
|---|---|---|
| full_hc_dia_fixed_mods.hdf5 | Reverse phase | Rosenberger et al. 2014 |
| full_hc_LUNA_HILIC_fixed_mods.hdf5 | HILIC | Spicer et al. 2018 |
| full_hc_LUNA_SILICA_fixed_mods.hdf5 | HILIC | Spicer et al. 2018 |
| full_hc_PXD000954_fixed_mods.hdf5 | Reverse phase | Rosenberger et al. 2014 |

By default, DeepLC selects the best model based on the calibration dataset. If no calibration is performed, the first default model is selected. Always keep a record of the models used and the DeepLC version.
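The idea behind picking a model with the calibration set can be sketched as follows. The README does not say which error measure DeepLC uses for this choice, so mean absolute error on the calibration peptides is an assumption here, purely for illustration; both function names are hypothetical.

```python
from typing import Dict, Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted retention times."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equally long, non-empty sequences")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def select_best_model(observed: Sequence[float],
                      predictions_per_model: Dict[str, Sequence[float]]) -> str:
    """Return the model whose calibrated predictions lie closest to the observed tr (assumed MAE)."""
    return min(predictions_per_model,
               key=lambda name: mean_absolute_error(observed, predictions_per_model[name]))
```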

Citation

If you use DeepLC for your research, please use the following citation:

DeepLC can predict retention times for peptides that carry as-yet unseen modifications
Robbin Bouwmeester, Ralf Gabriels, Niels Hulstaert, Lennart Martens, Sven Degroeve
bioRxiv 2020.03.28.013003; doi: 10.1101/2020.03.28.013003

Q&A

Q: Is it required to indicate fixed modifications in the input file?

Yes, even fixed modifications such as carbamidomethyl must be listed in the input file.
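Adding a fixed modification to every matching residue is easy to automate. This sketch assumes the location|name format from the Input files section and the Unimod PSI-MS name Carbamidomethyl for cysteine; add_fixed_modification is a hypothetical helper. It skips positions that already carry a modification and appends new entries at the end of the string.

```python
def add_fixed_modification(seq: str, modifications: str,
                           residue: str = "C", name: str = "Carbamidomethyl") -> str:
    """Add `name` at every 1-based position of `residue` in `seq` that is not yet modified."""
    fields = modifications.split("|") if modifications else []
    # Locations that already carry a modification (even-indexed fields)
    taken = {int(loc) for loc in fields[0::2]}
    for pos, aa in enumerate(seq, start=1):
        if aa == residue and pos not in taken:
            fields += [str(pos), name]
    return "|".join(fields)
```

Applied to a DataFrame loaded as in the minimal example: pep_df["modifications"] = [add_fixed_modification(s, m) for s, m in zip(pep_df["seq"], pep_df["modifications"])].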

Q: So DeepLC is able to predict the retention time for any modification?

Yes, DeepLC can predict the retention time of peptides carrying any modification. However, if the modification is very different from those the model has seen during training, the accuracy might not be satisfactory for you. For example, if the model has never seen a phosphorus atom before, the prediction accuracy is going to be low.

Q: Installation fails. Why?

Please make sure to install DeepLC in a path that does not contain spaces. Run the latest LTS version of Ubuntu or Windows 10. Make sure you have enough disk space available; surprisingly, TensorFlow needs quite a bit of it. If you are still not able to install DeepLC, please feel free to contact us:

Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be

Q: I have a special use case that is not supported. Can you help?

Of course, please feel free to contact us:

Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be

Q: DeepLC runs out of memory. What can I do?

You can try to reduce the batch size. DeepLC should be able to run if the batch size is low enough, even on machines with only 4 GB of RAM.
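The batch size setting itself is listed under deeplc --help. As a complementary workaround, you can also hand DeepLC a smaller table at a time. This sketch splits the prediction into consecutive row ranges; predict_in_chunks and chunk_size are hypothetical, and the predict callback is passed in so the helper does not depend on DeepLC itself.

```python
from typing import Callable, List, Sequence


def predict_in_chunks(n_rows: int,
                      predict_range: Callable[[int, int], Sequence[float]],
                      chunk_size: int = 10_000) -> List[float]:
    """Predict rows [0, n_rows) in consecutive chunks and concatenate the results.

    `predict_range(start, stop)` must return one prediction per row in [start, stop).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    preds: List[float] = []
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Only one chunk's worth of peptides is passed to the model at a time
        chunk_preds = predict_range(start, stop)
        if len(chunk_preds) != stop - start:
            raise ValueError("predict_range returned the wrong number of predictions")
        preds.extend(chunk_preds)
    return preds
```

With the objects from the minimal example, this would be called as predict_in_chunks(len(pep_df), lambda a, b: dlc.make_preds(seq_df=pep_df.iloc[a:b])).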

Q: I have a graphics card, but DeepLC is not using the GPU. Why?

For now DeepLC defaults to the CPU instead of the GPU. Clearly, because you want to use the GPU, you are a power user :-). If you want to make the most of that expensive GPU, you need to change or remove the following line (at the top) in deeplc.py:

# Set to force CPU calculations
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

Also change the same line in the function reset_keras():

# Set to force CPU calculations
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

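Either remove the line or set it to the GPU(s) you want to use. The value is a comma-separated list of GPU device indices, not the number of GPUs: '0' selects the first GPU, '0,1' the first two:

# Make only the first GPU (index 0) visible
os.environ['CUDA_VISIBLE_DEVICES'] = '0'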
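CUDA_VISIBLE_DEVICES takes a comma-separated list of GPU indices as numbered by the CUDA driver, and an index that matches no device, such as -1, hides all GPUs. A small, hypothetical helper that builds the value from a list of indices is sketched below; the variable has to be set before TensorFlow initializes CUDA.

```python
import os
from typing import Iterable


def cuda_visible_devices(gpu_indices: Iterable[int]) -> str:
    """Return a CUDA_VISIBLE_DEVICES value; an empty selection means CPU only."""
    indices = [str(int(i)) for i in gpu_indices]
    # "-1" matches no device, which is how deeplc.py forces CPU calculations
    return ",".join(indices) if indices else "-1"


# Example: expose the first two GPUs (indices 0 and 1) to TensorFlow
os.environ["CUDA_VISIBLE_DEVICES"] = cuda_visible_devices([0, 1])
```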

Q: What modification name should I use?

The names from Unimod are used. The PSI-MS name is used by default, with the Interim name as a fall-back when no PSI-MS name is available. Please also see unimod_to_formula.csv in the unimod/ folder for the naming of specific modifications.

Q: I have a modification that is not in unimod. How can I add the modification?

The file unimod_to_formula.csv in the unimod/ folder can be used to add modifications. Add a row with a name (unique, not yet present in the file) and the change in atomic composition. For example:

Met->Hse,O,H(-2) C(-1) S(-1)

Make sure to use negative signs for the atoms subtracted.
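Before adding a row, it can help to check that the composition change gives the mass shift you expect. The README shows one example row but not the file's full column layout, so this sketch assumes only what that row shows: a name, followed by comma-separated fields of Element or Element(count) tokens, with negative counts for removed atoms. It sums the tokens and computes the monoisotopic mass shift. The helper names are hypothetical, only a few elements are included, and isotope-labelled tokens such as 13C are not handled.

```python
import re
from collections import Counter
from typing import Dict

# Monoisotopic masses (Da) for the elements this example needs
MONOISOTOPIC_MASS = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}

TOKEN = re.compile(r"^([A-Z][a-z]?)(?:\((-?\d+)\))?$")


def parse_composition_row(row: str) -> Dict[str, int]:
    """Sum the element counts in a row such as 'Met->Hse,O,H(-2) C(-1) S(-1)'."""
    name, *fields = row.split(",")
    delta: Counter = Counter()
    for token in " ".join(fields).split():
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"Cannot parse {token!r} in entry {name!r}")
        element, count = match.group(1), match.group(2)
        # A bare element symbol counts as +1 atom
        delta[element] += int(count) if count is not None else 1
    return dict(delta)


def mass_shift(delta: Dict[str, int]) -> float:
    """Monoisotopic mass change (Da) for a composition change."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in delta.items())
```

For the Met->Hse row, the net change is O +1, H -2, C -1, S -1, and mass_shift returns about -29.9928 Da, which you can compare with the mass delta listed in Unimod.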

Q: Help, all my predictions are between [0,10]. Why?

It is likely that you did not use calibration. That is no problem, but the retention times used for training were normalized to the range [0, 10]. This means you probably need to rescale the retention times yourself after analysis, or provide a calibration set as input.
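If you only have uncalibrated predictions and a handful of peptides with observed retention times, a least-squares linear rescaling is one simple way to map the [0, 10] scale back onto your gradient. This is a sketch under that assumption, not DeepLC's own calibration (which runs when you pass a calibration file or call calibrate_preds, and remains the recommended route); a straight line may fit poorly if your gradient is not linear. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_linear_map(predicted: Sequence[float], observed: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of observed = slope * predicted + intercept."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need at least two paired values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be equal")
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    slope = sxy / sxx
    return slope, mean_o - slope * mean_p


def apply_linear_map(values: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Rescale uncalibrated predictions with a fitted line."""
    return [slope * v + intercept for v in values]
```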

Q: How does the ensemble part of DeepLC work?

Models within the same directory are grouped if they overlap in their name. The overlap has to be in their full name, except for the last part of the name after a "_"-character.

The following models will be grouped:

full_hc_dia_fixed_mods_a.hdf5
full_hc_dia_fixed_mods_b.hdf5

None of the following models will be grouped:

full_hc_dia_fixed_mods2_a.hdf5
full_hc_dia_fixed_mods_b.hdf5
full_hc_dia_fixed_mods_2_b.hdf5
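The naming rule can be written as a short function: drop the file extension and the last "_"-separated part, and models with the same remainder form one group. This illustrates the rule as stated here and is not DeepLC's internal code; the helper names are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def ensemble_key(filename: str) -> str:
    """Strip the extension and the last '_'-separated part of a model filename."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return stem.rsplit("_", 1)[0]


def group_models(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group model filenames that share the same ensemble key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        groups[ensemble_key(name)].append(name)
    return dict(groups)
```

For the ungrouped examples above, the keys are full_hc_dia_fixed_mods2, full_hc_dia_fixed_mods and full_hc_dia_fixed_mods_2, so each model ends up in its own group.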

Q: I would like to take the ensemble average of multiple models, even if they are trained on different datasets. How can I do this?

Feel free to experiment! Models within the same directory are grouped if they overlap in their name. The overlap has to be in their full name, except for the last part of the name after a "_"-character.

The following models will be grouped:

model_dataset1.hdf5
model_dataset2.hdf5

So you just need to rename your models.


Download files


Source Distribution

deeplc-0.1.17.tar.gz (44.6 MB)

Built Distribution

deeplc-0.1.17-py3-none-any.whl (44.6 MB)

File details

Details for the file deeplc-0.1.17.tar.gz.

File metadata

  • Download URL: deeplc-0.1.17.tar.gz
  • Upload date:
  • Size: 44.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for deeplc-0.1.17.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | a77bdbc4ed41fa7570cfbb14b9802d974b38f30c3eade04d1a1244591b489df5 |
| MD5 | f19bbe0a0dece80dddd264e1af3ba60c |
| BLAKE2b-256 | 9403d22f1d324a45983fe855aab25e8a5c389afa517e32ab196de58b99bfd5fb |


File details

Details for the file deeplc-0.1.17-py3-none-any.whl.

File metadata

  • Download URL: deeplc-0.1.17-py3-none-any.whl
  • Upload date:
  • Size: 44.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for deeplc-0.1.17-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90d59486444ab29047c32de5e71236feba509e7fcd7a7f73f5eb0083eedecc6a |
| MD5 | b243d23b0fde4386152937f81fe299af |
| BLAKE2b-256 | 83b89d5a09d4557793c4bbc8def740d5ee4b5a6bbc4b3685e688d79036ffd70c |

