
MS²Rescore: Sensitive PSM rescoring with predicted MS² peak intensities and retention times.





About MS²Rescore

MS²Rescore performs sensitive peptide identification rescoring with predicted spectra using MS²PIP, DeepLC, and Percolator. This results in more confident peptide identifications, which allows you to get more peptide IDs at the same false discovery rate (FDR) threshold, or to set a more stringent FDR threshold while still retaining a similar number of peptide IDs. MS²Rescore is ideal for challenging proteomics identification workflows, such as proteogenomics, metaproteomics, or immunopeptidomics.

MS²Rescore uses identifications from a Percolator IN (PIN) file, or from the output of one of these search engines:

  • MaxQuant: Start from msms.txt identification file and directory with .mgf files.
  • MSGFPlus: Start with an .mzid identification file and corresponding .mgf.
  • X!Tandem: Start with an X!Tandem .xml identification file and corresponding .mgf.
  • PEAKS: Start with an .mzid identification file and directory with .mgf files.
  • PeptideShaker: Start with a PeptideShaker Extended PSM Report and corresponding .mgf file.

If you use MS²Rescore, please cite the following article:

MS2Rescore: Data-driven rescoring dramatically boosts immunopeptide identification rates.
Arthur Declercq, Robbin Bouwmeester, Sven Degroeve, Lennart Martens, and Ralf Gabriels.
bioRxiv (2021) doi:10.1101/2021.11.02.466886


The concept of rescoring with predicted spectrum features was first described in:

Accurate peptide fragmentation predictions allow data driven approaches to replace and improve upon proteomics search engine scoring functions.
Ana S C Silva, Robbin Bouwmeester, Lennart Martens, and Sven Degroeve.
Bioinformatics (2019) doi:10.1093/bioinformatics/btz383

To replicate the experiments described in this article, check out the pub branch of this repository.


Installation

Python package


MS²Rescore requires:

  • Python 3.7 or 3.8 on Linux, macOS, or Windows
  • If the option run_percolator is set to True, Percolator needs to be installed and callable with the percolator command (tested with v3.02.1)
  • Some pipelines require the Percolator converters, such as tandem2pin, as well. These are usually installed alongside Percolator.
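The requirements above can be checked before starting a run. The following is a minimal sketch, not part of MS²Rescore: the helper names `missing_tools` and `python_version_supported` are hypothetical, and the check only looks for the `percolator` and `tandem2pin` commands on the PATH.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_version_supported(version_info=sys.version_info):
    """Check for Python 3.7 or 3.8, the versions listed in the requirements."""
    return tuple(version_info[:2]) in {(3, 7), (3, 8)}


if __name__ == "__main__":
    # tandem2pin is only needed by pipelines that use the Percolator converters.
    for tool in missing_tools(["percolator", "tandem2pin"]):
        print(f"Not found on PATH: {tool}")
    if not python_version_supported():
        print("This Python version is outside the 3.7-3.8 range listed above.")
```

This does not verify the Percolator version; it only confirms that the commands can be resolved.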

Minimal installation:

pip install ms2rescore

Installation including dependencies for the graphical user interface:

pip install ms2rescore[gui]

We highly recommend using a venv or conda virtual environment.

Windows installer


  1. Download and install Percolator and the percolator-converters. Make sure to select "Add percolator to the system PATH for all users" during setup.
  2. Download the zip file from the latest release and unzip.
  3. Run install-gui-windows.bat to install Python and MS²Rescore.
  4. Run start-gui-windows.bat to start the MS²Rescore GUI.

If Microsoft Defender SmartScreen displays a warning, click "More info" and then click "Run anyway". When starting the GUI, don't mind the terminal window that opens next to the GUI.

To install a newer version of MS²Rescore, run upgrade-gui-windows.bat.


Usage

GUI

Run start-gui-windows.bat, or run ms2rescore-gui or python -m ms2rescore.gui in your terminal, to start the graphical user interface. Most common settings can be configured through the UI. For some advanced settings, see Configuration file.

Command line interface

Run MS²Rescore from the command line as follows:

ms2rescore -c <path-to-config-file> -m <path-to-mgf> <path-to-identification-file>

Run ms2rescore --help to see all command line options.
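When MS²Rescore is called from a script, the command line shown above can be assembled as an argument list. This is a sketch that uses only the `-c` and `-m` flags from the usage line; the function names are hypothetical, and the assumption that `-c` may be left out follows from the statement that default values are used when no configuration file is passed.

```python
import subprocess


def build_ms2rescore_command(identification_file, mgf_path, config_file=None):
    """Assemble the ms2rescore argument list in the order of the usage line."""
    command = ["ms2rescore"]
    if config_file is not None:
        command += ["-c", str(config_file)]
    command += ["-m", str(mgf_path)]
    command.append(str(identification_file))
    return command


def run_ms2rescore(identification_file, mgf_path, config_file=None):
    """Run ms2rescore and raise CalledProcessError on a non-zero exit code."""
    command = build_ms2rescore_command(identification_file, mgf_path, config_file)
    return subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.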

Configuration file

Although most options can be configured through the CLI and the GUI, MS²Rescore can be further configured through a JSON configuration file. A correct configuration is required to, for example, correctly parse the peptide modifications from the search engine output. If no configuration file is passed, or some options are not configured, the default values for these settings will be used. Options passed from the CLI and the GUI will override the configuration file. The full configuration is validated against a JSON Schema.

A full example configuration file can be found in ms2rescore/package_data/config_default.json.
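The precedence described above (defaults, then the configuration file, then CLI or GUI options) can be pictured as a layered merge. The sketch below illustrates that ordering for a single flat category; it is not the MS²Rescore implementation, and the function name `resolve_options` and the convention that `None` means "not passed" are assumptions.

```python
def resolve_options(defaults, config_file_options, cli_options):
    """Merge option layers; later layers override earlier ones."""
    resolved = dict(defaults)
    resolved.update(config_file_options)
    # Assumption: a value of None means the option was not passed on the CLI.
    resolved.update({k: v for k, v in cli_options.items() if v is not None})
    return resolved


defaults = {"pipeline": "infer", "tmp_dir": None}
from_file = {"pipeline": "maxquant"}
from_cli = {"pipeline": None, "tmp_dir": "scratch"}

resolved = resolve_options(defaults, from_file, from_cli)
# pipeline comes from the file, tmp_dir from the CLI
print(resolved)
```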

The config file contains three top-level categories (general, ms2pip, and percolator) and additional categories for specific search engines (e.g. maxquant). The most important options in general are:

  • pipeline (string): Pipeline to use, depending on input format. Must be one of: ['infer', 'pin', 'tandem', 'maxquant', 'msgfplus', 'peptideshaker']. Default: infer.
  • feature_sets (array): Feature sets for which to generate PIN files and optionally run Percolator. Default: ['searchengine', 'rt', 'ms2pip'].
    • Items (array)
      • Items (string): Must be one of: ['searchengine', 'rt', 'ms2pip'].

An overview of all options can be found in configuration.md.
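As a quick pre-check before a run, the two general options listed above can be tested against their allowed values. This sketch is not the package's JSON Schema validation; the function name is hypothetical, and it assumes `feature_sets` is an array of arrays of strings, following the nested "Items" description in the option list.

```python
import json

ALLOWED_PIPELINES = {"infer", "pin", "tandem", "maxquant", "msgfplus", "peptideshaker"}
ALLOWED_FEATURES = {"searchengine", "rt", "ms2pip"}


def check_general_options(general):
    """Return a list of human-readable problems found in the general options."""
    problems = []
    pipeline = general.get("pipeline", "infer")
    if pipeline not in ALLOWED_PIPELINES:
        problems.append(f"unknown pipeline: {pipeline!r}")
    # Assumption: each feature set is itself a list of feature names.
    for feature_set in general.get("feature_sets", []):
        for feature in feature_set:
            if feature not in ALLOWED_FEATURES:
                problems.append(f"unknown feature: {feature!r}")
    return problems


if __name__ == "__main__":
    config = {
        "general": {
            "pipeline": "maxquant",
            "feature_sets": [["searchengine"], ["searchengine", "rt", "ms2pip"]],
        }
    }
    print(check_general_options(config["general"]))
    print(json.dumps(config, indent=2))
```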

Notes for specific search engines

  • MSGFPlus: Run MSGFPlus in a concatenated target-decoy search, with the -addFeatures 1 flag.
  • MaxQuant:
    • Run MaxQuant without FDR filtering (set to 1)
    • MaxQuant requires additional options in the configuration file:
      • modification_mapping: Maps MaxQuant output to MS²PIP modifications list. Keys must contain MaxQuant's two-letter modification codes and values must match one of the modifications listed in the MS²PIP configuration (see MS2PIP config).
      • fixed_modifications: Must list all modifications set as fixed during the MaxQuant search (as this is not denoted in the msms.txt file). Keys refer to the amino acid, values to the modification name used in the MS²PIP configuration.
      • The maxquant specific configuration could for example be:
        "maxquant_to_rescore": {
          "modification_mapping": {
            "ox": "Oxidation",
            "cm": "Carbamidomethyl"
          },
          "fixed_modifications": {
            "C": "Carbamidomethyl"
          }
        }

As a general rule, MS²Rescore always needs access to all target and decoy PSMs, not only the FDR-filtered targets.
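For the MaxQuant options described above, every value in modification_mapping and fixed_modifications has to match a modification name from the MS²PIP configuration. The sketch below checks this for a given set of names. The function name is hypothetical, and `known_modifications` stands in for whatever names your own MS²PIP configuration defines.

```python
def unknown_modifications(maxquant_options, known_modifications):
    """Return modification names used in the MaxQuant options but not known."""
    used = set(maxquant_options.get("modification_mapping", {}).values())
    used |= set(maxquant_options.get("fixed_modifications", {}).values())
    return sorted(used - set(known_modifications))


maxquant_options = {
    "modification_mapping": {"ox": "Oxidation", "cm": "Carbamidomethyl"},
    "fixed_modifications": {"C": "Carbamidomethyl"},
}

# Hypothetical list of names taken from an MS²PIP configuration.
known = ["Oxidation", "Carbamidomethyl"]
print(unknown_modifications(maxquant_options, known))
```

An empty result means that every name in the MaxQuant options is covered by the given list.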

Output

Several intermediate files are created when the entire pipeline is run. These can be accessed by specifying the tmp_dir or Temporary file directory option. Depending on whether or not Percolator is run, the following output files can be expected:

For each feature set combination (e.g. [rt, ms2pip, searchengine]):

  • <file>.pin Percolator IN file
  • <file>_target_psms.pout Percolator OUT file with target PSMs
  • <file>_decoy_psms.pout Percolator OUT file with decoy PSMs
  • <file>_target_peptides.pout Percolator OUT file with target peptides
  • <file>_decoy_peptides.pout Percolator OUT file with decoy peptides
  • <file>.weights Internal feature weights used by Percolator's scoring function.
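To confirm that a run produced the files listed above, the expected names can be derived from the output prefix. This is a sketch with hypothetical function names. It assumes that only the .pin file is written when Percolator is not run, which the text above implies but does not state outright.

```python
from pathlib import Path

POUT_SUFFIXES = [
    "_target_psms.pout",
    "_decoy_psms.pout",
    "_target_peptides.pout",
    "_decoy_peptides.pout",
]


def expected_output_files(output_prefix, percolator_was_run=True):
    """List the output file names for one feature set combination."""
    prefix = str(output_prefix)
    files = [prefix + ".pin"]
    if percolator_was_run:
        files += [prefix + suffix for suffix in POUT_SUFFIXES]
        files.append(prefix + ".weights")
    return files


def missing_output_files(output_prefix, percolator_was_run=True):
    """Return the expected files that do not exist on disk."""
    expected = expected_output_files(output_prefix, percolator_was_run)
    return [name for name in expected if not Path(name).exists()]
```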

Plotting

When running MS²Rescore, you can automatically generate plots (PDF) by setting the plotting parameter in the config to True. Generally, these plots show the (unique) identifications at 1% and 0.1% FDR and the Percolator weights for the first feature set combination. You can also use the CLI to create these plots as follows:

ms2rescore-plotting <path-to-pin-file> \
-p <path-to-pout-file> \
-d <path-to-pout_dec-file> \
-f <feature-sets-used> \
-s <pin-file-score-column-name>

Run ms2rescore-plotting --help to see all command line options.

If you want to compare MS²Rescore runs with different feature sets, you can add multiple -p, -d, and -f flags as follows:

ms2rescore-plotting <path-to-pin-file> \
-p <path-to-first-pout-file> -p <path-to-second-pout-file> \
-d <path-to-first-pout_dec-file> -d <path-to-second-pout_dec-file> \
-f <first-feature-sets-used> -f <second-feature-sets-used> \
-s <pin-file-score-column-name>

Because the PIN files from the same MS²Rescore run contain the same identifications, only one PIN file is needed.
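When comparing several runs, the repeated -p, -d, and -f flags can be generated from a list of runs. The sketch below only assembles the argument list with the flags shown above. The function name is hypothetical, and the feature set value is passed through as an opaque string because its exact format is not described here.

```python
def build_plotting_command(pin_file, runs, score_column):
    """Assemble the ms2rescore-plotting argument list.

    `runs` is a list of (pout_file, pout_dec_file, feature_sets) tuples,
    one tuple per MS²Rescore run to compare.
    """
    command = ["ms2rescore-plotting", str(pin_file)]
    for pout_file, _, _ in runs:
        command += ["-p", str(pout_file)]
    for _, pout_dec_file, _ in runs:
        command += ["-d", str(pout_dec_file)]
    for _, _, feature_sets in runs:
        command += ["-f", str(feature_sets)]
    command += ["-s", str(score_column)]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`.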


Contributing

Bugs, questions or suggestions? Feel free to post an issue in the issue tracker or to make a pull request!
