
Fork of MHNreact for use in the syntheseus library


MHNreact


Abstract | Environment | Data | Training | Loading | Citation

This repository adapts modern Hopfield networks (MHN; Ramsauer et al., 2021) to associate two data modalities, molecules and reaction templates, in order to improve predictive performance for rare templates and for single-step retrosynthesis.


Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks


Philipp Seidl, Philipp Renz, Natalia Dyubankova, Paulo Neves, Jonas Verhoeven, Marwin Segler, Jörg K. Wegner, Sepp Hochreiter, Günter Klambauer

Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k.
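The association step described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the repository's implementation: plain linear projections with random weights stand in for the learned molecule and template encoders, and the names `template_relevance`, `project`, `w_mol`, and `w_tpl` are hypothetical. Here `beta` plays the role of the `hopf_beta` parameter and `asso_dim` that of `hopf_asso_dim`.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, weights):
    """Apply a linear map; `weights` holds one row per output dimension."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def template_relevance(mol_fp, template_fps, w_mol, w_tpl, beta):
    """Return one relevance probability per template for a single molecule."""
    query = project(mol_fp, w_mol)                      # molecule encoding
    keys = [project(fp, w_tpl) for fp in template_fps]  # template encodings
    # Scaled dot product between the molecule and every template encoding.
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


# Toy setup: random weights and random bit vectors (assumptions for the demo).
rng = random.Random(0)
fp_dim, asso_dim, n_templates = 16, 4, 3
w_mol = [[rng.gauss(0, 1) for _ in range(fp_dim)] for _ in range(asso_dim)]
w_tpl = [[rng.gauss(0, 1) for _ in range(fp_dim)] for _ in range(asso_dim)]
mol_fp = [rng.randint(0, 1) for _ in range(fp_dim)]
template_fps = [[rng.randint(0, 1) for _ in range(fp_dim)]
                for _ in range(n_templates)]

probs = template_relevance(mol_fp, template_fps, w_mol, w_tpl, beta=0.005)
print(probs)
```

Because templates are scored through their own encoding rather than through a fixed output unit per template, a template that never appeared in training still receives a score, which is the property the abstract relies on for few- and zero-shot prediction.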

Minimal working example

Open the following Colab notebook for a quick training example: train-colab.

Environment

Anaconda

When using conda, an environment can be set up using

conda env create -f env.yml

To activate the environment, call conda activate mhnreact_env.

Additionally, install template-relevance, which is included in this package, and rdchiral using pip:

cd data/temprel-fortunato/template-relevance-master/
pip install -e .
pip install -e "git+https://github.com/connorcoley/rdchiral.git#egg=rdchiral"

You may need to adjust the CUDA version. The code was tested with:

  • rdkit 2021.03.1 and 2020.03.4
  • python 3.7 and 3.8
  • pytorch 1.6
  • rdchiral
  • template-relevance (./data/temprel-fortunato)
  • CGRtools (only for preparing USPTO-golden)
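To compare a local environment against the versions listed above, a small check along these lines may help. It is only a sketch: `installed_versions` is a hypothetical helper, `importlib.metadata` requires Python 3.8 or newer, and the distribution names passed in are assumptions (conda builds may register a package under a different name or without metadata).

```python
from importlib import metadata  # available from Python 3.8


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names below are assumptions, not taken from the repository.
report = installed_versions(["rdkit", "torch", "rdchiral"])
for name, version in report.items():
    print(f"{name}: {version or 'not found'}")
```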

A second option is to run

conda create -n mhnreact_env python=3.8
eval "$(conda shell.bash hook)"
conda activate mhnreact_env
conda install -c conda-forge rdkit
pip install torch scipy ipykernel matplotlib scikit-learn swifter
cd data/temprel-fortunato/template-relevance-master/
pip install -e .
pip install -e "git+https://github.com/connorcoley/rdchiral.git#egg=rdchiral"

which is equivalent to running the script bash ./scripts/make_env.sh

Docker

Another option is the Dockerfile provided in ./tools/docker/, which can be built from within that directory with the following command:

DOCKER_BUILDKIT=1 docker build -t mhnreact:latest -f Dockerfile ../..

Data and processing

The preprocessed data is contained in this repository: ./data/processed/uspto_sm_* files contain the preprocessed files for template relevance prediction.

For single-step retrosynthesis the preprocessed and split data can be found in ./data/USPTO_50k_MHN_prepro.csv

All preprocessing steps can be reproduced with the notebooks ./examples/prepro_*.ipynb, which cover USPTO-sm, USPTO-lg, USPTO-50k, and USPTO-full. USPTO-lg and USPTO-full are not included in the repository due to their size and have to be created using the corresponding notebook.

Training

Models can be trained using python mhnreact/train.py -m

Selected calls are documented within ./notebooks/*_training_*.ipynb.

Arguments are documented within the module; the documentation can be retrieved by adding --help to the call. The notebooks folder contains several example notebooks.

Some main parameters are:

  • model_type: Model-type, choose from 'segler', 'fortunato', 'mhn' or 'staticQK', default:'mhn'
  • dataset_type: Dataset 'sm', 'lg' for template relevance prediction; (use --csv_path for single-step retrosynthesis input)
  • fp_type: Fingerprint type (applies to the input only), default: 'morgan', other options: 'rdk', 'ECFP', 'ECFC', 'MxFP', 'Morgan2CBF'
  • template_fp_type: Template fingerprint type, default: 'rdk', other options: 'MxFP', 'rdkc', 'tfidf', 'random'
  • hopf_beta: hopfield beta parameter, default=0.005
  • hopf_asso_dim: association dimension, default=512
  • ssretroeval: single-step retrosynthesis evaluation, default=False

An example call for single-step retrosynthesis is:

python -m mhnreact.train --model_type=mhn --fp_size=4096 --fp_type morgan --template_fp_type rdk --concat_rand_template_thresh 1 \
--exp_name test --dataset_type 50k --csv_path ./data/USPTO_50k_MHN_prepro.csv.gz --ssretroeval True --seed 0
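When the same call has to be repeated with small variations (for example over several seeds), it can be convenient to assemble the command in Python. The sketch below only rebuilds the example call above as an argument list; `build_train_command` is a hypothetical helper and not part of mhnreact, and `shlex.join` requires Python 3.8 or newer.

```python
import shlex
import sys


def build_train_command(options, python=sys.executable):
    """Turn an options dict into an argv list for `python -m mhnreact.train`."""
    argv = [python, "-m", "mhnreact.train"]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


# Same flags and values as the example call above.
options = {
    "model_type": "mhn",
    "fp_size": 4096,
    "fp_type": "morgan",
    "template_fp_type": "rdk",
    "concat_rand_template_thresh": 1,
    "exp_name": "test",
    "dataset_type": "50k",
    "csv_path": "./data/USPTO_50k_MHN_prepro.csv.gz",
    "ssretroeval": True,
    "seed": 0,
}

argv = build_train_command(options)
print(shlex.join(argv))
# To launch the run: subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting issues; the printed string is only for inspection.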

Loading in trained Models and Evaluation

The notebook ./examples/20_evaluation.ipynb shows how to load trained models and use them to predict on a test set.

Train on custom data

Preprocess your data into the same format as ./data/USPTO_50k_MHN_prepro.csv and pass the resulting file via the --csv_path argument.
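One way to check that a custom file follows the expected layout is to compare its header with that of the provided file. The following is a minimal sketch that assumes both files are comma-separated with a header row; `read_header` and `missing_columns` are hypothetical helpers, and the sketch checks column names only, not the content of the columns.

```python
import csv
import gzip


def read_header(path):
    """Return the column names from the first row of a CSV (optionally gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


def missing_columns(reference_path, custom_path):
    """List reference columns that the custom file lacks, in reference order."""
    custom = set(read_header(custom_path))
    return [col for col in read_header(reference_path) if col not in custom]


# Example usage (the second path is a placeholder for your own file):
# missing_columns("./data/USPTO_50k_MHN_prepro.csv", "my_reactions.csv")
```

An empty result means every column of the reference file is present in the custom file.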

Citation

To cite this work, you can use the following bibtex entry:

@article{seidl2021modern,
   author = {Seidl, Philipp and Renz, Philipp and Dyubankova, Natalia and Neves, Paulo and Verhoeven, Jonas and Segler, Marwin and Wegner, J{\"o}rg K. and Hochreiter, Sepp and Klambauer, G{\"u}nter},
   title = {Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks},
   journal = {Journal of Chemical Information and Modeling},
   volume = {62},
   number = {9},
   pages = {2111-2120},
   institution = {Institute for Machine Learning, Johannes Kepler University, Linz},
   year = {2022},
   doi = {10.1021/acs.jcim.1c01065},
   url = {https://doi.org/10.1021/acs.jcim.1c01065},
}

References

  • Ramsauer et al. (2021). Hopfield Networks is All You Need. ICLR 2021.

Keywords

Drug Discovery, CASP, Machine Learning, Synthesis, Zero-shot, Modern Hopfield Networks
