Fork of MHNreact for use in the syntheseus library

MHNreact

Adapting modern Hopfield networks (MHN; Ramsauer et al., 2021) to associate different data modalities, molecules and reaction templates, in order to improve predictive performance for rare templates and single-step retrosynthesis.

Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks

Philipp Seidl, Philipp Renz, Natalia Dyubankova, Paulo Neves, Jonas Verhoeven, Marwin Segler, Jörg K. Wegner, Sepp Hochreiter, Günter Klambauer

Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k.

Minimal working example

Open the following Colab notebook for a quick training example: train-colab.

Environment

Anaconda

When using conda, an environment can be set up using

conda env create -f env.yml

To activate the environment call conda activate mhnreact_env.

Additionally, install template-relevance (included in this package) and rdchiral using pip:

cd data/temprel-fortunato/template-relevance-master/
pip install -e .
pip install -e "git+https://github.com/connorcoley/rdchiral.git#egg=rdchiral"

You may need to adjust the CUDA version. The code was tested with:

rdkit 2021.03.1 and 2020.03.4
python 3.7 and 3.8
pytorch 1.6
rdchiral
template-relevance (./data/temprel-fortunato)
CGRtools (only for preparing USPTO-golden)
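
To compare an existing environment against the versions listed above, a small check along the following lines may help. This is a minimal sketch: report_versions is a hypothetical helper, not part of this package, and it assumes that each package exposes a __version__ attribute (packages that do not are reported as "unknown").

```python
import importlib
import sys


def report_versions(module_names):
    """Return {module name: version string} for each requested module."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        # Many packages expose __version__; fall back to "unknown" otherwise.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    # Import names (not pip names) of some of the packages listed above.
    for name, version in report_versions(["rdkit", "torch", "rdchiral"]).items():
        print(name, version)
```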

A second option is to run

conda create -n mhnreact_env python=3.8
eval "$(conda shell.bash hook)"
conda activate mhnreact_env
conda install -c conda-forge rdkit
pip install torch scipy ipykernel matplotlib scikit-learn swifter
cd data/temprel-fortunato/template-relevance-master/
pip install -e .
pip install -e "git+https://github.com/connorcoley/rdchiral.git#egg=rdchiral"

which is equivalent to running the script bash ./scripts/make_env.sh

Docker

Another option is to build the provided Dockerfile in ./tools/docker/ by running the following command from that directory:

DOCKER_BUILDKIT=1 docker build -t mhnreact:latest -f Dockerfile ../..

Data and processing

The preprocessed data is contained in this repository: ./data/processed/uspto_sm_* files contain the preprocessed files for template relevance prediction.

For single-step retrosynthesis the preprocessed and split data can be found in ./data/USPTO_50k_MHN_prepro.csv

All preprocessing steps can be replicated using the notebooks ./examples/prepro_*.ipynb, which cover USPTO-sm, USPTO-lg, USPTO-50k and USPTO-full. USPTO-lg and USPTO-full are not included due to their size and have to be created using the corresponding notebook.

Training

Models can be trained using python -m mhnreact.train followed by the desired arguments.

Selected calls are documented within ./notebooks/*_training_*.ipynb.

Arguments are documented within the module; the documentation can be retrieved by adding --help to the call. The notebooks folder contains several further examples.

Some main parameters are:

model_type: model type, one of 'segler', 'fortunato', 'mhn' or 'staticQK'; default: 'mhn'
dataset_type: dataset, 'sm' or 'lg' for template relevance prediction (use --csv_path for single-step retrosynthesis input)
fp_type: fingerprint type for the input only; default: 'morgan'; other options: 'rdk', 'ECFP', 'ECFC', 'MxFP', 'Morgan2CBF'
template_fp_type: template fingerprint type; default: 'rdk'; other options: 'MxFP', 'rdkc', 'tfidf', 'random'
hopf_beta: Hopfield beta parameter; default: 0.005
hopf_asso_dim: association dimension; default: 512
ssretroeval: single-step retrosynthesis evaluation; default: False
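
To give an intuition for hopf_beta and hopf_asso_dim, the following toy sketch scores templates for one molecule in the style of a modern Hopfield association: both fingerprints are projected into a shared association space and a softmax is taken over beta-scaled dot products. It is an illustration under stated assumptions, not the implementation in this package: the function names are hypothetical, random projections stand in for learned encoders, and the fingerprints are random bit vectors of toy size.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vector, weights):
    """Multiply a vector (length d_in) by a d_in x d_out weight matrix."""
    return [sum(v * row[j] for v, row in zip(vector, weights))
            for j in range(len(weights[0]))]


def template_relevance(mol_fp, template_fps, w_mol, w_tpl, beta):
    """Softmax over beta-scaled dot products in the association space."""
    query = project(mol_fp, w_mol)
    keys = [project(fp, w_tpl) for fp in template_fps]
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


random.seed(0)
fp_size, tpl_fp_size, asso_dim = 16, 12, 4  # toy sizes; asso_dim mirrors hopf_asso_dim
# Random projections are an assumption standing in for trained encoders.
w_mol = [[random.gauss(0, 1) for _ in range(asso_dim)] for _ in range(fp_size)]
w_tpl = [[random.gauss(0, 1) for _ in range(asso_dim)] for _ in range(tpl_fp_size)]
mol_fp = [random.randint(0, 1) for _ in range(fp_size)]
template_fps = [[random.randint(0, 1) for _ in range(tpl_fp_size)] for _ in range(5)]

flat = template_relevance(mol_fp, template_fps, w_mol, w_tpl, beta=0.005)
sharp = template_relevance(mol_fp, template_fps, w_mol, w_tpl, beta=1.0)
print(max(flat), max(sharp))
```

A larger beta concentrates the relevance on the best-matching template, while a small beta keeps the distribution close to uniform. Which value works best depends on the scale of the learned encodings.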

An example call for single-step retrosynthesis is:

python -m mhnreact.train --model_type=mhn --fp_size=4096 --fp_type morgan --template_fp_type rdk --concat_rand_template_thresh 1 \
--exp_name test --dataset_type 50k --csv_path ./data/USPTO_50k_MHN_prepro.csv.gz --ssretroeval True --seed 0
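
To repeat the call above for several seeds, it can be wrapped in a short script. The sketch below only reuses the flags from the example call; the helper names and the exp_name pattern are hypothetical, and it assumes the script is started from the repository root in the activated environment.

```python
import subprocess
import sys


def build_train_command(seed, exp_name="test"):
    """Assemble the single-step retrosynthesis call shown above as an argument list."""
    return [
        sys.executable, "-m", "mhnreact.train",
        "--model_type=mhn",
        "--fp_size=4096",
        "--fp_type", "morgan",
        "--template_fp_type", "rdk",
        "--concat_rand_template_thresh", "1",
        "--exp_name", exp_name,
        "--dataset_type", "50k",
        "--csv_path", "./data/USPTO_50k_MHN_prepro.csv.gz",
        "--ssretroeval", "True",
        "--seed", str(seed),
    ]


def run_seeds(seeds):
    """Run one training job per seed, stopping at the first failure."""
    for seed in seeds:
        # The exp_name pattern is an assumption chosen to keep runs apart.
        command = build_train_command(seed, exp_name=f"test_seed{seed}")
        subprocess.run(command, check=True)
```

Calling run_seeds([0, 1, 2]) would then start three training runs one after another.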

Loading trained models and evaluation

The notebook ./examples/20_evaluation.ipynb shows how to load trained models and use them to predict on a test set.
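
Predictions on a test set are typically summarized as top-k accuracy. The following is a minimal sketch of that metric, assuming ranked predictions and targets are already available as comparable strings; top_k_accuracy is a hypothetical helper, the labels are placeholders, and the evaluation in the notebook may differ in detail (for example in how predictions are canonicalized before comparison).

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of samples whose target appears among the first k predictions."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked prediction list per target")
    if not targets:
        return 0.0
    hits = sum(target in ranked[:k]
               for ranked, target in zip(ranked_predictions, targets))
    return hits / len(targets)


# Placeholder labels; in practice these would be e.g. reactant sets or template ids.
ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
targets = ["A", "A", "A"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, targets, k))
```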

Train on custom data

Preprocess your data into the same format as ./data/USPTO_50k_MHN_prepro.csv and pass the file via the --csv_path argument.
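
Before training on a custom file, it can be useful to check that it has the same columns as the reference file. The sketch below compares only the header rows and makes no assumption about the column names themselves; read_header and missing_columns are hypothetical helpers, and custom.csv is a placeholder path.

```python
import csv
import gzip


def read_header(path):
    """Return the column names from the first row of a CSV (optionally gzipped) file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


def missing_columns(reference_path, custom_path):
    """List reference columns that are absent from the custom file."""
    custom = set(read_header(custom_path))
    return [col for col in read_header(reference_path) if col not in custom]


# Example (paths are placeholders):
# print(missing_columns("./data/USPTO_50k_MHN_prepro.csv", "custom.csv"))
```

An empty list means every reference column is present; it does not check that the values follow the same conventions.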

Citation

To cite this work, you can use the following bibtex entry:

@article{seidl2021modern,
   author = {Seidl, Philipp and Renz, Philipp and Dyubankova, Natalia and Neves, Paulo and Verhoeven, Jonas and Segler, Marwin and Wegner, J{\"o}rg K. and Hochreiter, Sepp and Klambauer, G{\"u}nter},
   title = {Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks},
   journal = {Journal of Chemical Information and Modeling},
   volume = {62},
   number = {9},
   pages = {2111-2120},
   institution = {Institute for Machine Learning, Johannes Kepler University, Linz},
   year = {2022},
   doi = {10.1021/acs.jcim.1c01065},
   url = {https://doi.org/10.1021/acs.jcim.1c01065},
}

References

Ramsauer et al. (2021). Hopfield Networks is All You Need. ICLR 2021.

Keywords

Drug Discovery, CASP, Machine Learning, Synthesis, Zero-shot, Modern Hopfield Networks

Release history

Version 1.0 (this version), released Dec 18, 2023

Download files

Source Distribution

syntheseus-mhnreact-1.0.tar.gz (48.1 kB)

Uploaded Dec 18, 2023 Source

Built Distribution

syntheseus_mhnreact-1.0-py3-none-any.whl (49.9 kB)

Uploaded Dec 18, 2023 Python 3

File details

Details for the file syntheseus-mhnreact-1.0.tar.gz.

File metadata

Download URL: syntheseus-mhnreact-1.0.tar.gz
Upload date: Dec 18, 2023
Size: 48.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for syntheseus-mhnreact-1.0.tar.gz
Algorithm	Hash digest
SHA256	`d36ebcf3433684d828eb9b1c89d433aa60dd26143c6a48d29ea1d83f08e5b8c8`
MD5	`914e0ef5af3186791fdf6bc609e01fb8`
BLAKE2b-256	`067c09a6a8acb2b746b76ab956e131d73d3d19abc70ea8cc53785e298b9822b3`

File details

Details for the file syntheseus_mhnreact-1.0-py3-none-any.whl.

File metadata

Download URL: syntheseus_mhnreact-1.0-py3-none-any.whl
Upload date: Dec 18, 2023
Size: 49.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for syntheseus_mhnreact-1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b88a6fe23ea30c2c087f2660c0a98486efa9a8b11a82f515f9f7d82c9f1853bc`
MD5	`80ffc68f8d33557dc94b104e736c5608`
BLAKE2b-256	`d0dc52e5a05d4911b83219d1f5f1423e394e3a029c2bdc51a9b4fb7cbacb3830`
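
The SHA256 digests listed above can be checked against a downloaded file with the standard library. This is a minimal sketch: sha256_of_file and matches_digest are hypothetical helpers, and the example assumes the file has been downloaded into the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example, using the wheel digest listed above:
# matches_digest(
#     "syntheseus_mhnreact-1.0-py3-none-any.whl",
#     "b88a6fe23ea30c2c087f2660c0a98486efa9a8b11a82f515f9f7d82c9f1853bc",
# )
```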