BORIS Theses

Bern Open Repository and Information System

Towards Sustainable Synthesis: Integrating Biocatalysis and Language Models in Computer-Aided Synthesis Planning

Kreutter, David (2023). Towards Sustainable Synthesis: Integrating Biocatalysis and Language Models in Computer-Aided Synthesis Planning. (Thesis). Universität Bern, Bern

23kreutter_d.pdf - Thesis
Available under License Creative Commons: Attribution (CC-BY 4.0).


Abstract

This thesis is motivated by the objective of condensing experimental biocatalysis knowledge into machine-learning models. It also seeks to bridge the gap between Computer-Aided Synthesis Planning (CASP) tools and biocatalysis. The aim is to develop and implement a multistep synthesis planning software that can incorporate and explore both chemo- and biocatalysis reactions. The resulting tool should provide chemists with mixed catalytic synthetic route options, unlocking biocatalytic opportunities in chemical synthesis.

First, I explore the capabilities of a natural language processing model to learn biocatalysis reactions and perform forward reaction predictions. Similar to how chemists would learn biocatalysis, I provide the starting materials combined with a textual description of the enzyme and train the model to predict the product of the enzymatic transformation. I investigate the influence of transfer learning methods and demonstrate the model’s performance with insightful examples. Additionally, I present practical use cases and investigate the limits of the Enzymatic Transformer.

Second, I report the creation of the first multistep Transformer-based computer-aided synthesis planning software, leveraging disconnection-aware models for a broader exploration of the chemical space. I design tagging strategies that automatically generate disconnection prompts, balancing diversity and computing time. I employ a triple transformer loop, predicting starting materials (T1), reagents, catalysts, and solvents (T2), followed by a forward validation model (T3) to limit unrealistic predictions. The resulting single-step framework explores a significantly more diverse chemical space while maintaining a critical assessment of the chemical feasibility of the predicted reactions.
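
The triple transformer loop described above can be illustrated with a minimal sketch. This is not the thesis code: the function `single_step` and the toy stand-ins `toy_t1`, `toy_t2`, and `toy_t3` are hypothetical names that replace the trained models with lookup tables, and the acceptance rule (keep a prediction only if the forward model reproduces the target) is an assumption about how the forward validation is used.

```python
from typing import Callable, List, NamedTuple


class Step(NamedTuple):
    reactants: str  # SMILES of the predicted starting materials
    reagents: str   # SMILES of the predicted reagents/catalysts/solvents
    product: str    # SMILES of the target molecule


def single_step(
    target: str,
    t1: Callable[[str], List[str]],   # hypothetical: target -> candidate reactant sets
    t2: Callable[[str, str], str],    # hypothetical: (reactants, target) -> reagents
    t3: Callable[[str, str], str],    # hypothetical: (reactants, reagents) -> product
) -> List[Step]:
    """Keep only retrosynthetic predictions that the forward model confirms."""
    accepted = []
    for reactants in t1(target):
        reagents = t2(reactants, target)
        # Forward validation: discard predictions that do not lead back to the target.
        if t3(reactants, reagents) == target:
            accepted.append(Step(reactants, reagents, target))
    return accepted


# Toy stand-ins for the three models (illustrative data, not model output).
def toy_t1(target: str) -> List[str]:
    return {"CC(=O)OCC": ["CC(=O)O.CCO", "CC(=O)Cl.CCO", "C.C"]}.get(target, [])


def toy_t2(reactants: str, target: str) -> str:
    return "OS(=O)(=O)O" if reactants == "CC(=O)O.CCO" else "CCN(CC)CC"


def toy_t3(reactants: str, reagents: str) -> str:
    forward = {"CC(=O)O.CCO": "CC(=O)OCC", "CC(=O)Cl.CCO": "CC(=O)OCC", "C.C": "CC"}
    return forward.get(reactants, "")


steps = single_step("CC(=O)OCC", toy_t1, toy_t2, toy_t3)
```

In this toy run the third candidate is rejected because the forward stand-in does not return the target, which mirrors the role the abstract assigns to T3.
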
I detail the implementation of a multistep search using a best-first tree search algorithm guided by a new route penalty score, prioritizing short and efficient routes while exploring diverse retrosynthetic options. I showcase the performance of the CASP tool with insightful retrosynthesis examples of drug molecules. The models, along with the code, are available on GitHub as a Python package.

Finally, I integrate an independent triple transformer loop for biocatalysis into the previously designed CASP software. The reported implementation explores both chemo- and biocatalysis in parallel and builds mixed synthetic routes. This allows the suggestion of the most efficient synthetic routes to chemists, incorporating biocatalytic steps whenever possible, opening the door to more sustainable synthesis route design.
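
The best-first multistep search mentioned in the abstract can be sketched generically as follows. The abstract does not give the route penalty score, so the penalty used here (one unit per step plus one minus a confidence value) is a placeholder assumption, and `best_first_search`, `toy_expand`, and the toy molecule labels are hypothetical names introduced for illustration only.

```python
import heapq
from typing import Callable, List, Optional, Set, Tuple

# One expansion of a molecule: (precursor molecules, confidence in [0, 1]).
Expansion = Tuple[List[str], float]
Route = List[Tuple[str, Tuple[str, ...]]]


def best_first_search(
    target: str,
    expand: Callable[[str], List[Expansion]],  # hypothetical single-step model
    in_stock: Set[str],                        # purchasable building blocks
    max_iterations: int = 100,
) -> Optional[Tuple[float, Route]]:
    """Always expand the partial route with the lowest penalty first."""
    counter = 0  # tie-breaker so the heap never compares routes directly
    frontier = [(0.0, counter, (target,), ())]
    for _ in range(max_iterations):
        if not frontier:
            break
        penalty, _, open_mols, route = heapq.heappop(frontier)
        open_mols = tuple(m for m in open_mols if m not in in_stock)
        if not open_mols:
            return penalty, list(route)  # every leaf is purchasable
        mol, rest = open_mols[0], open_mols[1:]
        for precursors, confidence in expand(mol):
            counter += 1
            # Placeholder penalty (assumption): favors short, confident routes.
            new_penalty = penalty + 1.0 + (1.0 - confidence)
            new_route = route + ((mol, tuple(precursors)),)
            heapq.heappush(
                frontier, (new_penalty, counter, rest + tuple(precursors), new_route)
            )
    return None


# Toy reaction network with abstract molecule labels (illustrative only).
def toy_expand(mol: str) -> List[Expansion]:
    network = {"D": [(["C"], 0.9), (["A", "B"], 0.6)], "C": [(["A"], 0.9)]}
    return network.get(mol, [])


result = best_first_search("D", toy_expand, in_stock={"A", "B"})
```

In the toy network the one-step route from `D` to `A` and `B` is returned ahead of the two-step route through `C`, because its accumulated placeholder penalty is lower. Running a chemocatalysis and a biocatalysis expander in parallel, as the abstract describes, would amount to merging the outputs of two `expand` functions under this sketch.
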

Item Type: Thesis
Dissertation Type: Cumulative
Date of Defense: 30 November 2023
Subjects: 500 Science > 540 Chemistry
500 Science > 570 Life sciences; biology
Institute / Center: 08 Faculty of Science > Department of Chemistry, Biochemistry and Pharmaceutical Sciences (DCBP)
Depositing User: Hammer Igor
Date Deposited: 31 Oct 2024 14:43
Last Modified: 03 Nov 2024 02:12
URI: https://boristheses.unibe.ch/id/eprint/5543
