Orsi, Markus (2025). Computational Strategies for the Data-Driven Discovery of Antimicrobial Peptides. (Thesis). Universität Bern, Bern
Text: 25orsi_m.pdf - Thesis (12MB). Available under License Creative Commons: Attribution (CC-BY 4.0).
Abstract
Cheminformatics has played a central role in medicinal chemistry, enabling the storage, analysis, and modelling of large volumes of chemical data, particularly for small organic molecules. However, its application to large and structurally complex compounds remains underdeveloped. This thesis addresses that gap by developing and improving computational tools that extend molecular representation and modelling strategies to natural products, modified peptides and macromolecules, which often fall outside the scope of conventional methods. One part of the thesis focuses on the reimplementation and extension of two molecular fingerprints. The macromolecule extended atom-pair fingerprint (MXFP) was adapted within an open-source framework and applied to the analysis of chemical spaces composed of molecular pairs. Separately, the MinHashed atom-pair fingerprint (MAP4) was extended to encode stereochemistry, resulting in MAP4C. Both MXFP and MAP4C were integrated into a revised version of the peptide design genetic algorithm (PDGA), a modular, rule-based framework for generating synthetically accessible peptide analogs. Coupling MAP4C to PDGA enabled efficient similarity-based exploration of combinatorial peptide spaces exceeding 10^60 structures. In addition, MXFP could be used to generate pharmacophorically similar peptide analogs of any query structure. The thesis also explores the use of deep learning models for prediction tasks related to peptides and natural products. A general-purpose language model (GPT-3.5 turbo) was benchmarked against established models for classifying antimicrobial and hemolytic peptide sequences. In a separate project, a transformer-based model was trained to predict the absolute configuration of natural products from achiral molecular input, potentially serving as a computational alternative to experimental stereochemistry assignment.
| Item Type: | Thesis |
|---|---|
| Dissertation Type: | Cumulative |
| Date of Defense: | 15 July 2025 |
| Subjects: | 000 Computer science, knowledge & systems<br>500 Science > 540 Chemistry<br>500 Science > 570 Life sciences; biology |
| Institute / Center: | 08 Faculty of Science > Department of Chemistry, Biochemistry and Pharmaceutical Sciences (DCBP) |
| Depositing User: | Sarah Stalder |
| Date Deposited: | 12 Nov 2025 12:14 |
| Last Modified: | 12 Nov 2025 12:14 |
| URI: | https://boristheses.unibe.ch/id/eprint/6849 |