BORIS Theses

Bern Open Repository and Information System

Computational Strategies for the Data-Driven Discovery of Antimicrobial Peptides

Orsi, Markus (2025). Computational Strategies for the Data-Driven Discovery of Antimicrobial Peptides. (Thesis). Universität Bern, Bern

25orsi_m.pdf - Thesis
Available under License Creative Commons: Attribution (CC-BY 4.0).

Abstract

Cheminformatics has played a central role in medicinal chemistry, enabling the storage, analysis, and modelling of large volumes of chemical data, particularly for small organic molecules. However, its application to large and structurally complex compounds remains underdeveloped. This thesis addresses that gap by developing and improving computational tools that extend molecular representation and modelling strategies to natural products, modified peptides and macromolecules, which often fall outside the scope of conventional methods. One part of the thesis focuses on the reimplementation and extension of two molecular fingerprints. The macromolecule extended atom-pair fingerprint (MXFP) was adapted within an open-source framework and applied to the analysis of chemical spaces composed of molecular pairs. Separately, the MinHashed atom-pair fingerprint (MAP4) was extended to encode stereochemistry, resulting in MAP4C. Both MXFP and MAP4C were integrated into a revised version of the peptide design genetic algorithm (PDGA), a modular, rule-based framework for generating synthetically accessible peptide analogs. Coupling MAP4C to PDGA enabled efficient similarity-based exploration of combinatorial peptide spaces exceeding 10^60 structures. In addition, MXFP could be used to generate pharmacophorically similar peptide analogs of any query structure. The thesis also explores the use of deep learning models for prediction tasks related to peptides and natural products. A general-purpose language model (GPT-3.5 turbo) was benchmarked against established models for classifying antimicrobial and hemolytic peptide sequences. In a separate project, a transformer-based model was trained to predict the absolute configuration of natural products from achiral molecular input, potentially serving as a computational alternative to experimental stereochemistry assignment.
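As a rough illustration of the MinHash idea behind fingerprints such as MAP4, the sketch below estimates the Jaccard similarity of two sets of string tokens from fixed-length MinHash signatures. It is not the thesis implementation: the token format (for example "C|2|N" for an atom pair at a given distance), the function names, and the choice of BLAKE2b as the hash family are all assumptions made for this example.

```python
import hashlib


def minhash_signature(tokens, num_hashes=128):
    """Return a MinHash signature for a non-empty set of string tokens.

    For each of `num_hashes` seeded hash functions, keep the minimum hash
    value over all tokens. The seeding scheme is an assumption of this sketch.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # distinct salt per hash function
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        t.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "little",
                )
                for t in tokens
            )
        )
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical atom-pair tokens for two similar molecules.
mol_a = {"C|1|C", "C|2|N", "C|3|O", "N|1|C"}
mol_b = {"C|1|C", "C|2|N", "C|4|O", "N|1|C"}

sig_a = minhash_signature(mol_a)
sig_b = minhash_signature(mol_b)

print("exact:", exact_jaccard(mol_a, mol_b))
print("estimated:", estimated_jaccard(sig_a, sig_b))
```

The signatures have a fixed length regardless of how many tokens a molecule produces, which is what makes this kind of comparison practical for large structures such as peptides. The estimate converges on the exact value as `num_hashes` grows.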

Item Type: Thesis
Dissertation Type: Cumulative
Date of Defense: 15 July 2025
Subjects: 000 Computer science, knowledge & systems
500 Science > 540 Chemistry
500 Science > 570 Life sciences; biology
Institute / Center: 08 Faculty of Science > Department of Chemistry, Biochemistry and Pharmaceutical Sciences (DCBP)
Depositing User: Sarah Stalder
Date Deposited: 12 Nov 2025 12:14
Last Modified: 12 Nov 2025 12:14
URI: https://boristheses.unibe.ch/id/eprint/6849
