BORIS Theses

Bern Open Repository and Information System

Cheminformatics Tools to Explore the Chemical Space of Peptides and Natural Products

Capecchi, Alice (2021). Cheminformatics Tools to Explore the Chemical Space of Peptides and Natural Products. (Thesis). Universität Bern, Bern

21capecchi_a_.pdf - Thesis
Available under License Creative Commons: Attribution-Noncommercial-No Derivative Works (CC-BY-NC-ND 4.0).



Cheminformatics facilitates the collection, storage, and analysis of large quantities of chemical data, such as molecular structures, molecular properties, and biological activities, and it has revolutionized medicinal chemistry for small molecules. However, its application to larger molecules remains underexplored. This thesis attempts to fill this gap by extending the cheminformatics approach to large molecules and peptides. The thesis is divided into two parts. The first part presents the implementation and application of two new molecular descriptors: the macromolecule extended atom pair fingerprint (MXFP) and the MinHashed atom pair fingerprint of radius 2 (MAP4). MXFP is an atom pair fingerprint suitable for large molecules, and here it is used to explore the chemical space of non-Lipinski molecules within the widely used PubChem and ChEMBL databases. MAP4 is a MinHashed hybrid of substructure and atom pair fingerprints suitable for encoding both small and large molecules. MAP4 is first benchmarked against commonly used atom pair and substructure fingerprints, and it is then used to investigate the chemical space of microbial and plant natural products with the aid of machine learning and chemical space mapping. The second part of the thesis focuses on peptides. It opens with a review chapter on approaches to discovering novel peptide structures and on describing the known peptide chemical space. Then, a genetic algorithm that uses MXFP in its fitness function is described and challenged to generate peptide analogs of peptidic or non-peptidic queries. Finally, supervised and unsupervised machine learning is used to generate novel antimicrobial and non-hemolytic peptide sequences.

Item Type: Thesis
Dissertation Type: Cumulative
Date of Defense: 8 October 2021
Subjects: 500 Science > 540 Chemistry
500 Science > 570 Life sciences; biology
Institute / Center: 08 Faculty of Science > Department of Chemistry, Biochemistry and Pharmaceutical Sciences (DCBP)
Depositing User: Hammer Igor
Date Deposited: 11 Nov 2021 09:21
Last Modified: 11 Nov 2021 09:25
