Boll, Alexander (2025). Bridging the Data Desert: Mitigating Challenges of Model Accessibility in Simulink Research. (Thesis). Universität Bern, Bern
25boll_a.pdf - Thesis (10MB). Available under License Creative Commons: Attribution (CC-BY 4.0).
Abstract
Simulink is the industry standard for model-driven development in safety-critical domains such as automotive, aerospace, and medical devices. However, empirical research in the context of Simulink faces a persistent challenge: a scarcity of high-quality, industry-representative models that are essential for rigorous tool evaluation, empirical validation, and reproducible studies. This scarcity not only slows down scientific progress but also contributes to a reproduction crisis in the field – primarily due to the unavailability of experimental models.

This thesis addresses this challenge through three interconnected contributions, grounded in a multi-method approach that includes a systematic literature review, empirical case studies, community surveys, dataset analysis, and tool prototyping and validation:

1. A diagnosis of model scarcity demonstrating that the lack of models limits the ability to conduct empirical research and also contributes to only 9% of Simulink tool studies meeting reproducibility criteria (i.e., all artifacts available).
2. An assessment of existing open-source Simulink models and datasets, evaluating their suitability for empirical research and investigating their limitations in scale, complexity, and industrial realism. Through case studies – including model matching, analyzing bus architecture of Simulink models, and investigating commenting practices – we demonstrate that open-source models, while imperfect, can serve as valuable research subjects for empirical investigation when carefully selected and used appropriately.
3. To address the lack of (i) large-scale and (ii) industry-representative models, we developed two novel tools: (i) Grandslam, a linearly scaling synthesizer for Simulink that generates models with adjustable properties, enabling the synthesis of very large open-source models; (ii) Smoke, a model anonymizer that removes sensitive information from Simulink models while preserving their structural properties, thus facilitating the sharing of real-world models without violating intellectual property constraints.

Our work complements and extends contemporary datasets by showing their suitability for empirical research and providing tools for their expansion. By lowering the barriers to data access, we advance open science in model-driven engineering, enabling reproducible studies, specifically large-scale studies that were previously infeasible. The contributions of this thesis are foundational: they narrow the “data desert” in Simulink research and foster collaboration through shareable resources. Beyond immediate applications, our tools and findings support standardized benchmarks, comparative tool evaluations, and longitudinal studies of modeling practices – ultimately strengthening the empirical rigor and industrial relevance of Simulink research.

In summary, this thesis provides both the evidence of a critical gap in Simulink research and practical solutions to address it, offering a pathway toward more transparent, reproducible, and impactful model-driven engineering.
| Item Type: | Thesis |
|---|---|
| Dissertation Type: | Cumulative |
| Date of Defense: | 16 December 2025 |
| Subjects: | 000 Computer science, knowledge & systems |
| Institute / Center: | 08 Faculty of Science > Institute of Computer Science (INF) |
| Depositing User: | Sarah Stalder |
| Date Deposited: | 22 Jan 2026 13:22 |
| Last Modified: | 22 Jan 2026 13:22 |
| URI: | https://boristheses.unibe.ch/id/eprint/7066 |
