Towards Managed Clone-and-Own: Automating Matching and Patching

Schultheiss, Alexander Hartmut (2025). Towards Managed Clone-and-Own: Automating Matching and Patching. (Thesis). Universität Bern, Bern

25schultheiss_ah.pdf (Thesis, 6 MB)
Available under License Creative Commons: Attribution (CC-BY 4.0).

Abstract

In software development, developers often clone (i.e., duplicate) and adapt existing software artifacts, such as code, models, configuration files, or build scripts. The practice of cloning and adapting an entire software system is generally known as clone-and-own development, and the clones created by clone-and-own are called variants. While clone-and-own allows for flexible, fast, and independent reuse of software, each new variant significantly increases the overall development and maintenance costs. The costs increase because relevant changes might have to be repeated on several variants, but developers might not know which variants require these changes and where the changes should be applied. A recent line of research envisions systematic support for clone-and-own development that automates the synchronization of variants in order to reduce development and maintenance costs. Synchronizing multiple variants requires (a) identifying and localizing the commonalities of variants, and (b) consistently propagating changes between these commonalities. Techniques that address these major tasks typically focus either on visual artifacts, specifically models described in a visual modeling language (e.g., the UML), or on textual artifacts, such as code and documentation. This separation is necessary because visual and textual artifacts have substantially different representations and therefore require fundamentally different approaches. Accordingly, clone-and-own research also focuses on either (visual) model artifacts or textual artifacts, with research progress varying by artifact type. However, existing techniques supporting clone-and-own development have several drawbacks and uncertainties.
With respect to (a), existing model matching algorithms, which identify the commonalities in the models of several variants, either process the variants one by one, comparing two variants at a time, or process all variants simultaneously. A drawback of algorithms processing variants one by one is that their accuracy depends on the order in which the variants are processed and may be severely affected if the order is unfavorable. A drawback of algorithms processing all variants simultaneously is that they do not scale to the model sizes found in practice. In contrast to the challenges of model matching, popular text differencing algorithms (used in GNU diff and git diff) provide a solid starting point for identifying commonalities between textual artifacts. However, with respect to (b), it is unclear which specific mechanism should be used for the subsequent step of propagating changes between common textual artifacts. Existing propagation mechanisms, also known as patchers, are either language-agnostic and applicable to arbitrary text, or language-specific and only applicable to text written in a specific (programming) language. The most popular language-agnostic patchers, GNU patch and Git cherry-pick, might yield inaccurate results for variants that diverge over time, because the resulting differences can impede propagation. Language-specific patchers account for such differences, but their applicability is severely limited by their specialization on a specific language. In this thesis, we follow the vision of research on systematic clone-and-own support and develop new techniques that address the drawbacks outlined above. First, we establish several new benchmarks and datasets to evaluate our research. To evaluate model matching, we collect a benchmark of various model datasets that covers a wide range of domains and modeling languages.
To evaluate the patching of textual artifacts, we develop VEVOS, a novel benchmark generation framework that simulates the evolution of cloned variants, and we mine a large dataset of change propagation scenarios from open-source repositories. Then, we address the shortcomings of model matching by developing RaQuN, a generic model matcher that applies an efficient nearest neighbor search for similar model elements to considerably reduce the number of computations needed to compare all variants simultaneously. Next, we conduct an empirical study on the effectiveness of existing language-agnostic patchers, in which we gauge their automation potential when propagating changes between the textual artifacts of variants. Based on the insights of our study, we develop mpatch, a novel language-agnostic patcher for textual artifacts that leverages a matching between variants to apply changes more accurately than existing patchers. We evaluate our three primary research artifacts: VEVOS, RaQuN, and mpatch. We find that VEVOS covers several evaluation scenarios of clone-and-own research that no previous benchmark has covered. Our comparison of different model matchers shows that RaQuN scales even to the largest models we found, while establishing matches with accuracy similar to or even greater than that of all other matchers. Our empirical study on variant synchronization shows that existing language-agnostic patchers already have good automation potential, but only if they are used consistently from the start of a project. If existing patchers are applied to diverged variants, their automation rate is generally low and they require manual intervention in most cases. In contrast, mpatch, our novel patcher, achieves a considerably higher automation rate than the best existing language-agnostic patcher and can automatically propagate the majority of changes between diverged variants.
Using mpatch, practitioners can immediately improve change propagation between the variants in their projects.
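
The abstract describes RaQuN only at a high level: a nearest neighbor search for similar model elements reduces the number of computations needed to match all variants simultaneously. The Python sketch below illustrates that general idea under assumptions that are ours, not RaQuN's: model elements are encoded as sets of property strings, similarity is Jaccard similarity, each element keeps only its k nearest neighbors from other variants as candidate pairs, and candidates are merged greedily into matches that contain at most one element per variant. The names candidate_pairs and greedy_match, the element encoding, and the threshold of 0.5 are hypothetical. For readability, the neighbor search here is brute force; a scalable matcher would answer these queries with a search index, so this sketch only shows how a small candidate set limits the work left for the matching step.

```python
# Hypothetical element encoding (an assumption, not RaQuN's):
# (variant id, element name, set of property strings such as attribute names).
Element = tuple[int, str, frozenset[str]]


def similarity(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two property sets (1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def candidate_pairs(elements: list[Element], k: int) -> set[tuple[int, int]]:
    """Keep, for each element, only its k most similar elements from other variants.

    Brute-force search for readability; a scalable matcher would use an index
    so that not every pair has to be scored.
    """
    pairs: set[tuple[int, int]] = set()
    for i, (vi, _, pi) in enumerate(elements):
        scored = sorted(
            ((similarity(pi, pj), j) for j, (vj, _, pj) in enumerate(elements) if vj != vi),
            reverse=True,
        )
        for sim, j in scored[:k]:
            if sim > 0:
                pairs.add((min(i, j), max(i, j)))
    return pairs


def greedy_match(elements: list[Element], pairs: set[tuple[int, int]],
                 threshold: float = 0.5) -> list[tuple[str, ...]]:
    """Merge candidate pairs, most similar first, into matches that contain
    at most one element per variant."""
    def pair_sim(p: tuple[int, int]) -> float:
        return similarity(elements[p[0]][2], elements[p[1]][2])

    match_of = {i: frozenset([i]) for i in range(len(elements))}
    for i, j in sorted(pairs, key=lambda p: (-pair_sim(p), p)):
        if pair_sim((i, j)) < threshold:
            break  # all remaining candidates are even less similar
        mi, mj = match_of[i], match_of[j]
        if mi == mj:
            continue  # already in the same match
        if {elements[x][0] for x in mi} & {elements[x][0] for x in mj}:
            continue  # merging would put two elements of one variant together
        merged = mi | mj
        for x in merged:
            match_of[x] = merged
    return sorted({tuple(sorted(elements[x][1] for x in m)) for m in match_of.values()})


elements = [
    (0, "A.Customer", frozenset({"name", "email", "id"})),
    (0, "A.Order", frozenset({"id", "date", "total"})),
    (1, "B.Customer", frozenset({"name", "email", "id", "phone"})),
    (1, "B.Order", frozenset({"id", "date", "total", "items"})),
    (2, "C.Client", frozenset({"name", "email"})),
]
pairs = candidate_pairs(elements, k=2)
print(len(pairs))  # 6 of the 8 cross-variant pairs
print(greedy_match(elements, pairs))
# [('A.Customer', 'B.Customer', 'C.Client'), ('A.Order', 'B.Order')]
```

Only the candidate pairs reach greedy_match, and the most similar pairs are merged first, so no fixed processing order of the variants is involved; how RaQuN itself weighs and merges candidates is not specified here.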

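The abstract states that the most popular language-agnostic patchers struggle with diverged variants, and that mpatch instead leverages a matching between variants. The Python sketch below illustrates that general contrast under simplifying assumptions; it is not mpatch's algorithm. It compares a deliberately strict context-based patcher (one context line, no fuzz) with one that locates each change through a line alignment of the source's old version and the target variant, computed with difflib.SequenceMatcher from the standard library. The function names, the config-file example, and the rule for anchoring pure insertions are hypothetical.

```python
import difflib


def source_changes(before: list[str], after: list[str]):
    """Non-equal opcodes (tag, i1, i2, j1, j2) of the change in the source variant."""
    ops = difflib.SequenceMatcher(None, before, after).get_opcodes()
    return [op for op in ops if op[0] != "equal"]


def strict_context_patch(before, after, target, context=1):
    """Apply each change only where its old lines plus `context` surrounding
    lines occur verbatim in the target. Returns (patched, number_rejected)."""
    patched, rejected = list(target), 0
    for _, i1, i2, j1, j2 in reversed(source_changes(before, after)):
        lo, hi = max(0, i1 - context), min(len(before), i2 + context)
        old = before[lo:hi]
        new = before[lo:i1] + after[j1:j2] + before[i2:hi]
        for t in range(len(patched) - len(old) + 1):
            if patched[t:t + len(old)] == old:
                patched[t:t + len(old)] = new
                break
        else:
            rejected += 1  # context not found verbatim: change is rejected
    return patched, rejected


def matching_patch(before, after, target):
    """Locate each change through a line alignment of `before` and `target`
    instead of verbatim context. Returns (patched, number_rejected)."""
    aligned = {}  # line index in before -> line index in target
    for a, b, size in difflib.SequenceMatcher(None, before, target).get_matching_blocks():
        for k in range(size):
            aligned[a + k] = b + k
    edits, rejected = [], 0
    for _, i1, i2, j1, j2 in source_changes(before, after):
        if i1 < i2:  # replaced or deleted lines must exist contiguously in the target
            spots = [aligned.get(i) for i in range(i1, i2)]
            if None in spots or spots != list(range(spots[0], spots[0] + len(spots))):
                rejected += 1
                continue
            edits.append((spots[0], spots[-1] + 1, after[j1:j2]))
        elif i1 - 1 in aligned:  # pure insertion: anchor after the preceding matched line
            pos = aligned[i1 - 1] + 1
            edits.append((pos, pos, after[j1:j2]))
        elif i1 in aligned or i1 == 0:  # ... or before the following matched line
            pos = aligned.get(i1, 0)
            edits.append((pos, pos, after[j1:j2]))
        else:
            rejected += 1
    patched = list(target)
    for start, end, lines in sorted(edits, reverse=True):  # back to front keeps indices valid
        patched[start:end] = lines
    return patched, rejected


before = ["[server]", "port = 8080", "debug = false", "[db]", "host = localhost"]
after = ["[server]", "port = 8080", "debug = false", "timeout = 30", "[db]", "host = localhost"]
# Diverged variant: port, debug flag, and db host were edited independently.
target = ["[server]", "port = 9090", "debug = true", "[db]", "host = db.internal"]

print(strict_context_patch(before, after, target)[1])  # 1 (the change is rejected)
print(matching_patch(before, after, target)[0])  # "timeout = 30" lands before "[db]"
```

On an undiverged target, both functions produce the same result; they differ only once neighboring lines have been edited independently. A practical patcher must also handle ambiguous or conflicting alignments, which this sketch ignores.
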
Item Type:	Thesis
Dissertation Type:	Cumulative
Date of Defense:	6 June 2025
Subjects:	000 Computer science, knowledge & systems
Institute / Center:	08 Faculty of Science
Depositing User:	Sarah Stalder
Date Deposited:	04 Jul 2025 14:42
Last Modified:	04 Jul 2025 14:42
URI:	https://boristheses.unibe.ch/id/eprint/6359
