BORIS Theses

Bern Open Repository and Information System

Decoding Legalese Without Borders: Multilingual Evaluation of Language Models on Long Legal Texts

Niklaus, Joël (2024). Decoding Legalese Without Borders: Multilingual Evaluation of Language Models on Long Legal Texts. (Thesis). Universität Bern, Bern

24niklaus_j.pdf - Thesis
Available under License Creative Commons: Attribution (CC-BY 4.0).



Pretrained transformers have sparked an explosion of research in Natural Language Processing (NLP). Scaling up transformer-based language models in size, compute, and data has led to impressive emergent capabilities. A mere three years ago, before the launch of GPT-3, such capabilities were considered unattainable in so short a time. These advances catapulted the previously niche field of legal NLP into the mainstream, at the latest when GPT-4 passed the bar exam. Products based on GPT-4 and other large language models are entering the market at an increasing pace, many of them targeting the legal field. This dissertation makes contributions in two key areas of NLP for legal text: resource curation and detailed model analysis. First, we curate an extensive set of multilingual legal datasets, train a variety of language models on them, and establish comprehensive benchmarks for evaluating Large Language Models (LLMs) in the legal domain. Second, we conduct a multidimensional analysis of model performance in the context of Legal Judgment Prediction, focusing on aspects such as explainability and calibration. We introduce novel evaluation frameworks and find that our trained models achieve high performance and are better calibrated than human experts, but they do not necessarily offer better explainability. Furthermore, we investigate the feasibility of re-identification in anonymized legal texts and conclude that large-scale re-identification using LLMs is currently not feasible. For future work, we propose exploring domain adaptation and instruction tuning to improve language model performance on legal benchmarks, and we advocate a detailed examination of dataset overlaps and model interpretability.
Additionally, we emphasize the need to extend datasets to unexplored legal tasks and underrepresented jurisdictions, aiming for more comprehensive coverage of the global legal landscape in NLP resources.

Item Type: Thesis
Dissertation Type: Cumulative
Date of Defense: 24 January 2024
Subjects: 000 Computer science, knowledge & systems
300 Social sciences, sociology & anthropology > 340 Law
400 Language > 410 Linguistics
Institute / Center: 08 Faculty of Science
Depositing User: Hammer Igor
Date Deposited: 08 Mar 2024 13:42
Last Modified: 30 Mar 2024 20:12
