A Comparative Study of Large and Small Language Models for Domain Model Extraction

Files

Access status: Embargo until 2026-09-25, 978-3-032-21423-2_23.pdf (410.17 KB)

Publication date

2026-03

Authors

Chou, Cheng Yi
Aydemir, Fatma Basak (ORCID 0000-0003-3833-3997, ISNI 0000000493355918)
Dalpiaz, Fabiano (ISNI 0000000419575525)

Editors

Guizzardi, R.
Araújo, J.

Document Type

Part of book

License

taverne

Abstract

[Context and Motivation] Large language models can derive conceptual models from textual requirements, offering an off-the-shelf alternative to traditional rule-based and machine-learning-based methods. [Question/Problem] Comparative evidence on the validity and completeness of domain models derived by large and small language models remains limited. [Principal ideas/Results] We compare GPT-o1, Llama3-8B, and Qwen-14B with the rule-based Visual Narrator using nine datasets containing user stories and corresponding domain models. Each language model was prompted with structured templates and evaluated on class and association extraction through precision, recall, and F-scores. GPT-o1 outperformed the small language models and matched or exceeded Visual Narrator in most tasks. The small language models produced competitive but less consistent results, revealing efficiency–accuracy trade-offs. [Contribution] We provide a systematic comparison of large language models, small language models, and rule-based modeling approaches, and offer an updated evaluation framework to guide future research on the balance between scale, performance, and interpretability of automated techniques for domain model extraction.
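The abstract reports class and association extraction in terms of precision, recall, and F-scores, but it does not state the matching criterion or which F-score variant was used. The following is a minimal, hypothetical sketch of set-based scoring against a gold-standard domain model. It assumes exact matching after case and whitespace normalization, treats associations as undirected pairs of class names, and exposes a configurable `beta` (beta = 1 gives F1). The helper names (`normalize_class`, `normalize_association`, `precision_recall_f`) are illustrative and are not taken from the paper.

```python
from typing import Hashable, Iterable


def normalize_class(name: str) -> str:
    # Hypothetical normalization: lowercase and collapse/trim whitespace,
    # so "Book " and "book" count as the same class.
    return " ".join(name.lower().split())


def normalize_association(source: str, target: str) -> frozenset[str]:
    # Hypothetical: an association is an undirected pair of class names,
    # so (User, Book) and (Book, User) match each other.
    return frozenset({normalize_class(source), normalize_class(target)})


def precision_recall_f(
    predicted: Iterable[Hashable],
    gold: Iterable[Hashable],
    beta: float = 1.0,
) -> tuple[float, float, float]:
    """Set-based precision, recall, and F-beta of predicted vs. gold items."""
    pred = set(predicted)
    ref = set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Illustrative toy data, not from the paper's datasets.
    gold_classes = [normalize_class(c) for c in ["User", "Book", "Loan", "Librarian"]]
    model_classes = [normalize_class(c) for c in ["user", "Book ", "loan"]]
    print("classes:", precision_recall_f(model_classes, gold_classes))

    gold_assocs = [normalize_association("User", "Book"), normalize_association("Book", "Loan")]
    model_assocs = [normalize_association("book", "user"), normalize_association("User", "Librarian")]
    print("associations:", precision_recall_f(model_assocs, gold_assocs))
```

On the toy class data, the extractor finds three of four gold classes with no false positives, giving precision 1.0, recall 0.75, and F1 of about 0.857. Setting `beta=2.0` weights recall more heavily, which is one common choice in requirements engineering. Whether the study used F1, F2, or a different matching rule (for example, lemmatization or synonym matching) cannot be determined from the abstract.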

Keywords

large language models, Visual Narrator, domain modeling, user stories, small language models, Taverne

Citation

Chou, C Y, Aydemir, F B & Dalpiaz, F 2026, A Comparative Study of Large and Small Language Models for Domain Model Extraction. in R Guizzardi & J Araújo (eds), International Working Conference on Requirements Engineering: Foundation for Software Quality. Springer, pp. 336-351. https://doi.org/10.1007/978-3-032-21423-2_23