A Comparative Study of Large and Small Language Models for Domain Model Extraction
Publication date
2026-03
Editors
Guizzardi, R.
Araújo, J.
Document Type
Part of book
License
taverne
Abstract
[Context and Motivation] Large language models can derive conceptual models from textual requirements, offering an off-the-shelf alternative to traditional rule-based and machine-learning-based methods. [Question/Problem] Comparative evidence on the validity and completeness of different large and smaller language models for the domain model derivation task remains limited. [Principal ideas/Results] We compare GPT-o1, Llama3-8B, and Qwen-14B with the rule-based Visual Narrator using nine datasets containing user stories and corresponding domain models. Each language model was prompted with structured templates and evaluated on class and association extraction through precision, recall, and F-scores. GPT-o1 outperformed the smaller language models and matched or exceeded Visual Narrator in most tasks. Small language models produced competitive but less consistent results, revealing efficiency–accuracy trade-offs. [Contribution] We provide a systematic comparison of large language models, small language models, and rule-based modeling approaches and offer an updated evaluation framework to guide future research on the balance between scale, performance, and interpretability of the automated techniques for domain model extraction.
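As an illustration of the metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F-score could be computed for extracted classes against a reference domain model. The function names `normalize` and `prf`, the case-insensitive exact matching, and the sample class names are assumptions made for this example; the record does not state the matching criteria the authors used.

```python
def normalize(label):
    """Lowercase and trim a label so 'Visitor ' and 'visitor' compare equal (assumed matching rule)."""
    return label.strip().lower()


def prf(extracted, gold, beta=1.0):
    """Return (precision, recall, F_beta) for extracted labels against gold labels."""
    extracted_set = {normalize(x) for x in extracted}
    gold_set = {normalize(x) for x in gold}
    true_positives = len(extracted_set & gold_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    b2 = beta ** 2
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Hypothetical data: classes in a reference model vs. classes a model extracted.
gold_classes = ["Visitor", "Ticket", "Event", "Account"]
extracted_classes = ["visitor", "ticket", "Payment"]

precision, recall, f1 = prf(extracted_classes, gold_classes)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

The same function could be applied to associations by representing each one as a normalized string such as "visitor-ticket"; how associations are actually matched in the study is not described in this record.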
Keywords
large language models, Visual Narrator, domain modeling, user stories, small language models, Taverne
Citation
Chou, C Y, Aydemir, F B & Dalpiaz, F 2026, A Comparative Study of Large and Small Language Models for Domain Model Extraction. in R Guizzardi & J Araújo (eds), International Working Conference on Requirements Engineering: Foundation for Software Quality. Springer, pp. 336-351. https://doi.org/10.1007/978-3-032-21423-2_23