Automatically Expressing the Meaning of Logical Formulae in Natural Language
Publication date
2026-06-17
Document Type
Dissertation
Abstract
For as long as natural language generation (NLG) has existed, "logic-to-text" generation, i.e., NLG from mathematical logic formulae, has attracted attention due to its many potential applications, including intelligent tutoring systems, ontology verbalization, and explainable AI. This thesis investigates how to build, and especially how to evaluate, systems for logic-to-text generation. Its contributions center on logic-to-text NLG, but also extend to the broader field of data-to-text NLG.
Chapters 3 and 4 study generation mechanisms and input manipulations. In Chapter 3, I present LoLa, a rule-based logic-to-text generation system that produces English text from first-order logic formulae. Its main innovation lies in simplifying input formulae through logical equivalences prior to translation. Extensive evaluations using both standard automatic metrics and human judgments show that LoLa is effective at generating natural language translations of logical formulae. In Chapter 4, I turn to the question of identifying the most amenable input for a logic-to-text system. Focusing on the role of brevity, I introduce an algorithm for computing the shortest equivalents of an input formula and report automatic and human evaluations showing that manipulating input formulae improves output quality. It remains unclear, however, whether the shortest equivalent of a given formula is always the best input.
Chapters 5 through 8 focus on evaluation. In Chapter 5, I conduct a meta-evaluation in which user interfaces employed in past human evaluations are assessed by user experience experts. Building on their insights, I derive recommendations for designing more effective and engaging annotation interfaces. In Chapter 6, I present a survey of human evaluations of hallucinations, investigating how these evaluations are designed. The analysis reveals several methodological shortcomings, including the frequent omission of crucial details such as annotation guidelines, user interface design, inter-annotator agreement metrics, and annotator demographics and compensation. In Chapter 7, I present the first implementation of a logic-based framework for hallucination analysis in real-world data-to-text domains, including logic-to-text, experimenting with both human and LLM annotators. The results show that applying the framework to concrete data-to-text domains is feasible, but not straightforward. Human annotators achieve only low to modest accuracy, depending on the domain. By contrast, models show potential to perform the annotation task, which would make the approach scalable. In Chapter 8, I introduce the concept of formulaicness, a measure of how strongly an output text mirrors the structural form of its input, and I propose its use as an enhancement for automatic evaluation of naturalness, again with a focus on logic-to-text NLG. The results suggest that formulaicness is a valuable addition to the automatic evaluation of naturalness: it aligns well with human judgments and consistently improves the correlation of baseline metrics with human ratings.
Overall, the thesis contributes to logic-to-text NLG in two main ways: by highlighting its specific challenges, such as the identification of more amenable inputs, and by taking steps to improve existing approaches to generation and evaluation. More broadly, it offers insights for improving the evaluation of data-to-text NLG systems.
Keywords
natural language generation, logic-to-text, data-to-text, NLG, logic, evaluation, hallucinations
Citation
Calò, E 2026, 'Automatically Expressing the Meaning of Logical Formulae in Natural Language', Doctor of Philosophy, Universiteit Utrecht, Utrecht. https://doi.org/10.33540/3515