EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models

Mohammadi, Hadi; Giachanou, Anastasia; Bagheri, Ayoub

doi:https://doi.org/10.48550/arXiv.2510.05942

Files

2510.05942v2.pdf (1.46 MB)

Publication date

2025-10-07

Authors

Mohammadi, Hadi

Giachanou, Anastasia

Bagheri, Ayoub

DOI

https://doi.org/10.48550/arXiv.2510.05942

Document Type

Preprint (working paper)

Collections

Utrecht University Repository

License

CC BY

Abstract

We present EvalMORAAL, a transparent chain-of-thought (CoT) framework that uses two scoring methods (log-probabilities and direct ratings) plus a model-as-judge peer review to evaluate moral alignment in 20 large language models. We assess models on the World Values Survey (55 countries, 19 topics) and the PEW Global Attitudes Survey (39 countries, 8 topics). With EvalMORAAL, top models align closely with survey responses (Pearson's r approximately 0.90 on WVS). Yet we find a clear regional difference: Western regions average r=0.82 while non-Western regions average r=0.61 (a 0.21 absolute gap), indicating consistent regional bias. Our framework adds three parts: (1) two scoring methods for all models to enable fair comparison, (2) a structured chain-of-thought protocol with self-consistency checks, and (3) a model-as-judge peer review that flags 348 conflicts using a data-driven threshold. Peer agreement relates to survey alignment (WVS r=0.74, PEW r=0.39, both p<.001), supporting automated quality checks. These results show real progress toward culture-aware AI while highlighting open challenges for use across regions.
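The alignment figures in the abstract are Pearson correlations between model scores and survey responses, and the regional gap is the difference between two regional averages. The following is a minimal sketch of that arithmetic, not the authors' implementation: the function names `pearson_r` and `regional_gap`, the region labels, and the sample scores are all hypothetical, and only the two regional averages (0.82 and 0.61) come from the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def regional_gap(r_by_region, western):
    """Mean r over the regions named in `western` minus mean r over the rest."""
    west = [r for name, r in r_by_region.items() if name in western]
    rest = [r for name, r in r_by_region.items() if name not in western]
    if not west or not rest:
        raise ValueError("both groups must contain at least one region")
    return sum(west) / len(west) - sum(rest) / len(rest)


# Illustrative scores only (assumption): one value per country-topic pair.
model_scores = [0.8, -0.2, 0.5, -0.9]
survey_scores = [0.7, -0.1, 0.4, -0.8]
alignment = pearson_r(model_scores, survey_scores)

# Regional averages reported in the abstract; the labels are placeholders.
gap = regional_gap({"Western": 0.82, "non-Western": 0.61}, western={"Western"})
print(f"alignment r = {alignment:.2f}, regional gap = {gap:.2f}")
```

With the two averages from the abstract, `regional_gap` reproduces the reported 0.21 absolute gap. How the paper groups countries into regions and aggregates per-model correlations is not stated on this page, so the grouping here is only a stand-in.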

Citation

Mohammadi, H, Giachanou, A & Bagheri, A 2025 'EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models' arXiv. https://doi.org/10.48550/arXiv.2510.05942

URI

https://dspace.library.uu.nl/handle/1874/463500
