Generating Realistic Natural Language Counterfactuals

Publication date

2021

Authors

Robeer, Marcel (ISNI 0000000526331040)
Bex, Floris (ORCID 0000-0002-5699-9656, ISNI 0000000118066508)
Feelders, Ad (ISNI 0000000350720316)

Editors

Moens, Marie-Francine
Huang, Xuanjing
Specia, Lucia
Yih, Scott Wen-tau

Document Type

Part of book

License

CC BY

Abstract

Counterfactuals are a valuable means for understanding decisions made by ML systems. However, the counterfactuals generated by the methods currently available for natural language text are either unrealistic or introduce imperceptible changes. We propose CounterfactualGAN: a method that combines a conditional GAN and the embeddings of a pretrained BERT encoder to model-agnostically generate realistic natural language text counterfactuals for explaining regression and classification tasks. Experimental results show that our method produces perceptibly distinguishable counterfactuals, while outperforming four baseline methods on fidelity and human judgments of naturalness, across multiple datasets and multiple predictive models.

Keywords

explainability, interpretability, explainable artificial intelligence, counterfactuals, natural language processing

Citation

Robeer, M, Bex, F & Feelders, A 2021, Generating Realistic Natural Language Counterfactuals. in M-F Moens, X Huang, L Specia & S W Yih (eds), Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 3611–3625, 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Punta Cana, Dominican Republic, 7/11/21. https://doi.org/10.18653/v1/2021.findings-emnlp.306