Generating Realistic Natural Language Counterfactuals

Robeer, Marcel; Bex, Floris; Feelders, Ad

doi:https://doi.org/10.18653/v1/2021.findings-emnlp.306

Files

2021.findings_emnlp.306.pdf (700.16 KB)

Publication date

2021

Authors

Robeer, Marcel

Bex, Floris

Feelders, Ad

Editors

Moens, Marie-Francine

Huang, Xuanjing

Specia, Lucia

Yih, Scott Wen-tau

DOI

https://doi.org/10.18653/v1/2021.findings-emnlp.306

Document Type

Part of book

Collections

Utrecht University Repository

License

CC BY

Abstract

Counterfactuals are a valuable means for understanding decisions made by ML systems. However, the counterfactuals generated by the methods currently available for natural language text are either unrealistic or introduce imperceptible changes. We propose CounterfactualGAN: a method that combines a conditional GAN and the embeddings of a pretrained BERT encoder to model-agnostically generate realistic natural language text counterfactuals for explaining regression and classification tasks. Experimental results show that our method produces perceptibly distinguishable counterfactuals, while outperforming four baseline methods on fidelity and human judgments of naturalness, across multiple datasets and multiple predictive models.

Keywords

explainability, interpretability, explainable artificial intelligence, counterfactuals, natural language processing

Citation

Robeer, M, Bex, F & Feelders, A 2021, Generating Realistic Natural Language Counterfactuals. in M-F Moens, X Huang, L Specia & S W Yih (eds), Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 3611–3625, 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Punta Cana, Dominican Republic, 7/11/21. https://doi.org/10.18653/v1/2021.findings-emnlp.306

URI

https://dspace.library.uu.nl/handle/1874/415077
