Generating Realistic Natural Language Counterfactuals
Publication date
2021
Editors
Moens, Marie-Francine
Huang, Xuanjing
Specia, Lucia
Yih, Scott Wen-tau
Document Type
Part of book
License
CC BY
Abstract
Counterfactuals are a valuable means for understanding decisions made by ML systems. However, the counterfactuals generated by the methods currently available for natural language text are either unrealistic or introduce imperceptible changes. We propose CounterfactualGAN: a method that combines a conditional GAN and the embeddings of a pretrained BERT encoder to model-agnostically generate realistic natural language text counterfactuals for explaining regression and classification tasks. Experimental results show that our method produces perceptibly distinguishable counterfactuals, while outperforming four baseline methods on fidelity and human judgments of naturalness, across multiple datasets and multiple predictive models.
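To make the terms in the abstract concrete, the sketch below shows what a model-agnostic text counterfactual is and how a fidelity score can be computed. It is not CounterfactualGAN, which uses a conditional GAN over BERT embeddings. It is a deliberately naive word-substitution search against a toy classifier. Everything in it is an illustrative assumption: the classifier `toy_sentiment`, the lexicon `SWAPS`, the helper names, and the definition of fidelity as the share of inputs whose counterfactual actually receives the target prediction. "Model-agnostic" is modelled here as only ever calling the classifier's predict function.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical black-box classifier: the search below only calls it,
# never inspects its internals (the "model-agnostic" setting).
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "awful", "terrible"}

def toy_sentiment(text: str) -> str:
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative"

# Hypothetical substitution lexicon used to propose edited sentences.
SWAPS: Dict[str, List[str]] = {
    "good": ["bad"], "great": ["awful"], "bad": ["good"], "awful": ["great"],
}

def word_swap_counterfactual(text: str, predict: Callable[[str], str],
                             target: str, swaps: Dict[str, List[str]]) -> Optional[str]:
    """Return the first single-word edit of `text` that `predict` maps to `target`."""
    tokens = text.split()
    for i, tok in enumerate(tokens):
        for repl in swaps.get(tok.lower(), []):
            candidate = " ".join(tokens[:i] + [repl] + tokens[i + 1:])
            if predict(candidate) == target:
                return candidate
    return None  # no counterfactual found within this search space

def fidelity(counterfactuals: List[Optional[str]], predict: Callable[[str], str],
             targets: List[str]) -> float:
    """Assumed definition: fraction of counterfactuals that obtain the target label."""
    hits = sum(cf is not None and predict(cf) == t
               for cf, t in zip(counterfactuals, targets))
    return hits / len(targets)

texts = ["the food was good", "the awful plot", "a film"]
targets = ["negative", "positive", "positive"]
cfs = [word_swap_counterfactual(t, toy_sentiment, y, SWAPS)
       for t, y in zip(texts, targets)]
print(cfs)                                   # ['the food was bad', 'the great plot', None]
print(fidelity(cfs, toy_sentiment, targets))  # 2 of 3 reach the target label
```

A lexicon-bound search like this fails whenever no listed word can be swapped, as with "a film". Its edits can also read unnaturally. Naturalness and fidelity are the two properties on which the abstract reports CounterfactualGAN outperforming its baselines.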
Keywords
explainability, interpretability, explainable artificial intelligence, counterfactuals, natural language processing
Citation
Robeer, M, Bex, F & Feelders, A 2021, Generating Realistic Natural Language Counterfactuals. in M-F Moens, X Huang, L Specia & S W Yih (eds), Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 3611–3625, 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Punta Cana, Dominican Republic, 7 November 2021. https://doi.org/10.18653/v1/2021.findings-emnlp.306