How and where does CLIP process negation?

Quantmeyer, Vincent; Mosteiro Romero, Pablo; Gatt, Albert

Files

2024.alvr-1.5.pdf (610.92 KB)

Publication date

2024-08

Authors

Quantmeyer, Vincent

Mosteiro Romero, Pablo

Gatt, Albert

Document Type

Part of book

Collections

Utrecht University Repository

License

cc_by

Abstract

Various benchmarks have been proposed to test linguistic understanding in pre-trained vision & language (VL) models. Here we build on the existence task from the VALSE benchmark (Parcalabescu et al., 2022) which we use to test models’ understanding of negation, a particularly interesting issue for multimodal models. However, while such VL benchmarks are useful for measuring model performance, they do not reveal anything about the internal processes through which these models arrive at their outputs in such visio-linguistic tasks. We take inspiration from the growing literature on model interpretability to explain the behaviour of VL models on the understanding of negation. Specifically, we approach these questions through an in-depth analysis of the text encoder in CLIP (Radford et al., 2021), a highly influential VL model. We localise parts of the encoder that process negation and analyse the role of attention heads in this task. Our contributions are threefold. We demonstrate how methods from the language model interpretability literature (such as causal tracing) can be translated to multimodal models and tasks; we provide concrete insights into how CLIP processes negation on the VALSE existence task; and we highlight inherent limitations in the VALSE dataset as a benchmark for linguistic understanding.

Keywords

Language and Linguistics, Computer Science Applications, Software, Ophthalmology, Linguistics and Language

Citation

Quantmeyer, V, Mosteiro Romero, P & Gatt, A 2024, How and where does CLIP process negation? in ALVR 2024. Association for Computational Linguistics, pp. 59-72, Advances in Language and Vision Research (ALVR), Bangkok, Thailand, 16/08/24. <https://aclanthology.org/2024.alvr-1.5>, workshop

URI

https://dspace.library.uu.nl/handle/1874/482164
