Snakes and Ladders: Two Steps Up for VideoMamba

Lu, Hui; Salah, Albert; Poppe, Ronald

Files

Lu_Snakes_and_Ladders_Two_Steps_Up_for_VideoMamba_ICCV_... (1.13 MB)

Publication date

2025

Authors

Lu, Hui

Salah, Albert Ali

Poppe, Ronald

Document Type

Part of book

Collections

Utrecht University Repository

Abstract

Video understanding requires extracting rich spatio-temporal representations, which transformer models achieve through self-attention. Unfortunately, self-attention is computationally expensive. In NLP, Mamba has emerged as an efficient alternative to transformers. However, Mamba's successes do not trivially extend to vision tasks, including video analysis. In this paper, we theoretically analyze the differences between self-attention and Mamba. We identify two limitations in Mamba's token processing: historical decay and element contradiction. We propose VideoMambaPro (VMP), which addresses these limitations by adding masked backward computation and elemental residual connections to a VideoMamba backbone. VideoMambaPro models surpass VideoMamba by 1.6-3.0% and 1.1-1.9% in top-1 accuracy on Kinetics-400 and Something-Something V2, respectively. Even without extensive pre-training, our models present an attractive and efficient alternative to current transformer models. Moreover, our two solutions are orthogonal to recent advances in Vision Mamba models and are likely to provide further improvements in future models.
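To give some intuition for why earlier tokens can lose influence in a recurrent scan, the sketch below uses a generic scalar linear recurrence h_t = a_t * h_{t-1} + x_t. This is an illustrative assumption, not the paper's derivation. The input and output projections are fixed to 1, the decay factors are supplied by the caller, and the helper names `run_recurrence` and `contribution_weights` are hypothetical. Unrolling the recurrence shows that token s reaches step t scaled by the product of all later decay factors, so with factors below 1 older tokens are attenuated geometrically.

```python
from typing import List


def run_recurrence(decays: List[float], xs: List[float]) -> List[float]:
    """Hypothetical scalar recurrence h_t = a_t * h_{t-1} + x_t with h_{-1} = 0.

    Input/output projections are assumed to be 1 for clarity, so the
    output at step t is simply h_t.
    """
    h = 0.0
    outputs = []
    for a_t, x_t in zip(decays, xs):
        # The previous state is shrunk by a_t before the new token is added.
        h = a_t * h + x_t
        outputs.append(h)
    return outputs


def contribution_weights(decays: List[float], t: int) -> List[float]:
    """Multiplier applied to token s (0 <= s <= t) in the output at step t.

    Unrolling gives h_t = sum_s (prod_{k=s+1}^{t} a_k) * x_s, so each
    earlier token is scaled by the product of all later decay factors.
    """
    weights = []
    for s in range(t + 1):
        w = 1.0
        for k in range(s + 1, t + 1):
            w *= decays[k]
        weights.append(w)
    return weights


if __name__ == "__main__":
    decays = [0.5, 0.5, 0.5, 0.5]
    xs = [1.0, 2.0, 3.0, 4.0]
    print(contribution_weights(decays, 3))  # [0.125, 0.25, 0.5, 1.0]
    print(run_recurrence(decays, xs)[-1])   # 6.125
```

With every decay factor set to 0.5, the first token's weight halves at each later step. In Mamba itself the corresponding parameters are input-dependent (selective), and this sketch does not model the masked backward computation or elemental residual connections proposed in the paper.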

Citation

Lu, H, Salah, A & Poppe, R 2025, Snakes and Ladders: Two Steps Up for VideoMamba. in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). The Computer Vision Foundation, pp. 24234-24244. < https://openaccess.thecvf.com/content/ICCV2025/html/Lu_Snakes_and_Ladders_Two_Steps_Up_for_VideoMamba_ICCV_2025_paper.html >

URI

https://dspace.library.uu.nl/handle/1874/483640
