SCFormer: Integrating hybrid Features in Vision Transformers

Lu, Hui; Poppe, Ronald; Salah, Albert

doi:https://doi.org/10.1109/ICME55011.2023.00323

Publication date

2023

Authors

Lu, Hui

Poppe, R.W.

Salah, A.A.

DOI

https://doi.org/10.1109/ICME55011.2023.00323

Document Type

Part of book

Collections

Utrecht University Repository

Abstract

Hybrid modules that combine self-attention and convolution operations can benefit from the advantages of both, and consequently achieve higher performance than either operation alone. However, current hybrid modules do not capitalize directly on the intrinsic relation between self-attention and convolution, but rather introduce external mechanisms that come with increased computational cost. In this paper, we propose a new hybrid vision transformer called Shift and Concatenate Transformer (SCFormer), which benefits from the intrinsic relationship between convolution and self-attention. SCFormer is built on the Shift and Concatenate Attention (SCA) block, which integrates convolution and self-attention features. We propose a shifting mechanism and corresponding aggregation rules for the feature integration of SCA blocks, such that the generated features more closely approximate the optimal output features. Extensive experiments show that, with comparable computational complexity, SCFormer consistently achieves improved results over competitive baselines on image recognition and downstream tasks. Our code is available at: https://github.com/hotfinda/SCFormer.
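
The abstract refers to an intrinsic relation between convolution and self-attention but does not spell it out on this page. One well-known relation of this kind, which may or may not be the one SCFormer relies on, is that a k×k convolution can be written as a set of 1×1 convolutions whose outputs are shifted and summed. The pure-Python sketch below checks that identity on a toy input. All function names and the example data are illustrative assumptions; this is not the SCA block and is not taken from the authors' repository.

```python
from itertools import product


def direct_conv3x3(x, w):
    """Zero-padded 3x3 cross-correlation. x: [C_in][H][W], w: [C_out][C_in][3][3]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * wd for _ in range(h)] for _ in range(len(w))]
    for o, i, j in product(range(len(w)), range(h), range(wd)):
        for c, a, b in product(range(c_in), range(3), range(3)):
            ii, jj = i + a - 1, j + b - 1
            if 0 <= ii < h and 0 <= jj < wd:
                out[o][i][j] += w[o][c][a][b] * x[c][ii][jj]
    return out


def pointwise(x, w, a, b):
    """1x1 convolution that uses only the kernel weights at offset (a, b)."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][c][a][b] * x[c][i][j] for c in range(c_in))
              for j in range(wd)] for i in range(h)] for o in range(len(w))]


def shift(y, di, dj):
    """Return s with s[o][i][j] = y[o][i + di][j + dj], zero outside the map."""
    h, wd = len(y[0]), len(y[0][0])
    return [[[ch[i + di][j + dj] if 0 <= i + di < h and 0 <= j + dj < wd else 0.0
              for j in range(wd)] for i in range(h)] for ch in y]


def shift_sum_conv3x3(x, w):
    """Same result as direct_conv3x3, built from nine shifted 1x1 convolutions."""
    h, wd = len(x[0]), len(x[0][0])
    out = [[[0.0] * wd for _ in range(h)] for _ in range(len(w))]
    for a, b in product(range(3), range(3)):
        s = shift(pointwise(x, w, a, b), a - 1, b - 1)
        for o, i, j in product(range(len(w)), range(h), range(wd)):
            out[o][i][j] += s[o][i][j]
    return out


# Toy example: 2 input channels, 1 output channel, 3x3 feature map (integer data).
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]
w = [[[[1, 0, -1], [2, 0, -2], [1, 0, -1]],
      [[0, 1, 0], [1, 1, 1], [0, 1, 0]]]]

print(direct_conv3x3(x, w) == shift_sum_conv3x3(x, w))
```

In this decomposition the 1×1 projections are the same kind of per-pixel linear maps that produce queries, keys and values in self-attention, which is why shift-based formulations are a natural meeting point for the two operations. How SCFormer shifts and aggregates features is described in the paper itself.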

Keywords

Vision transformer, feature integration, hybrid module, Taverne, Computer Networks and Communications, Computer Science Applications

Citation

Lu, H, Poppe, R & Salah, A 2023, SCFormer: Integrating hybrid Features in Vision Transformers. in Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023. Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2023-July, IEEE, pp. 1883-1888. https://doi.org/10.1109/ICME55011.2023.00323

URI

https://dspace.library.uu.nl/handle/1874/431703
