Multi-stream CNN: Learning representations based on human-related regions for action recognition

Tu, Zhigang; Xie, Wei; Qin, Qianqing; Poppe, R.W.; Veltkamp, R.C.; Li, Baoxin; Yuan, Junsong

DOI: https://doi.org/10.1016/j.patcog.2018.01.020

Publication date

2018

Authors

Tu, Zhigang

Xie, Wei

Qin, Qianqing

Poppe, R.W.

Veltkamp, R.C.

Li, Baoxin

Yuan, Junsong

DOI

https://doi.org/10.1016/j.patcog.2018.01.020

Document Type

Article

Collections

Utrecht University Repository

License

No license information available

Abstract

The most successful video-based human action recognition methods rely on feature representations extracted using Convolutional Neural Networks (CNNs). Inspired by the two-stream network (TS-Net), we propose a multi-stream Convolutional Neural Network (CNN) architecture to recognize human actions. We additionally consider human-related regions that contain the most informative features. First, by improving foreground detection, the region of interest corresponding to the appearance and the motion of an actor can be detected robustly under realistic circumstances. Based on the entire detected human body, we construct one appearance and one motion stream. In addition, we select a secondary region that contains the major moving part of an actor based on motion saliency. By combining the traditional streams with the novel human-related streams, we introduce a human-related multi-stream CNN (HR-MSCNN) architecture that encodes appearance, motion, and the captured tubes of the human-related regions. Comparative evaluation on the JHMDB, HMDB51, UCF Sports and UCF101 datasets demonstrates that the streams contain features that complement each other. The proposed multi-stream architecture achieves state-of-the-art results on these four datasets.

Keywords

Convolutional Neural Network, action recognition, multi-stream, motion salient region, Taverne

Citation

Tu, Z, Xie, W, Qin, Q, Poppe, R W, Veltkamp, R C, Li, B & Yuan, J 2018, 'Multi-stream CNN: Learning representations based on human-related regions for action recognition', Pattern Recognition, vol. 79, pp. 32–43. https://doi.org/10.1016/j.patcog.2018.01.020

URI

https://dspace.library.uu.nl/handle/1874/362165
