Where Am I? Comparing CNN and LSTM for Location Classification in Egocentric Videos

Publication date

2018

Authors

Kapidis, G. (ISNI 0000000523924174)
Poppe, Ronald (ISNI 0000000389426288)
van Dam, Elsbeth
Veltkamp, Remco (ISNI 0000000109665680)
Noldus, Lucas

Document Type

Part of book
Open Access

License

taverne

Abstract

Egocentric vision is used in a variety of fields, such as life-logging, sports recording and robot navigation. Much research focuses on location detection and activity recognition, with applications in Ambient Assisted Living. This work builds on the idea that locations can be characterized by the presence of specific objects. Our objective is to recognize locations in egocentric videos that consist mainly of indoor house scenes. We perform an extensive comparison between Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based classification methods that infer the in-house location from the objects detected by a state-of-the-art object detector. We show that location classification is affected by the quality of the detected objects, i.e. the false detections among the correct ones in a series of frames, but that this effect can be greatly limited by using an LSTM to take the temporal structure of the information into account. Finally, we discuss the potential for useful real-world applications.
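The core idea of the abstract, that objects characterize locations and that temporal context can absorb occasional false detections, can be illustrated with a toy sketch. This is not the paper's CNN or LSTM: the object-to-location table, the function names and the sample frames are all hypothetical, and a trailing majority vote stands in for a learned temporal model.

```python
from collections import Counter

# Hypothetical object-to-location associations (illustrative only, not from the paper).
LOCATION_OBJECTS = {
    "kitchen": {"oven", "sink", "refrigerator", "microwave"},
    "living_room": {"couch", "tv", "remote"},
    "bathroom": {"toilet", "bathtub", "toothbrush"},
}


def classify_frame(detected_objects):
    """Pick the location whose object set overlaps most with one frame's detections."""
    detected = set(detected_objects)
    scores = {loc: len(objs & detected) for loc, objs in LOCATION_OBJECTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None when no known object is seen


def smooth_predictions(frame_labels, window=3):
    """Replace each label with the majority label of a trailing window of frames.

    A simple stand-in for temporal modeling; an LSTM would learn this context instead.
    """
    smoothed = []
    for i in range(len(frame_labels)):
        start = max(0, i - window + 1)
        recent = [label for label in frame_labels[start:i + 1] if label is not None]
        smoothed.append(Counter(recent).most_common(1)[0][0] if recent else None)
    return smoothed


# Hypothetical detections for five consecutive frames; the third holds a false detection.
frames = [
    ["oven", "sink"],
    ["refrigerator"],
    ["toilet"],
    ["microwave", "oven"],
    ["sink", "oven"],
]
per_frame = [classify_frame(f) for f in frames]
smoothed = smooth_predictions(per_frame, window=3)
```

In this example the per-frame labels flip to "bathroom" on the third frame because of the single false detection, while the smoothed sequence stays on "kitchen" throughout.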

Keywords

Taverne

Citation

Kapidis, G, Poppe, R W, van Dam, E, Veltkamp, R C & Noldus, L 2018, Where Am I? Comparing CNN and LSTM for Location Classification in Egocentric Videos. in Proceedings of the 2018 IEEE International Conference on Pervasive Computing and Communications (PerCom) Workshops. IEEE, pp. 878-883. https://doi.org/10.1109/PERCOMW.2018.8480258