Face2Text revisited: Improved data set and baseline results

Tanti, M; Abdilla, S; Muscat, A; Borg, C; Farrugia, RA; Gatt, A

Files

2022.pvlam_1.6.pdf (1.01 MB)

Publication date

2022

Authors

Tanti, M

Abdilla, S

Muscat, A

Borg, C

Farrugia, RA

Gatt, Albert

Document Type

Part of book

Collections

Utrecht University Repository

License

CC BY-NC

Abstract

Current image description generation models do not transfer well to the task of describing human faces. To encourage the development of more human-focused descriptions, we developed a new data set of facial descriptions based on the CelebA image data set. We describe the properties of this data set, and present results from a face description generator trained on it, which explores the feasibility of using transfer learning from VGGFace/ResNet CNNs. Comparisons are drawn through both automated metrics and human evaluation by 76 English-speaking participants. The descriptions generated by the VGGFace-LSTM + Attention model are closest to the ground truth according to human evaluation whilst the ResNet-LSTM + Attention model obtained the highest CIDEr and CIDEr-D results (1.252 and 0.686 respectively). Together, the new data set and these experimental results provide data and baselines for future work in this area.

Citation

Tanti, M, Abdilla, S, Muscat, A, Borg, C, Farrugia, RA & Gatt, A 2022, Face2Text revisited: Improved data set and baseline results. in Proceedings of the Second Workshop on People in Vision, Language and Mind @ LREC2022. European Language Resources Association (ELRA), pp. 41-47. <https://aclanthology.org/2022.pvlam-1.6>

URI

https://dspace.library.uu.nl/handle/1874/425950
