Data augmentation by separating identity and emotion representations for emotional gait recognition

Weijie Sheng; Xiaoyan Lu; Xinde Li

doi:10.1017/S0263574722001813

Data augmentation by separating identity and emotion representations for emotional gait recognition

Published online by Cambridge University Press: 06 February 2023

Weijie Sheng

Xiaoyan Lu and

Xinde Li

Show author details

Weijie Sheng: Affiliation:
Yangzhou Collaborative Innovation Research Institute Co., Ltd., Institute of Shenyang Aircraft Design and Research, Yangzhou, 225000, China Key Laboratory of Measurement and Control of CSE Ministry of Education, School of Automation, Southeast University, Nanjing, China
Xiaoyan Lu: Affiliation:
School of Cyber Science and Engineering, Southeast University, Nanjing, China
Xinde Li*: Affiliation:
Key Laboratory of Measurement and Control of CSE Ministry of Education, School of Automation, Southeast University, Nanjing, China School of Cyber Science and Engineering, Southeast University, Nanjing, China
*: *Corresponding author. Email: [email protected]

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

Human-centered intelligent human–robot interaction can transcend the traditional keyboard and mouse and have the capacity to understand human communicative intentions by actively mining implicit human clues (e.g., identity information and emotional information) to meet individuals’ needs. Gait is a unique biometric feature that can provide reliable information to recognize emotions even when viewed from a distance. However, the insufficient amount and diversity of training data annotated with emotions severely hinder the application of gait emotion recognition. In this paper, we propose an adversarial learning framework for emotional gait dataset augmentation, with which a two-stage model can be trained to generate a number of synthetic emotional samples by separating identity and emotion representations from gait trajectories. To our knowledge, this is the first work to realize the mutual transformation between natural gait and emotional gait. Experimental results reveal that the synthetic gait samples generated by the proposed networks are rich in emotional information. As a result, the emotion classifier trained on the augmented dataset is competitive with state-of-the-art gait emotion recognition works.

Keywords

human–robot interaction data augmentation gait recognition emotion recognition

Type: Research Article
Information: Robotica , Volume 41 , Issue 5 , May 2023 , pp. 1452 - 1465

DOI: https://doi.org/10.1017/S0263574722001813 [Opens in a new window]
Copyright: © The Author(s), 2023. Published by Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

Teijeiro-Mosquera, L., Biel, J.-I., Alba-Castro, J. L. and Gatica-Perez, D., “What your face vlogs about: expressions of emotion and big-five traits impressions in youtube,” IEEE Trans. Affect. Comput. 6(2), 193–205 (2015).CrossRef Google Scholar

Korayem, M., Azargoshasb, S., Korayem, A. and Tabibian, S., “Design and implementation of the voice command recognition and the sound source localization system for human–robot interaction,” Robotica 39(10), 1779–1790 (2021).CrossRef Google Scholar

Liu, N., Zhou, T., Ji, Y., Zhao, Z. and Wan, L., “Synthesizing talking faces from text and audio: an autoencoder and sequence-to-sequence convolutional neural network,” Pattern Recognit. 102, 107231 (2020).Google Scholar

Yun, S.-S., “A gaze control of socially interactive robots in multiple-person interaction,” Robotica 35(11), 2122–2138 (2017).CrossRef Google Scholar

Liu, X., Khan, K. N., Farooq, Q., Hao, Y. and Arshad, M. S., “Obstacle avoidance through gesture recognition: Business advancement potential in robot navigation socio-technology,” Robotica 37(10), 1663–1676 (2019).CrossRef Google Scholar

Xue, P., Li, B., Wang, N. and Zhu, T., “Emotion Recognition From Human Gait Features Based on DCT Transform,” In: 5th International Conference on Human Centered Computing (HCC), vol. 11956 (2019) pp. 511–517.Google Scholar

Göngör, F. and Tutsoy, Ö., “Design and implementation of a facial character analysis algorithm for humanoid robots,” Robotica 37(11), 1850–1866 (2019).CrossRef Google Scholar

Jain, R., Semwal, V. B. and Kaushik, P., “Stride segmentation of inertial sensor data using statistical methods for different walking activities,” Robotica, 1–14 (2021).Google Scholar

Cutting, J. E. and Kozlowski, L. T., “Recognizing friends by their walk: Gait perception without familiarity cues,” Bull. Psychon. Soc. 9(5), 353–356 (1977).CrossRef Google Scholar

Sheng, W. and Li, X., “Multi-task learning for gait-based identity recognition and emotion recognition using attention enhanced temporal graph convolutional network,” Pattern Recognit. 114(1), 107868 (2021).CrossRef Google Scholar

Li, Z., Ren, Z., Zhao, K., Deng, C. and Feng, Y., “Human-cooperative control design of a walking exoskeleton for body weight support,” IEEE Trans. Ind. Inform. 16(5), 2985–2996 (2019).CrossRef Google Scholar

Li, Z., Xu, C., Wei, Q., Shi, C. and Su, C.-Y., “Human-inspired control of dual-arm exoskeleton robots with force and impedance adaptation,” IEEE Trans. Syst. Man Cybernet. Syst. 50(12), 5296–5305 (2018).CrossRef Google Scholar

Narayanan, V., Manoghar, B. M., Dorbala, V. S., Manocha, D. and Bera, A., “Proxemo: Gait-Based Emotion Learning and Multi-View Proxemic Fusion for Socially-Aware Robot Navigation,” In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2020) pp. 8200–8207.CrossRef Google Scholar

Xu, S., Fang, J., Hu, X., Ngai, E., Guo, Y., Leung, V., Cheng, J. and Hu, B., “Emotion recognition from gait analyses: Current research and future directions, arXiv preprint arXiv:2003.11461 (2020).Google Scholar

Bhattacharya, U., Rewkowski, N., Guhan, P., Williams, N. L., Mittal, T., Bera, A. and Manocha, D., “Generating Emotive Gaits for Virtual Agents Using Affect-Based Autoregression,” In: IEEE International Symposium on Mixed and Augmented Reality (ISMAR), (IEEE, 2020b) pp. 24–35.CrossRef Google Scholar

Li, G., Li, Z. and Kan, Z., “Assimilation control of a robotic exoskeleton for physical human-robot interaction,” IEEE Robot. Automat. Lett. 7(2), 2977–2984 (2022).CrossRef Google Scholar

Peri, R., Parthasarathy, S., Bradshaw, C. and Sundaram, S., “Disentanglement for Audio-Visual Emotion Recognition Using Multitask Setup,” In: ICASSP, 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2021) pp. 6344–6348.Google Scholar

Liang, J., Liu, Z., Zhou, J., Jiang, X., Zhang, C. and Wang, F., “Model-protected multi-task learning,” In: IEEE Trans. Pattern Anal. Mach. Intell. 44(2), 1002–1019 (2020).Google Scholar

Zhang, B., Provost, E. M. and Essl, G., “Cross-corpus acoustic emotion recognition with multi-task learning: seeking common ground while preserving differences,” IEEE Trans. Affect. Comput. 10(1), 85–99 (2019).CrossRef Google Scholar

Yu, X., Xu, C., Zhang, X. and Ou, L., “Real-time multitask multihuman–robot interaction based on context awareness,” Robotica 40(9), 1–27 (2022).Google Scholar

Sheng, W. and Li, X., “Siamese denoising autoencoders for joints trajectories reconstruction and robust gait recognition,” Neurocomputing 395, 86–94 (2020).CrossRef Google Scholar

Yi, L. and Mak, M.-W., “Improving speech emotion recognition with adversarial data augmentation network,” IEEE Trans. Neur. Netw. Learn. 33(1), 172–184 (2020).Google Scholar

Huang, C.-L., “Exploring Effective Data Augmentation with Tdnn-Lstm Neural Network Embedding for Speaker Recognition,” In: IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (2019) pp. 291–295.CrossRef Google Scholar

Bhattacharya, U., Mittal, T., Chandra, R., Randhavane, T., Bera, A. and Manocha, D., “Step: Spatial Temporal Graph Convolutional Networks for Emotion Perception From Gaits,” In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34a (2020a) pp. 1342–1350.CrossRef Google Scholar

Mirza, M. and Osindero, S.. Conditional Generative Adversarial Nets, arXiv: Learning (2014).Google Scholar

Sohn, K., Lee, H. and Yan, X., “Learning Structured Output Representation Using Deep Conditional Generative Models,” In: NIPS 2015 (2015) pp. 3483–3491.Google Scholar

Gao, J., Chakraborty, D., Tembine, H. and Olaleye, O., “Nonparallel Emotional Speech Conversion,” In: Interspeech (2019).Google Scholar

Isola, P., Zhu, J.-Y., Zhou, T. and Efros, A. A., “Image-to-Image Translation with Conditional Adversarial Networks,” In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017) pp. 1125–1134.Google Scholar

Zhu, J.-Y., Park, T., Isola, P. and Efros, A. A., “Unpaired Image-to-image Translation Using Cycle-Consistent Adversarial Networks,” In: IEEE International Conference on Computer Vision (ICCV) (2017) pp. 2242–2251.Google Scholar

Kim, T., Cha, M., Kim, H., Lee, J. K. and Kim, J., “Learning to Discover Cross-Domain Relations with Generative Adversarial Networks,” In: International Conference on Machine Learning (PMLR, 2017), pp. 1857–1865.Google Scholar

Huang, X., Liu, M.-Y., Belongie, S. and Kautz, J., “Multimodal Unsupervised Image-to-image Translation,” In: Proceedings of the European Conference on Computer Vision (ECCV) (2018) pp. 172–189.Google Scholar

Choi, Y., Uh, Y., Yoo, J. and Ha, J.-W., “Stargan v2: Diverse Image Synthesis for Multiple Domains,” In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) pp. 8188–8197.Google Scholar

Rizos, G., Baird, A., Elliott, M. and Schuller, B., “Stargan for Emotional Speech Conversion: Validated by Data Augmentation of End-to-end Emotion Recognition,” In: ICASSP, 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020) pp. 3502–3506.Google Scholar

Su, B.-H. and Lee, C.-C., “A Conditional Cycle Emotion Gan for Cross Corpus Speech Emotion Recognition,” In: IEEE Spoken Language Technology Workshop (SLT) (2021) pp. 351–357.CrossRef Google Scholar

Zhu, Q., Gao, L., Song, H. and Mao, Q., “Learning to disentangle emotion factors for facial expression recognition in the wild,” Int. J. Intell. Syst. 36(6), 2511–2527 (2021).CrossRef Google Scholar

Schroff, F., Kalenichenko, D. and Philbin, J., “Facenet: A Unified Embedding for Face Recognition and Clustering,” In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015) pp. 815–823.Google Scholar

Kastaniotis, D., Theodorakopoulos, I., Theoharatos, C., Economou, G. and Fotopoulos, S., “A framework for gait-based recognition using kinect,” Pattern Recogn. Lett. 68, 327–335 (2015).CrossRef Google Scholar

Kastaniotis, D., Theodorakopoulos, I., Economou, G. and Fotopoulos, S., “Gait based recognition via fusing information from euclidean and riemannian manifolds,” Pattern Recogn. Lett. 84, 245–251 (2016).CrossRef Google Scholar

Bao, J., Chen, D., Wen, F., Li, H. and Hua, G., “CVAE-GAN: Fine-grained Image Generation Through Asymmetric Training,” In: IEEE International Conference on Computer Vision (ICCV) (2017) pp. 2764–2773.Google Scholar

Article contents

Data augmentation by separating identity and emotion representations for emotional gait recognition

Abstract

Keywords

Access options

Article purchase

Temporarily unavailable

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests