Jingyao Wu
Final-year PhD student
UNSW Sydney
Jingyao Wu received the BE (Hons) degree in engineering from the University of New South Wales (UNSW), Sydney, Australia, in 2020, where she is currently working towards the PhD degree in signal processing. Her research interests include emotion recognition, multimodal signal processing and machine learning. She is a Student Member of the IEEE and SPS.
Speech Emotion Recognition: An Investigation of Emotion Ambiguity and Dynamics
Recognising and understanding people's emotional states is widely regarded as fundamental to sophisticated human-computer interaction. Emotions can be represented and annotated in many ways, and this variety poses significant challenges for affective computing. The choice of representation is particularly important because contemporary emotion recognition systems strive to emulate human evaluations, which are expressed as emotion labels or annotations.
Labelling emotions involves multiple annotators, whose differing perceptions of the same emotional expression introduce ambiguity into the prediction system. This disagreement among raters is commonly referred to as inter-rater ambiguity. A prevalent approach in existing research is to eliminate inter-rater ambiguity, treating it solely as undesirable noise. However, inter-rater ambiguity also carries valuable information about emotions and is an asset for complex emotion modelling. Incorporating the ambiguity in assessments from multiple raters may therefore contribute significantly to the advancement of emotion recognition systems, enabling comprehensive emotion prediction that reflects the variation in human perception.
Furthermore, psychologists posit that emotions exhibit emergent properties and require a dynamic modelling framework. However, the dynamic nature of emotions remains relatively unexplored in the field. My research therefore focuses on a speech-based emotion recognition system that models time-varying emotional states. The system integrates information from multiple raters, thereby capturing the inherent ambiguity surrounding emotions.
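
As a rough illustration of the idea of keeping inter-rater ambiguity rather than discarding it, the sketch below turns several raters' time-continuous annotations into a per-frame mean (the consensus) and a per-frame standard deviation (a simple proxy for ambiguity). This is only one possible representation, assumed here for illustration and not necessarily the one used in this research; the function name, the rating values and the choice of mean and standard deviation are all hypothetical.

```python
from statistics import mean, pstdev


def ambiguity_aware_targets(ratings):
    """Summarise multi-rater, time-continuous annotations frame by frame.

    ratings: list of per-rater traces; each trace is a list with one value
             per frame (e.g. a continuous emotion attribute), all of equal length.
    Returns a list of (mean, std) tuples, one per frame. The std is kept as an
    explicit ambiguity estimate instead of being averaged away as noise.
    """
    if not ratings or len({len(trace) for trace in ratings}) != 1:
        raise ValueError("expected at least one rater and equal-length traces")

    targets = []
    # zip(*ratings) groups the values that all raters gave to the same frame.
    for frame_values in zip(*ratings):
        targets.append((mean(frame_values), pstdev(frame_values)))
    return targets


# Hypothetical annotations: three raters, three frames.
ratings = [
    [0.25, 0.5, 0.75],
    [0.25, 0.0, 0.25],
    [0.25, 1.0, 0.5],
]
targets = ambiguity_aware_targets(ratings)

for frame, (mu, sigma) in enumerate(targets):
    print(f"frame {frame}: mean={mu:.3f}, ambiguity(std)={sigma:.3f}")
```

In this toy example the raters agree fully on the first frame and disagree most on the second, so the ambiguity estimate changes over time along with the consensus value. A model trained on both quantities would predict how uncertain human perception is at each moment as well as the emotional state itself.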