Computational Framework for Emotional Motor Signatures
This work combines experimental, modelling and data mining activities. It aims to better understand how human emotions take shape and are reflected in sensorimotor behaviour.
The identification of the emotional signatures of the movements requires the calculation of a compact vector representation of a given movement from the movement parameters (positions by motion capture for example). This compact representation allows, among other things, the calculation of similarity or divergence measures between movements on axes related to emotional valence.
The proposed approach begins by computing the instantaneous velocity gradients of the motions, forming the foundation for our examination of their similarities. In pursuit of this objective, we will employ the Wasserstein metric. However, the variations in input characteristics give rise to distinct surface profiles for the gradient functions associated with each motion. Consequently, the prerequisite conditions for the accurate application of the Wasserstein metric cannot be fulfilled. To circumvent this challenge, we will adopt an alternative strategy that involves utilizing the gradient distribution to characterize each motion. This approach enables us to determine the distances between motions through the metric, thereby generating a square distance matrix.
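The pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact implementation used in this work: the "gradient" is taken to be the finite-difference time derivative of the joint positions, each motion is summarized by the empirical distribution of the gradient magnitudes, and the one-dimensional Wasserstein-1 distance is computed from the empirical cumulative distribution functions. The function names and the array layout (frames, joints, 3) are hypothetical.

```python
import numpy as np

def gradient_samples(positions, dt=1.0):
    """Flatten a motion into a 1D sample of gradient magnitudes.

    positions: array of shape (frames, joints, 3), e.g. from motion capture.
    Assumption: the gradient is the finite-difference derivative over time.
    """
    velocity = np.gradient(positions, dt, axis=0)   # per-frame derivative
    magnitude = np.linalg.norm(velocity, axis=-1)   # shape (frames, joints)
    return magnitude.ravel()

def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1D empirical distributions."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    all_values = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_values)
    # Empirical CDFs evaluated on the merged support
    u_cdf = np.searchsorted(u, all_values[:-1], side="right") / u.size
    v_cdf = np.searchsorted(v, all_values[:-1], side="right") / v.size
    # Integral of |F_u - F_v| over the real line
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))

def distance_matrix(samples):
    """Square, symmetric matrix of pairwise Wasserstein distances."""
    n = len(samples)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = wasserstein_1d(samples[i], samples[j])
    return D
```

Because each motion is reduced to a distribution, motions of different durations or with different numbers of frames can be compared directly, which is not possible when comparing the gradient functions themselves.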
Multidimensional Scaling (MDS) is then applied to the distance matrix to produce a 2D representation that visualizes the relationships among the motions. This representation in turn serves as the input to a clustering method, specifically the K-Means algorithm. In this way, we aim to delineate clusters of similar motions based on their gradient distribution characteristics and to explore the underlying patterns and relationships.
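As an illustration of this step, the sketch below embeds a precomputed distance matrix in 2D and then clusters the embedded points. It is only a sketch under assumptions: it uses classical (Torgerson) MDS and a plain Lloyd-style K-Means written with NumPy, whereas the text does not specify which MDS variant or implementation was used. Library implementations such as scikit-learn's `MDS` and `KMeans` would be a natural alternative. The function names and default parameters are hypothetical.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points so that Euclidean distances approximate the matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))
    return eigvecs[:, idx] * scale

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

A typical call would be `labels, _ = kmeans(classical_mds(D), k=3)`, where `D` is the square distance matrix between motions and the number of clusters `k` is chosen by the user, for example as the number of emotion categories in the dataset.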
The first objective is to apply the classical method to several state-of-the-art datasets, covering both individuals and groups of individuals, after a pose extraction step. The second objective is to propose methodological extensions for the computation of these signatures that allow the integration of multimodal information using deep learning. We focus in particular on variational autoencoder (VAE) architectures, which are increasingly used to capture interpretable multimodal representations and to approximate the cognitive principles underlying the perception of similarity. The key idea is to train the VAE to encode body movement sequences into a compact and expressive latent space representation. By learning the underlying structure and patterns of body movements associated with different emotions, the VAE can then be used for emotion recognition.
The VAE's training objective is to maximize the Evidence Lower Bound (ELBO), which consists of two components: a reconstruction term and a regularization term. The reconstruction term measures the discrepancy between the original input mocap data and the reconstructed data generated by the decoder. The regularization term, typically the Kullback-Leibler (KL) divergence between the approximate posterior and a prior, encourages the latent space distribution to match that prior. In practice, maximizing the ELBO amounts to minimizing the sum of the reconstruction loss and the KL term.
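To make the two components concrete, the following sketch computes the loss that is minimized in place of maximizing the ELBO. It rests on common assumptions that the text does not state explicitly: a diagonal Gaussian posterior parameterized by `mu` and `log_var`, a standard normal prior, and a squared-error reconstruction term (a Gaussian likelihood up to constants). The weighting factor `beta` is an added illustrative parameter; `beta=1.0` corresponds to the standard VAE objective.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def negative_elbo(x, x_recon, mu, log_var, beta=1.0):
    """Batch-averaged reconstruction loss plus weighted KL regularization.

    x, x_recon: arrays of shape (batch, features), flattened mocap frames.
    mu, log_var: arrays of shape (batch, latent_dim) produced by the encoder.
    """
    recon = np.sum((x - x_recon) ** 2, axis=-1)   # reconstruction term
    kl = gaussian_kl(mu, log_var)                 # regularization term
    return float(np.mean(recon + beta * kl))
```

The loss is zero only when the reconstruction is perfect and the posterior coincides with the prior, so training balances faithful reconstruction against a well-structured latent space.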
Variational Autoencoders (VAEs) have the potential to generate new body movement data corresponding to specific emotional states. By leveraging the learned latent representation of emotions, VAEs can facilitate the synthesis of new body movement sequences that exhibit desired emotional characteristics.
Sampling from the latent space distribution makes it possible to generate body movement sequences that reflect desired emotions.
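The generation step can be sketched as below. This is purely illustrative: `toy_decoder` is a hypothetical stand-in (a single random linear layer with a tanh) for a trained decoder network, and the idea of describing an emotion by a latent mean `mu` and log-variance `log_var` (for example, estimated from the encodings of sequences labelled with that emotion) is an assumption rather than the procedure used in this work.

```python
import numpy as np

def sample_latent(mu, log_var, n_samples, rng):
    """Draw latent vectors z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal((n_samples, mu.shape[-1]))
    return mu + np.exp(0.5 * log_var) * eps

def toy_decoder(z, weights, n_frames, n_joints):
    """Hypothetical decoder: maps latent vectors to pose sequences.

    Returns an array of shape (n_samples, n_frames, n_joints, 3).
    """
    flat = np.tanh(z @ weights)
    return flat.reshape(len(z), n_frames, n_joints, 3)
```

With a trained decoder in place of `toy_decoder`, calling `sample_latent` around the latent region associated with an emotion and decoding the result would yield new movement sequences intended to express that emotion.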
This experience was extremely interesting and instructive. Not only did I acquire new technical knowledge and good practices, but I also gained a clearer idea of what research work involves. Previously, I was focused only on learning and on my desire to quench my thirst for knowledge; I now realize that this is not enough to obtain good results. Perseverance and determination also come into play. Investing a long time in studying a potential solution to a problem can give unsatisfactory results. As a novice, the first thing I felt was frustration, before realizing that these results were not a failure but a track to be improved and modified or, in the worst case, a track to be discarded along the way.
Another avenue to explore would be to increase the number of components retained by the dimension reduction step, which might allow us to better discern the nuances between the different emotions. This would favor good classification at the risk of no longer being able to represent the dimensions on a graph. Changing the dataset could also improve the results.
After rigorously experimenting with classical emotion recognition methods across various datasets, it becomes increasingly evident that these traditional approaches have their limitations. Classical methods often rely on rule-based or predefined feature extraction techniques, which struggle to adapt to the diverse and nuanced expressions of emotions found in real-world data. These methods tend to perform inconsistently across different datasets and fail to capture the underlying emotional dynamics effectively. This inconsistency underscores the need for more advanced techniques like Variational Autoencoders (VAEs).
In contrast, VAEs have emerged as a promising solution for emotion recognition and generation. By learning a latent representation of emotions, VAEs extract an "emotional motor signature" from the data. This signature has the potential to improve the adaptability and generalization of emotion-related tasks, and it also provides a deeper understanding of the emotional content present in body movement or other data types. This conclusion underscores the importance of embracing innovative approaches like VAEs in emotion-related tasks, as they offer the potential to capture the richness and complexity of human emotions more effectively than classical methods.