From voice to emotion recognition: how the WorkingAge tool will work

workingage · 30 April, 2020 · BLOG

Text and images by Roberto Tedesco, Researcher at POLIMI, Sara Comai, Associate Professor at POLIMI, and Hesam Sagha, Senior R&D Developer at audEERING.

The WA Tool will integrate a speech analysis module for voice emotion recognition, which will help determine whether the worker is happy, stressed, sad, angry or simply neutral. In addition, a noise analysis module will measure the quality of the sound “landscape” of the working environment, a factor related to stress and mental strain.

This module consists of a Bluetooth microphone, worn by the worker and connected to a Raspberry PI 4. The Raspberry PI 4 unit also includes an internal microphone, used to capture and measure the environmental noise.
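The post does not describe how the noise is quantified. As a minimal sketch, assuming the internal microphone delivers mono samples normalized to the range -1.0..1.0, the level of each frame can be computed as an RMS value in dBFS and the frames combined with an energy average in the spirit of an equivalent continuous level (Leq). The function names are hypothetical, and converting dBFS to dB SPL would additionally require calibrating the microphone (no frequency weighting is applied here).

```python
import math
from typing import List, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of normalized samples (range -1.0..1.0) in dB relative to full scale."""
    if len(samples) == 0:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))  # floor: digital silence -> -120 dBFS


def frame_levels(samples: Sequence[float], frame_len: int) -> List[float]:
    """Level of each complete, non-overlapping frame (a trailing partial frame is ignored)."""
    return [
        rms_dbfs(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def equivalent_level(levels_db: Sequence[float]) -> float:
    """Energy average of per-frame levels, Leq-style: loud frames weigh more."""
    if len(levels_db) == 0:
        raise ValueError("need at least one level")
    mean_energy = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)
```

Averaging in the energy domain rather than averaging the dB values directly matters for noise: frames at -10 dBFS and -20 dBFS combine to about -12.6 dBFS, not -15 dBFS, so short loud events are not diluted by quiet periods.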

The Raspberry PI 4 performs Noise Detection and Voice Activity Detection (VAD). VAD identifies the portions of the audio that contain speech: only these portions need to be analysed, so only they are sent to the emotion recognition module. Figure 1 shows the box containing the Raspberry PI 4.
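The post does not say which VAD algorithm runs on the Raspberry PI 4. The sketch below is a classic energy-threshold VAD with a "hangover" that keeps short pauses inside a speech segment, shown only to illustrate the idea. The 16 kHz sample rate, 20 ms frames, -40 dBFS threshold and all function names are assumptions; production systems typically use more robust features or trained models.

```python
import math
from typing import List, Sequence, Tuple


def frame_level_dbfs(frame: Sequence[float]) -> float:
    """RMS level of one frame of normalized samples, in dBFS."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, 1e-12))  # floor avoids log(0)


def detect_voice_segments(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 20,
    threshold_dbfs: float = -40.0,  # hypothetical fixed threshold
    min_speech_frames: int = 3,     # discard bursts shorter than this
    hangover_frames: int = 5,       # tolerate short pauses inside speech
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) pairs for the segments classified as speech."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    active = [
        frame_level_dbfs(samples[i * frame_len:(i + 1) * frame_len]) > threshold_dbfs
        for i in range(n_frames)
    ]

    segments: List[Tuple[int, int]] = []
    start = None     # index of the first frame of the current segment
    silence_run = 0  # consecutive inactive frames since the last active one

    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run > hangover_frames:
                end = i - silence_run + 1  # exclusive end: frame after the last active one
                if end - start >= min_speech_frames:
                    segments.append((start, end))
                start, silence_run = None, 0

    if start is not None:  # speech still open when the audio ends
        end = n_frames - silence_run
        if end - start >= min_speech_frames:
            segments.append((start, end))

    # Convert frame indices to seconds.
    return [(s * frame_len / sample_rate, e * frame_len / sample_rate) for s, e in segments]
```

Only the samples inside the returned segments would then be forwarded for emotion recognition, which reduces both network traffic and the amount of audio that leaves the device.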

*Figure 1 – Raspberry PI 4 for noise analysis and VAD*

Figure 2 shows the architecture of the module. The Raspberry PI 4 is connected via WiFi to the WorkingAge Edge Cloud, which hosts the emotion recognition modules developed by Politecnico di Milano and audEERING. These are supported by a commercial automatic speech recognition (ASR) engine that, due to privacy concerns, is installed locally.

*Figure 2 – Architecture for voice/noise analysis*

Both results, the output of the emotion recognition performed on the Edge Cloud and the output of the Noise Detection performed on the Raspberry PI 4, will be sent to the WA Application.
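The post does not specify the format of the messages received by the WA Application. Purely as an illustration, the hypothetical schemas below keep the two sources separate, since they are computed on different devices; every class name and field (`device_id`, `window_start`, `level_dbfs`, `confidence`) is an assumption, not part of the WorkingAge design.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class NoiseReport:
    """Hypothetical message produced by Noise Detection on the Raspberry PI 4."""
    device_id: str        # hypothetical identifier of the Raspberry PI 4 unit
    window_start: str     # ISO 8601 start time of the measurement window
    window_seconds: float
    level_dbfs: float     # e.g. an energy-averaged level over the window


@dataclass
class EmotionReport:
    """Hypothetical message produced by the Fusion block on the Edge Cloud."""
    device_id: str
    window_start: str
    label: str            # one of: happy, stressed, sad, angry, neutral
    confidence: float     # fused score of the chosen label, 0..1


def to_json(report) -> str:
    """Serialize a report, tagging it with its type so the receiver can dispatch on it."""
    return json.dumps({"type": type(report).__name__, **asdict(report)}, sort_keys=True)
```

In this sketch the messages carry only derived values (a level and a label), never audio or transcripts; this is an assumed design choice that would fit the privacy motivation for keeping the ASR local.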

Emotion recognition is performed by two independent components: Politecnico di Milano provides a Deep Learning tool that leverages both acoustic and textual features (the text being provided by the ASR), while the audEERING tool is based on its patented “VocEmoApi” technology. A Fusion block combines the outputs of both components to generate the final emotion label.
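How the Fusion block works is not described in the post. One common approach is weighted late fusion: each component produces a score per emotion label, the scores are averaged with a weight, and the highest-scoring label wins. The sketch below assumes both tools can be mapped to scores over the five labels named in this post; the function name and the 0.5 default weight are hypothetical.

```python
from typing import Dict, Mapping, Tuple

# The five classes named in this post.
LABELS = ("happy", "stressed", "sad", "angry", "neutral")


def fuse(
    polimi_scores: Mapping[str, float],
    audeering_scores: Mapping[str, float],
    polimi_weight: float = 0.5,  # hypothetical weight; would be tuned on validation data
) -> Tuple[str, Dict[str, float]]:
    """Weighted late fusion of two per-label score distributions."""
    if not 0.0 <= polimi_weight <= 1.0:
        raise ValueError("polimi_weight must be in [0, 1]")
    fused = {
        label: polimi_weight * polimi_scores.get(label, 0.0)
        + (1.0 - polimi_weight) * audeering_scores.get(label, 0.0)
        for label in LABELS
    }
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("no positive scores to fuse")
    fused = {label: score / total for label, score in fused.items()}  # renormalize to sum to 1
    best = max(LABELS, key=fused.__getitem__)  # ties go to the label listed first
    return best, fused
```

For example, if one tool scores "stressed" at 0.6 and "neutral" at 0.4, while the other scores "angry" at 0.5, "stressed" at 0.3 and "neutral" at 0.2, equal weights yield "stressed" (0.45); shifting the weight toward the second tool (`polimi_weight=0.2`) yields "angry" instead. Late fusion lets the two tools remain independent, as described above, and only their outputs need to agree on a common label set.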

*Figure 3 – Setting; note the Raspberry PI 4 and the Jabra Bluetooth microphone*

From the user's point of view, the only requirement is to wear a microphone. In our pilot studies we will use the Jabra 65e, which is very comfortable and also well designed. It is worn around the neck, as shown in Figure 3. It has three microphones (one on each earbud and one on the cable) and uses Active Noise Cancellation to reduce outside noise.
