Quietly angry, loudly happy
Self-reported customer satisfaction vs. automatically detected emotion in contact center calls
Phone calls are an essential communication channel in today's contact centers, but they are harder to analyze than written or form-based interactions. Companies have therefore traditionally relied on surveys to gather feedback and gauge customer satisfaction. In this work, we study the relationship between self-reported customer satisfaction (CSAT) and automatic utterance-level indicators of emotion produced by affect recognition models, using a real dataset of contact center calls. We find (1) that positive valence is associated with higher CSAT scores, while the presence of anger is associated with lower CSAT scores; (2) that automatically detected affective events are linked to the CSAT response rate, with calls containing anger exhibiting a lower response rate and calls containing positive valence a higher one; (3) that the dynamics of detected emotions are linked to both CSAT scores and response rate, with emotions detected at the end of the call carrying greater weight in the relationship. These findings highlight a selection bias in self-reported CSAT: positive affect is over-represented and negative affect under-represented among survey responses.
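
To make the reported associations concrete, the following is a minimal sketch of the kind of analysis the findings describe, assuming a hypothetical per-call table with boolean affect flags from the emotion recognizer, a survey-response indicator, and a CSAT score. The column names, file name, and choice of statistical tests are illustrative assumptions, not the paper's exact methodology.

```python
# Illustrative sketch (assumptions, not the paper's exact method): quantify
# the link between detected affective events, CSAT response rate, and scores.
import pandas as pd
from scipy import stats

# Hypothetical per-call table, one row per call:
#   has_anger        - True if the emotion recognizer detected anger
#   answered_survey  - True if the customer filled in the CSAT survey
#   csat             - 1-5 satisfaction score, NaN when the survey was skipped
calls = pd.read_csv("calls_with_affect.csv")

# Finding (2): response rate vs. detected anger,
# via a chi-square test of independence on the 2x2 contingency table.
contingency = pd.crosstab(calls["has_anger"], calls["answered_survey"])
chi2, p, _, _ = stats.chi2_contingency(contingency)
print(f"anger vs. response rate: chi2={chi2:.2f}, p={p:.3g}")

# Finding (1): CSAT score vs. detected anger, among respondents only,
# via a one-sided Mann-Whitney U test (anger -> lower scores).
respondents = calls.dropna(subset=["csat"])
angry = respondents.loc[respondents["has_anger"], "csat"]
no_anger = respondents.loc[~respondents["has_anger"], "csat"]
u, p = stats.mannwhitneyu(angry, no_anger, alternative="less")
print(f"CSAT with vs. without anger: U={u:.0f}, p={p:.3g}")
```

A check along the lines of finding (3) would condition the same tests on when in the call the emotion was detected, for instance by restricting the affect flags to the final utterances.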
Article outline
- 1. Introduction
- 2. Related work
  - 2.1 Emotion recognition in phone calls
  - 2.2 Prediction of customer satisfaction and/or service quality
  - 2.3 Joint analysis of emotion and customer satisfaction
- 3. Hypotheses
- 4. Materials
  - 4.1 Database
  - 4.2 Customer satisfaction
  - 4.3 Affective indicators
  - 4.4 Dynamics of affective indicators and customers' profiles
- 5. Methods
  - 5.1 Automatic speech recognition
  - 5.2 Automatic emotion recognition
    - 5.2.1 Data and labels
    - 5.2.2 Emotion recognition model
  - 5.3 Data analysis
- 6. Results
  - 6.1 CSAT scoring in phone calls
  - 6.2 Emotion recognition
  - 6.3 CSAT response rate and emotions
  - 6.4 Satisfaction score and emotions
  - 6.5 CSAT response rate and emotional dynamics profiles
  - 6.6 Self-reported satisfaction and emotional dynamics profiles
- 7. Discussion
- 8. Conclusion and future work
- Note