Publication in conference proceedings
A multimodal educational game for 3-10-year-old children: Collecting and automatically recognising European Portuguese children’s speech
Annika Hämäläinen (Hämäläinen, A.); Fernando Miguel Pinto (Pinto, F. M.); Silvia Rodrigues (Rodrigues, S.); Ana Júdice (Júdice, A.); Sandra Morgado Silva (Silva, S. M.); António Calado (Calado, A.); Miguel Sales Dias (Dias, M. S.); et al.
2013 ISCA International Workshop on Speech and Language Technology in Education (SLaTE 2013)
Year (definitive publication)
2013
Language
English
Country
United States of America
More Information
Web of Science®

This publication is not indexed in Web of Science®

Scopus

Times Cited: 7

(Last checked: 2024-05-14 10:57)

Google Scholar

Times Cited: 17

(Last checked: 2024-05-19 01:02)

Abstract
Speech interfaces have tremendous potential in education. In this paper, we present our work in the Contents for Next Generation Networks project, an ongoing Portuguese industry-academia collaboration developing a multimodal educational game aimed at improving the physical coordination and the basic mathematical and musical skills of 3-10-year-old children. We focus on our work in the area of children's speech recognition: designing, collecting, transcribing and annotating a 21-hour corpus of prompted European Portuguese children's speech, as well as our first experiments with different acoustic modelling approaches. Our speech recognition results suggest that training children's speech models from scratch is a more promising approach than retraining adult speech models using children's speech when a sufficient amount of training data is available from the targeted age group. This finding also holds for adult female speech models retrained using children's speech. As compared with a baseline recogniser comprising gender-dependent adult speech models, the best-performing children's speech models that we have trained so far – gender-independent cross-word triphones trained with 17.5 hours of speech from 3-10-year-old children – resulted in a 45-percent (relative) decrease in word error rate in a task expecting isolated cardinal numbers, sequences of cardinal numbers or musical notes as speech input.
Keywords
Acoustic modelling, ASR, Child-computer interaction, Corpus, Educational game, European Portuguese
  • Computer and Information Sciences - Natural Sciences
  • Electrical Engineering, Electronic Engineering, Information Engineering - Engineering and Technology
  • Languages and Literature - Humanities
Funding Records
Funding Reference
QREN 7943 CNG
Funding Entity
Comissão Europeia
