The EASR corpora of European Portuguese, French, Hungarian and Polish elderly speech

Annika Hämäläinen; Jairo Avelar; Silvia Rodrigues; Miguel Sales Dias; Artur Kolesinski; Tibor Fegyó; Géza Németh; Petra Csobánka; Karine Lan Hing Ting; David Hewson

Publication in conference proceedings

Annika Hämäläinen (Hämäläinen, A.); Jairo Avelar (Avelar, J.); Silvia Rodrigues (Rodrigues, S.); Miguel Sales Dias (Dias, J.); Artur Kolesinski (Kolesinski, A.); Tibor Fegyó (Fegyó, T.); Géza Németh (Németh, G.); Petra Csobánka (Csobánka, P.); Karine Lan Hing Ting (Ting, K. L. H.); David Hewson (Hewson, D.); et al.

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

Year (definitive publication)

2014

Language

English

Country

France

Web of Science®

Times Cited: 6

(Last checked: 2026-06-09 15:50)

Scopus

Times Cited: 10

(Last checked: 2026-06-09 20:12)

Google Scholar

Times Cited: 1

(Last checked: 2026-06-02 16:06)

Overton

This publication is not indexed in Overton

Abstract

Currently available speech recognisers do not usually work well with elderly speech. This is because several characteristics of speech (e.g. fundamental frequency, jitter, shimmer and harmonics-to-noise ratio) change with age and because the acoustic models used by speech recognisers are typically trained with speech collected from younger adults only. To develop speech-driven applications capable of successfully recognising elderly speech, this type of speech data is needed for training acoustic models from scratch or for adapting acoustic models trained with younger adults’ speech. However, the availability of suitable elderly speech corpora is still very limited. This paper describes an ongoing project to design, collect, transcribe and annotate large elderly speech corpora for four European languages: Portuguese, French, Hungarian and Polish. The Portuguese, French and Polish corpora contain read speech only, whereas the Hungarian corpus also contains spontaneous command-and-control speech. Depending on the language in question, the corpora contain 76 to 205 hours of speech collected from 328 to 986 speakers aged 60 and over. The final corpora will come with manually verified orthographic transcriptions, as well as annotations for filled pauses, noises and damaged words.

Keywords

Automatic speech recognition, Corpus, Elderly speech

Fields of Science and Technology Classification

Computer and Information Sciences - Natural Sciences
Electrical Engineering, Electronic Engineering, Information Engineering - Engineering and Technology
Languages and Literature - Humanities

Funding Records

Funding Reference	Funding Entity
AAL2009-2-068	Fundação para a Ciência e a Tecnologia

Publication Identifiers

Scopus (source: Ciência_Iscte)	2-s2.0-84977583701
Other ID (source: External)	cv-prod-id-1809861
Scopus (source: External)	2-s2.0-84977583701
WoS (source: Ciência_Iscte)	WOS:000355611003011
WoS (source: author)	WOS:000355611003011
WoS (source: External)	000355611003011
Scopus (source: author)	2-s2.0-84977583701
Ciência_Iscte ID	ci-pub-96270
ISBN (source: External)	978-2-9517408-8-4
Handle (source: Ciência-IUL)	http://hdl.handle.net/10071/25544

Other Publication Details

Online Publication Year	2014
Publisher	European Language Resources Association (ELRA)
Indexes	Web of Science©; Scopus; ERIH; IBSS; SciELO
ISSN	--
ISBN	978-2-9517408-8-4 (print) 978-2-9517408-8-4 (online)
Pages	1458 - 1464	Total Pages	7
Peer Reviewed	Yes
Dissemination Mean	Both (printed and digital)
Editors	Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Event Title	9th International Conference on Language Resources and Evaluation, LREC 2014
Event Organizer	European Language Resources Association (ELRA)
City	Reykjavik
Event Type	Conference
Event Classification	International
Event Year	2014
Event Publication Type	Full Paper