Publication in conference proceedings
Casa de la Lhéngua: A set of language resources and natural language processing tools for Mirandese
Ferreira, J.P. (Ferreira, J.); Chesi, C. (Chesi, C.); Daan Baldewijns (Baldewijns, D.); Miguel Sales Dias (Dias, J.); Daniela Braga (Braga, D.); Pinto, F.M. (Pinto, F); Hyongsil Cho (Cho, H.); Margarita Correia (Correia, M.); Amadeu Ferreira (Amadeu, F.); et al.
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)
Year (definitive publication)
2014
Language
English
Country
Luxembourg
More Information
Web of Science®

This publication is not indexed in Web of Science®

Scopus

This publication is not indexed in Scopus

Google Scholar

Times Cited: 0

(Last checked: 2024-11-17 09:28)

Abstract
This paper describes the construction of language resources and NLP tools for Mirandese, a minority language spoken in north-eastern Portugal, now available on a community-led portal, Casa de la Lhéngua. The resources were developed in a collaborative citizenship project led by Microsoft, as part of the creation of the first text-to-speech (TTS) system for Mirandese. Development efforts encompassed the compilation of a corpus of over 1M tokens and the construction of a grapheme-to-phoneme (GTP) system and of syllable-division, inflection, and Part-of-Speech (POS) tagging modules, leading to an inflected lexicon of about 200,000 entries with phonetic transcription, detailed POS tagging, syllable division, and stress mark-up. Alongside these tasks, which were made easier by adapting and reusing existing tools for closely related languages, a casting for voice talents was held among the speaker community, and the first Mirandese speech database for speech synthesis was recorded. These resources were combined to fulfil the requirements of a well-tested statistical parametric synthesis model, leading to an intelligible voice font. The language resources are freely available at Casa de la Lhéngua, with the aim of promoting the development of real-life applications and fostering linguistic research on Mirandese.
Keywords
Language resources, Minority language, Mirandese, Speech synthesis, Lexical database
  • Computer and Information Sciences - Natural Sciences
  • Electrical Engineering, Electronic Engineering, Information Engineering - Engineering and Technology
  • Languages and Literature - Humanities
