Extending automatic transcripts in a unified data representation towards a prosodic-based metadata annotation and evaluation

Fernando Batista; Helena Moniz; Isabel Trancoso; Nuno Mamede; Ana Isabel Mata

Journal article

Journal title

Journal of Speech Sciences

Year (final publication)

2012

Language

English

Country

Portugal

Web of Science®

This publication is not indexed in Web of Science®

Scopus

This publication is not indexed in Scopus

Google Scholar

Number of citations: 18

(Last checked: 2026-06-28 17:53)

Overton

This publication is not indexed in Overton

Abstract

This paper describes a framework that extends automatic speech transcripts to accommodate relevant information from manual transcripts, the speech signal itself, and other resources such as lexica. The framework automatically collects, relates, computes, and stores all relevant information in a single self-contained data source, making it easy to provide a wide range of interconnected information suitable for speech analysis and for training and evaluating a number of automatic speech processing tasks. Its main goal is to integrate different linguistic and paralinguistic layers of knowledge, giving a more complete view of their representation and interactions across several domains and languages. The processing chain has two main stages: the first integrates the relevant manual annotations into the speech recognition data, and the second further enriches that output with prosodic information. The framework has been used to identify and analyze structural metadata in automatic speech transcripts. Initially used for automatic detection of punctuation marks and for capitalization recovery from speech data, it has more recently been used to study the characterization of disfluencies in speech. It has already been applied to several domains of Portuguese corpora, as well as to English and Spanish Broadcast News corpora.
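The two-stage chain described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the `Word` record, the functions `integrate_manual` and `add_prosody`, the word-level matching via `difflib`, and the timing-only features are all assumptions made for illustration. The actual framework also draws on the speech signal and lexica, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Word:
    """Hypothetical unified record for one recognized word."""
    text: str                      # word as output by the recognizer
    start: float                   # start time in seconds
    end: float                     # end time in seconds
    punct: Optional[str] = None    # punctuation copied from the manual transcript
    prosody: dict = field(default_factory=dict)  # filled in by stage 2


def integrate_manual(asr, manual):
    """Stage 1 (sketch): copy manual punctuation onto matching ASR words.

    `manual` is a list of (word, punctuation-or-None) pairs. Words are
    matched case-insensitively and in order; unmatched words keep punct=None.
    """
    asr_keys = [w.text.lower() for w in asr]
    man_keys = [text.lower() for text, _ in manual]
    matcher = SequenceMatcher(None, asr_keys, man_keys, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            asr[block.a + k].punct = manual[block.b + k][1]
    return asr


def add_prosody(words):
    """Stage 2 (sketch): add simple timing-based features to each word."""
    for i, w in enumerate(words):
        w.prosody["duration"] = round(w.end - w.start, 3)
        nxt = words[i + 1] if i + 1 < len(words) else None
        w.prosody["pause_after"] = round(nxt.start - w.end, 3) if nxt else None
    return words


# Invented toy data: the recognizer misheard "world" as "word".
asr = [Word("hello", 0.00, 0.40), Word("word", 0.45, 0.80),
       Word("how", 1.30, 1.45), Word("are", 1.45, 1.60), Word("you", 1.60, 1.90)]
manual = [("Hello", None), ("world", "."), ("How", None), ("are", None), ("you", "?")]

enriched = add_prosody(integrate_manual(asr, manual))
for w in enriched:
    print(w.text, w.punct, w.prosody)
```

In this toy run the misrecognized word receives no punctuation, because its text matches nothing in the manual transcript. A real alignment would need to handle such recognition errors rather than drop the mark.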

Keywords

Automatic speech processing, Speech alignment, Structural metadata, Speech prosody, Speech data representation, Multiple-domain speech corpora, Cross-language speech processing

Funding records

Funding reference	Funding entity
SFRH/BD/44671/2008	Fundação para a Ciência e a Tecnologia
CMU-PT/HuMach/0039/2008	Fundação para a Ciência e a Tecnologia
PTDC/CLE-LIN/120017/2010	Fundação para a Ciência e a Tecnologia
PEst-OE/EEI/LA0021/2011	Fundação para a Ciência e a Tecnologia

Publication identifiers

Other ID (source: ORCID)	0502160947938-35
Handle (source: Ciência-IUL)	http://hdl.handle.net/10071/14185
Ciência_Iscte ID	ci-pub-9070

Other publication details

Online publication year	2012
Publisher	Luso-Brazilian Association of Speech Sciences
Indexing	--
ISSN	2236-9740 (print) 2236-9740 (online)
ISBN	--
Impact factor	--
Volume	2	Issue	2
Series
Article number	--
Pages	115 - 138
Peer reviewed	Yes
Dissemination medium	Digital
Publication date (online)
Publication date (print)