Export Publication

The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.

Export Reference (APA)
Rei, R., Guerreiro, N. M., & Batista, F. (2020). Automatic truecasing of video subtitles using BERT: A multilingual adaptable approach. In M.-J. Lesot, S. Vieira, M. Z. Reformat, J. P. Carvalho, A. Wilbik, B. Bouchon-Meunier, & R. R. Yager (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 708-721). Springer International Publishing. https://doi.org/10.1007/978-3-030-50146-4_52
Export Reference (IEEE)
R. Rei, N. M. Guerreiro, and F. Batista, "Automatic truecasing of video subtitles using BERT: A multilingual adaptable approach," in Information Processing and Management of Uncertainty in Knowledge-Based Systems, M.-J. Lesot, S. Vieira, M. Z. Reformat, J. P. Carvalho, A. Wilbik, B. Bouchon-Meunier, and R. R. Yager, Eds. Springer International Publishing, 2020, pp. 708-721, doi: 10.1007/978-3-030-50146-4_52.
Export BibTeX
@inproceedings{rei2020_1734974457856,
	author = "Rei, Ricardo and Guerreiro, Nuno Miguel and Batista, F.",
	title = "Automatic truecasing of video subtitles using BERT: a multilingual adaptable approach",
	booktitle = "Information Processing and Management of Uncertainty in Knowledge-Based Systems",
	year = "2020",
	editor = "Lesot, Marie-Jeanne and Vieira, Susana and Reformat, Marek Z. and Carvalho, João Paulo and Wilbik, Anna and Bouchon-Meunier, Bernadette and Yager, Ronald R.",
	doi = "10.1007/978-3-030-50146-4_52",
	pages = "708--721",
	publisher = "Springer International Publishing",
	url = "https://ipmu2020.inesc-id.pt"
}
Export RIS
TY  - CPAPER
TI  - Automatic truecasing of video subtitles using BERT: a multilingual adaptable approach
T2  - Information Processing and Management of Uncertainty in Knowledge-Based Systems
AU  - Rei, Ricardo
AU  - Guerreiro, Nuno Miguel
AU  - Batista, F.
PY  - 2020
SP  - 708
EP  - 721
DO  - 10.1007/978-3-030-50146-4_52
UR  - https://ipmu2020.inesc-id.pt
AB  - This paper describes an approach for automatic capitalization of text without case information, such as spoken transcripts of video subtitles, produced by automatic speech recognition systems. Our approach is based on pre-trained contextualized word embeddings, requires only a small portion of data for training when compared with traditional approaches, and is able to achieve state-of-the-art results. The paper reports experiments both on general written data from the European Parliament, and on video subtitles, revealing that the proposed approach is suitable for performing capitalization, not only in each one of the domains, but also in a cross-domain scenario. We have also created a versatile multilingual model, and the conducted experiments show that good results can be achieved both for monolingual and multilingual data. Finally, we applied domain adaptation by finetuning models, initially trained on general written data, on video subtitles, revealing gains over other approaches not only in performance but also in terms of computational cost.
ER  -