Early experiments on automatic annotation of Portuguese medieval texts

Maria Inês Bico; Jorge Baptista; Fernando Batista; Esperança Cardeira

Publication in conference proceedings Q3

Maria Inês Bico (Bico, M. I.); Jorge Baptista (Baptista, J.); Fernando Batista (Batista, F.); Esperança Cardeira (Cardeira, E.)

Linking theory and practice of digital libraries. Lecture Notes in Computer Science

Year (final publication)

2022

Language

English

Country

Italy

Web of Science®

Citations: 0

(Last checked: 2024-05-01 21:26)

Scopus

Citations: 1

(Last checked: 2024-04-27 14:48)

Article Impact Index: 0.4

Google Scholar

Citations: 3

(Last checked: 2024-04-30 14:31)

Abstract

This paper presents the challenges faced and the solutions adopted in the lemmatization and part-of-speech (PoS) tagging of a corpus of Old Portuguese texts (up to 1525), paving the way for the automatic annotation of these medieval texts. A highly granular tagset, previously devised for Modern Portuguese, was adapted to this end. A large text (∼155 thousand words) was manually annotated for PoS and lemmata and used to train an initial PoS-tagger model. When applied to two other texts, the resulting model attained 91.2% precision on a textual variant of the same text and 67.4% on a new, unseen text. A second model was then trained on the data from these three texts and applied to two further unseen texts, achieving a precision of 77.3% and 82.4%, respectively.
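
As an illustration of how figures like those in the abstract could be computed, the following is a minimal sketch, not the authors' evaluation code. It assumes that "precision" means the share of tokens whose predicted annotation matches the manual one, and that gold and predicted tokens are already aligned one to one. The function name `tagging_precision`, the toy tags (`DET`, `N`, `V`, `CONJ`, `PRON`), and the sample tokens are all hypothetical and do not reproduce the paper's tagset or data.

```python
from typing import Dict, List, Tuple

# Each token is (word form, PoS tag, lemma); this layout is an assumption.
Token = Tuple[str, str, str]


def tagging_precision(gold: List[Token], predicted: List[Token]) -> Dict[str, float]:
    """Return the fraction of tokens with a correct PoS tag, lemma, and both."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be token-aligned")
    if not gold:
        raise ValueError("cannot score an empty text")
    n = len(gold)
    pairs = list(zip(gold, predicted))
    pos_ok = sum(g[1] == p[1] for g, p in pairs)
    lemma_ok = sum(g[2] == p[2] for g, p in pairs)
    joint_ok = sum(g[1:] == p[1:] for g, p in pairs)
    return {"pos": pos_ok / n, "lemma": lemma_ok / n, "joint": joint_ok / n}


# Hypothetical toy data: one lemma error ("rey") and one PoS error ("que").
gold = [("el", "DET", "o"), ("rey", "N", "rei"),
        ("disse", "V", "dizer"), ("que", "CONJ", "que")]
pred = [("el", "DET", "o"), ("rey", "N", "rey"),
        ("disse", "V", "dizer"), ("que", "PRON", "que")]

scores = tagging_precision(gold, pred)
print(scores)
```

Scoring PoS and lemma separately as well as jointly makes it visible whether errors on an unseen text come mainly from spelling variation in lemmata or from tag confusions.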

Acknowledgements

Research for this paper was partially funded by public funds through FCT, project refs. UIDB/50021/2020, UIDP/00214/2020, and UI/BD/152806/2022.

Keywords

Automatic annotation, Lemmatization, Part-of-speech tagging, Old Portuguese

Fields of Science and Technology Classification

Mathematics - Natural Sciences
Computer and Information Sciences - Natural Sciences

Funding records

Funding reference	Funding entity
UIDB/50021/2020	Fundação para a Ciência e a Tecnologia
UI/BD/152806/2022	Fundação para a Ciência e a Tecnologia
UIDP/00214/2020	Fundação para a Ciência e a Tecnologia

Publication identifiers

DOI (source: author)	10.1007/978-3-031-16802-4_44
Scopus (source: Ciência-IUL)	2-s2.0-85138784150
Handle (source: Ciência-IUL)	http://hdl.handle.net/10071/26157
Ciência-IUL ID	ci-pub-90833
WoS (source: Ciência-IUL)	WOS:000867565900044

Other publication details

Online publication year	2022
Publisher	Springer International Publishing
Indexing	Web of Science®; Scopus
ISSN	0302-9743 (print) 1611-3349 (online)
ISBN	978-3-031-16801-7 (print) 978-3-031-16802-4 (online)
Volume	13541
Article number
Pages	442-449	Total pages	8
Peer reviewed	Yes
Medium	Both (print and digital)
Editors	Silvello, G., Corcho, O., Manghi, P., Di Nunzio, G. M., Golub, K., Ferro, N., and Poggi, A.
Event title	26th International Conference on Theory and Practice of Digital Libraries, TPDL 2022
Event organizer
City	Padua
Event type	Conference
Event classification	International
Event year	2022
Publication type at event	Full paper
ISCTE-IUL repository	Link to the repository
Publication date (online)
Publication date (print)