Early experiments on automatic annotation of Portuguese medieval texts

Maria Inês Bico; Jorge Baptista; Fernando Batista; Esperança Cardeira

Publication in conference proceedings Q3

Maria Inês Bico (Bico, M. I.); Jorge Baptista (Baptista, J.); Fernando Batista (Batista, F.); Esperança Cardeira (Cardeira, E.);

Linking theory and practice of digital libraries. Lecture Notes in Computer Science

Year (definitive publication)

2022

Language

English

Country

Italy

Web of Science®

Times Cited: 0

(Last checked: 2024-05-16 20:57)

Scopus

Times Cited: 1

(Last checked: 2024-05-13 08:42)

Article Impact Index: 0.4

Google Scholar

Times Cited: 3

(Last checked: 2024-05-13 11:53)

Abstract

This paper presents the challenges encountered and the solutions adopted in the lemmatization and part-of-speech (PoS) tagging of a corpus of Old Portuguese texts (up to 1525), paving the way for the automatic annotation of these medieval texts. A highly granular tagset, previously devised for Modern Portuguese, was adapted to this end. A large text (∼155 thousand words) was manually annotated for PoS and lemmata and used to train an initial PoS-tagger model. When applied to two other texts, the resulting model attained 91.2% precision on a textual variant of the training text and 67.4% on a new, unseen text. A second model was then trained on the data from all three texts and applied to two further unseen texts, on which it achieved precisions of 77.3% and 82.4%, respectively.
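The train-then-evaluate loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' tagger: the abstract does not name the model used, so a most-frequent-tag baseline is swapped in here. The tokens, the tags (DET, N, V, ADV), and all function names are invented for illustration, and the sketch assumes that the reported precision is the share of tokens that receive the correct tag.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Learn the most frequent tag per word form, plus a global fallback tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for token, tag in sentence:
            counts[token.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {form: c.most_common(1)[0][0] for form, c in counts.items()}
    fallback = all_tags.most_common(1)[0][0]
    return lexicon, fallback


def tag(tokens, lexicon, fallback):
    """Tag each token; word forms not seen in training get the fallback tag."""
    return [lexicon.get(t.lower(), fallback) for t in tokens]


def token_precision(gold_sentences, lexicon, fallback):
    """Share of tokens whose predicted tag matches the manual annotation."""
    correct = total = 0
    for sentence in gold_sentences:
        tokens = [t for t, _ in sentence]
        predicted = tag(tokens, lexicon, fallback)
        for (_, gold), pred in zip(sentence, predicted):
            correct += gold == pred
            total += 1
    return correct / total if total else 0.0


# Invented toy data standing in for a manually annotated training text.
train = [
    [("el", "DET"), ("rey", "N"), ("mandou", "V"), ("cartas", "N")],
    [("o", "DET"), ("rey", "N"), ("disse", "V")],
]
# Invented "unseen" text: "logo" never occurs in the training data.
unseen = [[("el", "DET"), ("rey", "N"), ("disse", "V"), ("logo", "ADV")]]

lexicon, fallback = train_baseline(train)
score = token_precision(unseen, lexicon, fallback)
print(f"{score:.1%}")  # 3 of 4 tokens correct: 75.0%
```

In this toy setup the only error comes from a word form absent from the training data, which is one plausible reason a model scores lower on a new text than on a variant of its training text, and why adding more annotated texts to the training set can help.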

Acknowledgements

Research for this paper was partially funded by public funds through FCT, proj. ref. UIDB/50021/2020, proj. ref. UIDP/00214/2020, and proj. ref. UI/BD/152806/2022.

Keywords

Automatic annotation, Lemmatization, Part-of-speech tagging, Old Portuguese

Fields of Science and Technology Classification

Mathematics - Natural Sciences
Computer and Information Sciences - Natural Sciences

Funding Records

Funding Reference	Funding Entity
UIDB/50021/2020	Fundação para a Ciência e a Tecnologia
UI/BD/152806/2022	Fundação para a Ciência e a Tecnologia
UIDP/00214/2020	Fundação para a Ciência e a Tecnologia

Publication Identifiers

Ciência-IUL ID	ci-pub-90833
Scopus (source: Ciência-IUL)	2-s2.0-85138784150
WoS (source: Ciência-IUL)	WOS:000867565900044
DOI (source: author)	10.1007/978-3-031-16802-4_44
Handle (source: Ciência-IUL)	http://hdl.handle.net/10071/26157

Other Publication Details

Online Publication Year	2022
Publisher	Springer International Publishing
Indexes	Web of Science®; Scopus
ISSN	0302-9743 (print) 1611-3349 (online)
ISBN	978-3-031-16801-7 (print) 978-3-031-16802-4 (online)
Volume	13541
Pages	442-449
Total Pages	8
Peer Reviewed	Yes
Dissemination Mean	Both (printed and digital)
Editors	Silvello, G., Corcho, O., Manghi, P., Di Nunzio, G. M., Golub, K., Ferro, N., and Poggi, A.
Event Title	26th International Conference on Theory and Practice of Digital Libraries, TPDL 2022
City	Padua
Event Type	Conference
Event Classification	International
Event Year	2022
Event Publication Type	Full Paper
