Export Publication
The publication can be exported in the following formats: APA (American Psychological Association) reference format, IEEE (Institute of Electrical and Electronics Engineers) reference format, BibTeX and RIS.
Bico, M. I., Baptista, J., Batista, F. & Cardeira, E. (2022). Early experiments on automatic annotation of Portuguese medieval texts. In Silvello, G., Corcho, O., Manghi, P., Di Nunzio, G. M., Golub, K., Ferro, N., and Poggi, A. (Eds.), Linking theory and practice of digital libraries. Lecture Notes in Computer Science. (pp. 442-449). Padua: Springer International Publishing.
M. I. Bico et al., "Early experiments on automatic annotation of Portuguese medieval texts", in Linking theory and practice of digital libraries. Lecture Notes in Computer Science, Silvello, G., Corcho, O., Manghi, P., Di Nunzio, G. M., Golub, K., Ferro, N., and Poggi, A., Eds., Padua, Springer International Publishing, 2022, vol. 13541, pp. 442-449.
@inproceedings{inês2022_1715930055521,
  author = "Bico, M. I. and Baptista, J. and Batista, F. and Cardeira, E.",
  title = "Early experiments on automatic annotation of Portuguese medieval texts",
  booktitle = "Linking theory and practice of digital libraries. Lecture Notes in Computer Science",
  year = "2022",
  editor = "Silvello, G., Corcho, O., Manghi, P., Di Nunzio, G. M., Golub, K., Ferro, N., and Poggi, A.",
  volume = "13541",
  number = "",
  series = "",
  doi = "10.1007/978-3-031-16802-4_44",
  pages = "442-449",
  publisher = "Springer International Publishing",
  address = "Padua",
  organization = "",
  url = "https://link.springer.com/book/10.1007/978-3-031-16802-4"
}
TY  - CPAPER
TI  - Early experiments on automatic annotation of Portuguese medieval texts
T2  - Linking theory and practice of digital libraries. Lecture Notes in Computer Science
VL  - 13541
AU  - Bico, M. I.
AU  - Baptista, J.
AU  - Batista, F.
AU  - Cardeira, E.
PY  - 2022
SP  - 442-449
SN  - 0302-9743
DO  - 10.1007/978-3-031-16802-4_44
CY  - Padua
UR  - https://link.springer.com/book/10.1007/978-3-031-16802-4
AB  - This paper presents the challenges and solutions adopted to the lemmatization and part-of-speech (PoS) tagging of a corpus of Old Portuguese texts (up to 1525), to pave the way to the implementation of an automatic annotation of these Medieval texts. A highly granular tagset, previously devised for Modern Portuguese, was adapted to this end. A large text (∼155 thousand words) was manually annotated for PoS and lemmata and used to train an initial PoS-tagger model. When applied to two other texts, the resulting model attained 91.2% precision with a textual variant of the same text, and 67.4% with a new, unseen text. A second model was then trained with the data provided by the previous three texts and applied to two other unseen texts. The new model achieved a precision of 77.3% and 82.4%, respectively.
ER  -