
Export Publication

The publication can be exported in the following formats: APA (American Psychological Association) reference format, IEEE (Institute of Electrical and Electronics Engineers) reference format, BibTeX and RIS.

Export Reference (APA)

Bico, M. I., Baptista, J., Batista, F., & Cardeira, E. (2024). Enriching Portuguese medieval texts with named entity recognition. International Journal of Humanities and Arts Computing, 18(1), 109-124. https://doi.org/10.3366/ijhac.2024.0324

Export Reference (IEEE)

M. I. Bico, J. Baptista, F. Batista, and E. Cardeira, "Enriching Portuguese medieval texts with named entity recognition," International Journal of Humanities and Arts Computing, vol. 18, no. 1, pp. 109-124, 2024, doi: 10.3366/ijhac.2024.0324.

Export BibTeX

@article{bico2024_1782452929268,
	author = "Bico, M. I. and Baptista, J. and Batista, F. and Cardeira, E.",
	title = "Enriching Portuguese medieval texts with named entity recognition",
	journal = "International Journal of Humanities and Arts Computing",
	year = "2024",
	volume = "18",
	number = "1",
	doi = "10.3366/ijhac.2024.0324",
	pages = "109--124",
	url = "https://www.euppublishing.com/loi/ijhac"
}

Export RIS

TY  - JOUR
TI  - Enriching Portuguese medieval texts with named entity recognition
T2  - International Journal of Humanities and Arts Computing
VL  - 18
IS  - 1
AU  - Bico, M. I.
AU  - Baptista, J.
AU  - Batista, F.
AU  - Cardeira, E.
PY  - 2024
SP  - 109
EP  - 124
SN  - 1753-8548
DO  - 10.3366/ijhac.2024.0324
UR  - https://www.euppublishing.com/loi/ijhac
AB  - Historical data poses unique challenges to natural language processing (NLP) and information retrieval (IR) tools, including digitization errors, lack of annotated data, and diachronic-specific issues. However, the increasing recognition of the value in historical documents has promoted efforts to semantically enrich and optimize their analysis. This article contributes to this endeavour by enriching the Corpus de Textos Antigos through NLP tools and techniques to enhance its usability and support research. The corpus undergoes linguistic annotation, including part-of-speech tagging, lemma annotation and named entity recognition (NER). Subsequently, the article delves into the tasks of entity disambiguation and entity linking, which involve identifying and disambiguating named entities by referring to a knowledge base (KB). Addressing the challenges posed by factors such as text state, epoch and the chosen KB, the article presents insights into related work, annotation results and the linguistic interest of a medieval annotated corpus for named entities. It concludes by discussing the challenges and providing avenues for future research in this domain.
ER  - 
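As a rough illustration of how an RIS record like the one above is structured, the following Python sketch reads RIS text into one dictionary per record. It assumes the standard RIS line layout: a two-character tag, two spaces, a hyphen and a space, then the value. It also tolerates a single space before the hyphen, which is common in copy-pasted exports. The names `parse_ris`, `RIS_LINE` and `REPEATABLE` are hypothetical. The sketch treats only AU, A1, A2, A3 and KW as repeatable tags, and it does not handle values that wrap onto continuation lines.

```python
import re

# Assumed RIS line layout: two-character tag, spaces, hyphen, optional space, value.
# One or two spaces before the hyphen are accepted (lenient, not per the spec).
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s{1,2}-\s?(.*)$")

# Hypothetical choice of tags that may repeat within a record (one AU per author).
REPEATABLE = {"AU", "A1", "A2", "A3", "KW"}


def parse_ris(text: str) -> list[dict]:
    """Parse RIS text into a list of records, each a dict keyed by tag."""
    records: list[dict] = []
    current: dict | None = None
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if not match:
            continue  # skip blank lines and anything that is not a tagged line
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":
            current = {"TY": value}  # TY opens a new record
        elif current is None:
            continue  # ignore tags that appear outside a record
        elif tag == "ER":
            records.append(current)  # ER closes the current record
            current = None
        elif tag in REPEATABLE:
            current.setdefault(tag, []).append(value)
        else:
            current[tag] = value  # a repeated non-repeatable tag keeps the last value
    return records


sample = """TY  - JOUR
TI  - Enriching Portuguese medieval texts with named entity recognition
AU  - Bico, M. I.
AU  - Baptista, J.
SP  - 109
EP  - 124
ER  - """

record = parse_ris(sample)[0]
print(record["AU"])  # ['Bico, M. I.', 'Baptista, J.']
```

Authors are collected into a list because RIS emits one AU line per author, whereas single-valued tags such as TI or SP map directly to strings.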