Export Publication

The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.

Export Reference (APA)
Ribeiro, E., Mamede, N., & Baptista, J. (2024). Automatic Text Readability Assessment in European Portuguese. 16th International Conference on Computational Processing of Portuguese (PROPOR 2024).
Export Reference (IEEE)
E. A. Ribeiro et al., "Automatic Text Readability Assessment in European Portuguese," in 16th Int. Conf. on Computational Processing of Portuguese (PROPOR 2024), Santiago de Compostela, 2024.
Export BibTeX
@inproceedings{ribeiro2024_1777868239471,
	author = "Ribeiro, E. and Mamede, N. and Baptista, J.",
	title = "Automatic Text Readability Assessment in European Portuguese",
	booktitle = "16th International Conference on Computational Processing of Portuguese (PROPOR 2024)",
	address = "Santiago de Compostela",
	year = "2024",
	url = "https://propor2024.citius.gal/"
}
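
To illustrate how an export like this can be consumed programmatically, here is a minimal Python sketch, not part of the export page itself. It assumes the entry is flat, with every field on one line as name = "quoted value" (true of the record above); a full BibTeX parser would also have to handle braced values, nesting, and string concatenation.

import re

# Match the entry head (@type{key,) and simple name = "value" fields.
ENTRY_RE = re.compile(r"@(\w+)\{([^,\s]+),")
FIELD_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_bibtex_entry(entry: str) -> dict:
    head = ENTRY_RE.search(entry)
    if head is None:
        raise ValueError("not a BibTeX entry")
    record = {"type": head.group(1).lower(), "key": head.group(2)}
    # Collect field/value pairs; a repeated field name overwrites the earlier one.
    record.update((name.lower(), value) for name, value in FIELD_RE.findall(entry))
    return record

# Usage with an abridged copy of the entry above:
entry = '''@inproceedings{ribeiro2024_1777868239471,
    author = "Ribeiro, E. and Mamede, N. and Baptista, J.",
    title = "Automatic Text Readability Assessment in European Portuguese",
    year = "2024"
}'''
record = parse_bibtex_entry(entry)
print(record["key"], "-", record["title"])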
Export RIS
TY  - CPAPER
TI  - Automatic Text Readability Assessment in European Portuguese
T2  - 16th International Conference on Computational Processing of Portuguese (PROPOR 2024)
AU  - Ribeiro, E.
AU  - Mamede, N.
AU  - Baptista, J.
PY  - 2024
CY  - Santiago de Compostela
UR  - https://propor2024.citius.gal/
AB  - The automatic assessment of text readability and the classification of texts by levels is essential for language education and language-related industries that rely on effective communication. The Common European Framework of Reference for Languages (CEFR) provides a widely recognized framework for classifying language proficiency levels. This framework can be used not only to assess the proficiency of learners of a given language, but also from a readability perspective, as a means to identify the proficiency required to understand specific pieces of text. In this study, we address the automatic assessment of text readability according to CEFR levels in European Portuguese. For that, we explore the fine-tuning of several foundation models on textual data used for proficiency evaluation purposes. Additionally, we aim at setting the ground for more comparable research on this subject by defining a new publicly available test set. Our experiments show that the best models can achieve around 80% accuracy and 75% macro F1 score. However, they have difficulty in generalizing to different types of text, which reveals the need for additional and more diverse training data.
ER  -
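
RIS is a line-oriented format: each line is a two-letter tag, two spaces, a hyphen, a space, and a value, with repeated tags (such as AU) accumulating and "ER  -" closing the record. As a rough sketch of reading one record like the one above, under that assumption:

from collections import defaultdict

def parse_ris_record(text: str) -> dict:
    record = defaultdict(list)
    for line in text.splitlines():
        # Accept only well-formed "XX  - value" lines.
        if len(line) >= 6 and line[2:6] == "  - ":
            tag, value = line[:2], line[6:].strip()
            if tag == "ER":
                break                      # end of record
            if value:
                record[tag].append(value)  # repeated tags accumulate
    return dict(record)

# Usage with an abridged copy of the record above:
ris = """TY  - CPAPER
TI  - Automatic Text Readability Assessment in European Portuguese
AU  - Ribeiro, E.
AU  - Mamede, N.
AU  - Baptista, J.
PY  - 2024
ER  - """
print(parse_ris_record(ris)["AU"])   # ['Ribeiro, E.', 'Mamede, N.', 'Baptista, J.']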