Export Publication
The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.
Imperial, J.M., Barayan, A., Stodden, R., Wilkens, R., Muñoz Sánchez, R., Gao, L., ... Tayyar Madabushi, H. (2025). UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng (Eds.), Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (pp. 9714-9766). Suzhou, China: Association for Computational Linguistics.
J. M. Imperial et al., "UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment", in Proc. of the 2025 Conf. on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng, Eds., Suzhou, China, Association for Computational Linguistics, 2025, pp. 9714-9766.
@inproceedings{imperial2025_1777826376763,
author = "Imperial, J.M. and Barayan, A. and Stodden, R. and Wilkens, R. and Muñoz Sánchez, R. and Gao, L. and Torgbi, M. and Knight, D. and Forey, G. and Jablonkai, R.R. and Kochmar, E. and Reynolds, R. and Ribeiro, E. and Saggion, H. and Volodina, E. and Vajjala, S. and François, T. and Alva-Manchego, F. and Tayyar Madabushi, H.",
title = "UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
year = "2025",
editor = "Christos Christodoulopoulos and Tanmoy Chakraborty and Carolyn Rose and Violet Peng",
doi = "10.18653/v1/2025.emnlp-main.491",
pages = "9714--9766",
publisher = "Association for Computational Linguistics",
address = "Suzhou, China",
url = "https://2025.emnlp.org/"
}
TY  - CPAPER
TI  - UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
T2  - Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
AU  - Imperial, J.M.
AU  - Barayan, A.
AU  - Stodden, R.
AU  - Wilkens, R.
AU  - Muñoz Sánchez, R.
AU  - Gao, L.
AU  - Torgbi, M.
AU  - Knight, D.
AU  - Forey, G.
AU  - Jablonkai, R.R.
AU  - Kochmar, E.
AU  - Reynolds, R.
AU  - Ribeiro, E.
AU  - Saggion, H.
AU  - Volodina, E.
AU  - Vajjala, S.
AU  - François, T.
AU  - Alva-Manchego, F.
AU  - Tayyar Madabushi, H.
PY  - 2025
SP  - 9714-9766
DO  - 10.18653/v1/2025.emnlp-main.491
CY  - Suzhou, China
UR  - https://2025.emnlp.org/
AB  - We introduce UniversalCEFR, a large-scale multilingual multidimensional dataset of texts annotated according to the CEFR (Common European Framework of Reference) scale in 13 languages. To enable open research in both automated readability and language proficiency assessment, UniversalCEFR comprises 505,807 CEFR-labeled texts curated from educational and learner-oriented resources, standardized into a unified data format to support consistent processing, analysis, and modeling across tasks and languages. To demonstrate its utility, we conduct benchmark experiments using three modelling paradigms: a) linguistic feature-based classification, b) fine-tuning pre-trained LLMs, and c) descriptor-based prompting of instruction-tuned LLMs. Our results further support using linguistic features and fine-tuning pretrained models in multilingual CEFR level assessment. Overall, UniversalCEFR aims to establish best practices in data distribution in language proficiency research by standardising dataset formats and promoting their accessibility to the global research community.
ER  - 