Export Publication

The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.

Export Reference (APA)
Imperial, J.M., Barayan, A., Stodden, R., Wilkens, R., Muñoz Sánchez, R., Gao, L., ... Tayyar Madabushi, H. (2025). UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, & Violet Peng (Eds.), Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (pp. 9714-9766). Suzhou, China: Association for Computational Linguistics.
Export Reference (IEEE)
J. M. Imperial et al., "UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment," in Proc. of the 2025 Conf. on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, Eds., Suzhou, China, Association for Computational Linguistics, 2025, pp. 9714-9766.
Export BibTeX
@inproceedings{imperial2025_1777826376763,
	author = "Imperial, J.M. and Barayan, A. and Stodden, R. and Wilkens, R. and Muñoz Sánchez, R. and Gao, L. and Torgbi, M. and Knight, D. and Forey, G. and Jablonkai, R.R. and Kochmar, E. and Reynolds, R. and Ribeiro, E. and Saggion, H. and Volodina, E. and Vajjala, S. and François, T. and Alva-Manchego, F. and Tayyar Madabushi, H.",
	title = "UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment",
	booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
	year = "2025",
	editor = "Christos Christodoulopoulos and Tanmoy Chakraborty and Carolyn Rose and Violet Peng",
	doi = "10.18653/v1/2025.emnlp-main.491",
	pages = "9714--9766",
	publisher = "Association for Computational Linguistics",
	address = "Suzhou, China",
	url = "https://2025.emnlp.org/"
}
Export RIS
TY  - CPAPER
TI  - UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
T2  - Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
AU  - Imperial, J.M.
AU  - Barayan, A.
AU  - Stodden, R.
AU  - Wilkens, R.
AU  - Muñoz Sánchez, R.
AU  - Gao, L.
AU  - Torgbi, M.
AU  - Knight, D.
AU  - Forey, G.
AU  - Jablonkai, R.R.
AU  - Kochmar, E.
AU  - Reynolds, R.
AU  - Ribeiro, E.
AU  - Saggion, H.
AU  - Volodina, E.
AU  - Vajjala, S.
AU  - François, T.
AU  - Alva-Manchego, F.
AU  - Tayyar Madabushi, H.
PY  - 2025
SP  - 9714
EP  - 9766
DO  - 10.18653/v1/2025.emnlp-main.491
CY  - Suzhou, China
UR  - https://2025.emnlp.org/
AB  - We introduce UniversalCEFR, a large-scale multilingual multidimensional dataset of texts annotated according to the CEFR (Common European Framework of Reference) scale in 13 languages. To enable open research in both automated readability and language proficiency assessment, UniversalCEFR comprises 505,807 CEFR-labeled texts curated from educational and learner-oriented resources, standardized into a unified data format to support consistent processing, analysis, and modeling across tasks and languages. To demonstrate its utility, we conduct benchmark experiments using three modelling paradigms: a) linguistic feature-based classification, b) fine-tuning pre-trained LLMs, and c) descriptor-based prompting of instruction-tuned LLMs. Our results further support using linguistic features and fine-tuning pretrained models in multilingual CEFR level assessment. Overall, UniversalCEFR aims to establish best practices in data distribution in language proficiency research by standardising dataset formats and promoting their accessibility to the global research community.
ER  -
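
For readers who want to consume the RIS export programmatically, the following is a minimal sketch of reading one record into a dictionary. It assumes the common "TAG  - value" line layout used in the record above; the function name `parse_ris` and the shortened `sample` string are illustrative and not part of any library or of the export itself.

```python
import re
from collections import defaultdict

# Assumed line layout: a two-character tag, two spaces, a hyphen, then an optional value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")


def parse_ris(text):
    """Parse a single RIS record into a dict mapping each tag to a list of values."""
    record = defaultdict(list)
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or non-conforming lines
        tag = match.group(1)
        value = match.group(2) or ""
        if tag == "ER":
            break  # ER marks the end of the record
        record[tag].append(value)  # repeated tags such as AU accumulate in order
    return dict(record)


# Shortened sample taken from the record above (only two of the authors).
sample = """TY  - CPAPER
TI  - UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
AU  - Imperial, J.M.
AU  - Barayan, A.
PY  - 2025
DO  - 10.18653/v1/2025.emnlp-main.491
ER  -
"""

record = parse_ris(sample)
print(record["AU"])
print(record["DO"][0])
```

Storing every tag as a list keeps repeated fields such as `AU` in their original order, which matters for author lists.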