Export Publication
The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.
Zanatti, M., Ribeiro, R., & Pinto, H. S. (2025). Exploring Metric Correlations for Legal Text Summarization Evaluation. In Juliano Maranhão (Ed.), Proceedings of the Twentieth International Conference on Artificial Intelligence and Law (pp. 389-393). Chicago, IL, USA: ACM.
M. Zanatti et al., "Exploring Metric Correlations for Legal Text Summarization Evaluation," in Proc. of the Twentieth Int. Conf. on Artificial Intelligence and Law, Juliano Maranhão, Ed., Chicago, IL, USA: ACM, 2025, pp. 389-393.
@inproceedings{zanatti2025_1777864330946,
  author    = "Zanatti, Martim and Ribeiro, R. and Pinto, H. Sofia",
  title     = "Exploring Metric Correlations for Legal Text Summarization Evaluation",
  booktitle = "Proceedings of the Twentieth International Conference on Artificial Intelligence and Law",
  year      = "2025",
  editor    = "Juliano Maranhão",
  doi       = "10.1145/3769126.3769206",
  pages     = "389-393",
  publisher = "ACM",
  address   = "Chicago, IL, USA",
  url       = "https://dl.acm.org/doi/pdf/10.1145/3769126.3769206"
}
TY  - CPAPER
TI  - Exploring Metric Correlations for Legal Text Summarization Evaluation
T2  - Proceedings of the Twentieth International Conference on Artificial Intelligence and Law
AU  - Zanatti, Martim
AU  - Ribeiro, R.
AU  - Pinto, H. Sofia
PY  - 2025
SP  - 389
EP  - 393
DO  - 10.1145/3769126.3769206
CY  - Chicago, IL, USA
UR  - https://dl.acm.org/doi/pdf/10.1145/3769126.3769206
AB  - The rapid advancements in legal text summarization have not been matched by equivalent progress in evaluation metrics capable of assessing the quality of legal summaries. Traditional evaluation approaches, such as ROUGE, remain widely used despite their inability to capture semantic fidelity. While more recent metrics focus on semantic evaluation, their applicability to legal summarization has not been thoroughly tested, and their performance is highly dependent on embedding models and computational resources, particularly for long and complex legal texts. Furthermore, the absence of publicly available datasets with expert annotations hinders the development and validation of domain-specific evaluation methods. In this paper, we address these challenges by introducing the first publicly available dataset of Portuguese legal summaries, annotated by legal experts across multiple dimensions such as Coherence and Relevance. We use this dataset to systematically evaluate several recent evaluation metrics, comparing their performance against ROUGE, the standard metric for summarization tasks. Our analysis, based on Spearman correlation with human judgments, reveals that ROUGE-2 maintains the highest correlation across almost every evaluated dimension, outperforming more recent metrics, including semantic-based approaches. These results emphasize the challenges of adapting new evaluation frameworks to the legal domain and underscore the need for further research into metrics that can better capture domain-specific requirements.
ER  -