Export Publication
The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.
Zanatti, M., Ribeiro, R., & Pinto, H. S. (2025). Exploring Metric Correlations for Legal Text Summarization Evaluation. In Juliano Maranhão (Ed.), Proceedings of the Twentieth International Conference on Artificial Intelligence and Law (pp. 389-393). Chicago, IL, USA: ACM.
M. Zanatti et al., "Exploring Metric Correlations for Legal Text Summarization Evaluation," in Proc. of the Twentieth Int. Conf. on Artificial Intelligence and Law, Juliano Maranhão, Ed., Chicago, IL, USA: ACM, 2025, pp. 389-393.
@inproceedings{zanatti2025_1777864330946,
  author    = "Zanatti, Martim and Ribeiro, R. and Pinto, H. Sofia",
  title     = "Exploring Metric Correlations for Legal Text Summarization Evaluation",
  booktitle = "Proceedings of the Twentieth International Conference on Artificial Intelligence and Law",
  year      = "2025",
  editor    = "Juliano Maranhão",
  doi       = "10.1145/3769126.3769206",
  pages     = "389-393",
  publisher = "ACM",
  address   = "Chicago, IL, USA",
  url       = "https://dl.acm.org/doi/pdf/10.1145/3769126.3769206"
}
TY  - CPAPER
TI  - Exploring Metric Correlations for Legal Text Summarization Evaluation
T2  - Proceedings of the Twentieth International Conference on Artificial Intelligence and Law
AU  - Zanatti, Martim
AU  - Ribeiro, R.
AU  - Pinto, H. Sofia
PY  - 2025
SP  - 389
EP  - 393
DO  - 10.1145/3769126.3769206
CY  - Chicago, IL, USA
UR  - https://dl.acm.org/doi/pdf/10.1145/3769126.3769206
AB  - The rapid advancements in legal text summarization have not been matched by equivalent progress in evaluation metrics capable of assessing the quality of legal summaries. Traditional evaluation approaches, such as ROUGE, remain widely used despite their inability to capture semantic fidelity. While more recent metrics focus on semantic evaluation, their applicability to legal summarization has not been thoroughly tested, and their performance is highly dependent on embedding models and computational resources, particularly for long and complex legal texts. Furthermore, the absence of publicly available datasets with expert annotations hinders the development and validation of domain-specific evaluation methods. In this paper, we address these challenges by introducing the first publicly available dataset of Portuguese legal summaries, annotated by legal experts across multiple dimensions such as Coherence and Relevance. We use this dataset to systematically evaluate several recent evaluation metrics, comparing their performance against ROUGE, the standard metric for summarization tasks. Our analysis, based on Spearman correlation with human judgments, reveals that ROUGE-2 maintains the highest correlation across almost every evaluated dimension, outperforming more recent metrics, including semantic-based approaches. These results emphasize the challenges of adapting new evaluation frameworks to the legal domain and underscore the need for further research into metrics that can better capture domain-specific requirements.
ER  -