Export Publication
The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.
Nunes, N., de Almeida, A. & Peixoto, A. (2026). Singularity Score For Evaluating Topic Relevance In Tiny Text. WorldCist'26 - 14th World Conference on Information Systems and Technologies.
N. L. Nunes et al., "Singularity Score For Evaluating Topic Relevance In Tiny Text", in WorldCist'26 - 14th World Conf. on Information Systems and Technologies, 2026
@misc{nunes2026_1774310881377,
author = "Nunes, N. and de Almeida, A. and Peixoto, A.",
title = "Singularity Score For Evaluating Topic Relevance In Tiny Text",
year = "2026",
url = "https://worldcist.org"
}
TY  - CPAPER
TI  - Singularity Score For Evaluating Topic Relevance In Tiny Text
T2  - WorldCist'26 - 14th World Conference on Information Systems and Technologies
AU  - Nunes, N.
AU  - de Almeida, A.
AU  - Peixoto, A.
PY  - 2026
UR  - https://worldcist.org
AB  - Topic modeling is a widely used method for extracting relevant information and insights from text, given its strong results. When using this technique, it is necessary to evaluate the topics identified. However, when the text is very short, with fewer than 10 words per document on average, the classical evaluation metrics can be unreliable. To extract meaningful topics and identify the most suitable modeling technique, this study applied topic modeling to this type of data – tiny text – using user-generated Portuguese texts collected from post-its during PLANAPP workshops. Six datasets with different preprocessing steps were tested using LDA and BERTopic, the latter with two sentence-transformers (Multilingual and AlBERTina). As expected, the classical evaluation metrics proved inconsistent for such short texts, motivating the creation of a new measurement of topic coherence, the Singularity Score, that intends to mimic human annotators. Results show that BERTopic produced more coherent topics, despite the fact that LDA scores higher in traditional metrics. In summary, this work demonstrates that topic modeling can be effectively applied to tiny Portuguese texts, identifies BERTopic as the most suitable approach, and introduces SS as a novel metric for assessing topic quality.
ER  -