Export Publication
The publication can be exported in the following formats: APA (American Psychological Association) reference, IEEE (Institute of Electrical and Electronics Engineers) reference, BibTeX, and RIS.
R. D. Ribeiro, "Anotação Morfossintáctica Desambiguada do Português", 2003
@misc{ribeiro2003_1732207682618,
  author = "Ribeiro, R. D.",
  title = "Anotação Morfossintáctica Desambiguada do Português",
  year = "2003",
  url = "http://www.inesc-id.pt/publications/1424/pdf"
}
TY  - GEN
TI  - Anotação Morfossintáctica Desambiguada do Português
AU  - Ribeiro, R.
PY  - 2003
UR  - http://www.inesc-id.pt/publications/1424/pdf
AB  - In this thesis we present the development of a part-of-speech tagging system for Portuguese. The main motivation for developing the system was to use it as a component of a text-to-speech synthesis system. The architecture of the tagger comprises a morphological analysis module and a morphosyntactic disambiguation module. The morphological analysis module is important because neo-Latin languages, such as Portuguese, are highly inflectional, which leads to a lack of the examples needed to build reliable language models (the data sparseness problem). The morphosyntactic disambiguation module combines two different approaches: linguistically oriented rule-based disambiguation and probabilistic disambiguation. The system was trained and tested using the annotated PAROLE corpus. The results achieved show that the presented architecture is well suited for European Portuguese. Although a well-founded comparison between this and other taggers for Portuguese is difficult (the tagsets differ, for example, and the corpora used were not the same), this system seems to achieve better performance. Additionally, it is important to stress the efforts made to ensure the modularity of the system, allowing an easy interchange of modules and simple integration into other systems.
ER  - 