Ciência_Iscte
Workshop on Discontinuous Structures in Natural Language Processing (DiscoNLP 2016)
Year (definitive publication)
2016
Language
English
Country
United States of America
More Information
Web of Science®
This publication is not indexed in Web of Science®
Scopus
This publication is not indexed in Scopus
Overton
This publication is not indexed in Overton
Abstract
Non-adjacent linguistic phenomena, such as non-contiguous multiwords and other phrasal units containing insertions (i.e., words that are not part of the unit), are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and are among the most important challenges for high-quality machine translation. This paper presents an empirical analysis of non-contiguous multiwords and highlights our use of the Logos Model and the Semtab function to deploy semantic knowledge to align non-contiguous multiword units, with the goal of translating these units with high fidelity. The phrase-level manual alignments illustrated in the paper were produced with the CLUE-Aligner, a Cross-Language Unit Elicitation alignment tool.
Funding Records
| Funding Reference | Funding Entity |
|---|---|
| UID/CEC/50021/2013 | Fundação para a Ciência e Tecnologia |
| SFRH/BPD/91446/2012 | Fundação para a Ciência e Tecnologia |
| EXPL/MHC-LIN/2260/2013 | project eSPERTo |