Publication in conference proceedings
Machine translation of non-contiguous multiword units
Anabela Barreiro (Barreiro, A.); Fernando Batista (Batista, F.)
Workshop on Discontinuous Structures in Natural Language Processing (DiscoNLP 2016)
Year (definitive publication)
2016
Language
English
Country
United States of America
More Information
Web of Science®

This publication is not indexed in Web of Science®

Scopus

This publication is not indexed in Scopus

Google Scholar

Times Cited: 11

(Last checked: 2025-12-18 18:58)

This publication is not indexed in Overton

Abstract
Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high-quality machine translation. This paper presents an empirical analysis of non-contiguous multiwords and highlights our use of the Logos Model and the Semtab function to deploy semantic knowledge to align non-contiguous multiword units, with the goal of translating these units with high fidelity. The phrase-level manual alignments illustrated in the paper were produced with the CLUE-Aligner, a Cross-Language Unit Elicitation alignment tool.
Funding Records
Funding Reference       Funding Entity
UID/CEC/50021/2013      Fundação para a Ciência e Tecnologia
SFRH/BPD/91446/2012     Fundação para a Ciência e Tecnologia
EXPL/MHC-LIN/2260/2013  project eSPERTo