Relating Translation Quality Barriers to Source-Text Properties
Federico Gaspari
2014-01-01
Abstract
This paper aims to automatically identify which linguistic phenomena represent barriers to better MT quality. We focus on the translation of news data for two bidirectional language pairs: EN↔ES and EN↔DE. Using the diagnostic MT evaluation toolkit DELiC4MT and a set of human reference translations, we relate translation quality barriers to a selection of 9 source-side PoS-based linguistic checkpoints. Using output from the winning SMT, RbMT, and hybrid systems of the WMT 2013 shared task, translation quality barriers are investigated (in relation to the selected linguistic checkpoints) according to two main variables: (i) the type of the MT approach, i.e. statistical, rule-based or hybrid, and (ii) the human evaluation of MT output, ranked into three quality groups corresponding to good, near miss and poor. We show that the combination of manual quality ranking and automatic diagnostic evaluation on a set of PoS-based linguistic checkpoints is able to identify the specific quality barriers of different MT system types across the four translation directions under consideration.