Rachel Wicks (JHU) “To Sentences and Beyond: Paving the Way for Context-Aware Machine Translation”

Most machine translation systems operate at the sentence level, while humans write and translate within a given context. Operating on individual sentences forces error-prone sentence segmentation into the machine translation pipeline. This limits the upper-bound performance of these systems by creating noisy training bitext. Further, many grammatical features require inter-sentential context to translate correctly, which makes perfect sentence-level machine translation an impossible task. In this talk, we will cover the inherent limits of sentence-level machine translation. Following this, we will explore a key obstacle in the way of true context-aware machine translation: an abject lack of data. Finally, we will cover recent work that provides (1) a new evaluation dataset that specifically addresses the translation of context-dependent discourse phenomena and (2) reconstructed documents from large-scale sentence-level bitext that can be used to improve performance when translating these types of phenomena.

Monday, March 4, 2024 - 12:00 to 13:15

Hackerman Hall B17 @ 3400 N. Charles Street, Baltimore, MD 21218