Provide a systematic investigation and exploration of the space of possible choices in Hybrid MT, using sophisticated machine-learning (ML) technologies, in order to provide optimal support for Hybrid MT design. The variants of Hybrid MT to be explored in this way will subsume the approaches that have been investigated in recent work, such as
The systematic variation and optimisation along these dimensions will most likely reveal new combinations of the various components and approaches that have so far only been investigated in isolation.
A further important objective of WP 2 is to build bridges to the Machine Learning community, so that the choice space for Hybrid MT can be explored systematically and jointly: ML technologies are likely to allow a well-directed combination of the different components.
In close cooperation with Pillar II (Resource Infrastructure), we will develop a sample bitext corpus covering English, German, Spanish and, if possible, one "new" language (e.g. Czech), based on a 1,000-sentence section of the Europarl corpus. The corpus will be annotated with phrases and linguistic information from different (basic and hybrid) MT approaches, including SMT, phrase-based SMT, EBMT and RBMT. We will check random samples for annotation consistency and omissions, and we will collect human judgements on the quality of alternative phrases using a suitable MT-evaluation tool ("Appraise", Federmann, 2010).
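To make the corpus description above more concrete, the following is a minimal sketch of how one annotated sentence might be stored, together with two helpers for the planned quality checks: drawing a reproducible random sample for manual consistency review, and flagging source tokens that no phrase from a given engine covers (candidate omissions). All class names, field names and helpers are hypothetical assumptions for illustration, not the format that Task 2.1 will define.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PhraseAnnotation:
    # Hypothetical layout: one alternative phrase translation from one MT approach.
    engine: str          # e.g. "SMT", "PB-SMT", "EBMT", "RBMT"
    src_span: tuple      # (start, end) source token offsets, end exclusive
    target: str          # proposed translation of that span
    judgements: list = field(default_factory=list)  # human scores, e.g. collected with Appraise


@dataclass
class CorpusEntry:
    # Hypothetical layout: one source sentence with references and phrase annotations.
    sent_id: int
    source: str          # English source sentence (whitespace-tokenised here for simplicity)
    references: dict     # language code -> reference translation
    phrases: list = field(default_factory=list)


def sample_for_consistency_check(corpus, k, seed=0):
    """Draw k entries uniformly at random, without replacement, for manual review."""
    rng = random.Random(seed)  # fixed seed keeps the review sample reproducible
    return rng.sample(corpus, k)


def uncovered_tokens(entry, engine):
    """Return indices of source tokens not covered by any phrase from `engine`."""
    n = len(entry.source.split())
    covered = set()
    for p in entry.phrases:
        if p.engine == engine:
            covered.update(range(*p.src_span))
    return [i for i in range(n) if i not in covered]
```

For example, if the RBMT annotation of "the house is small" only covers "the house" and "small", `uncovered_tokens` reports token 2 ("is") as a possible omission to be checked by an annotator.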
We will use the resource established in Task 2.1 for feature extraction and training ML methods to systematically explore optimal combination possibilities for Hybrid MT architectures. This task will take into account the influence of context-based word choice on hybrid translation quality.
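As an illustration of what "training ML methods to explore optimal combination possibilities" could mean at the phrase level, the sketch below learns to choose among alternative translations of the same source span from human preference pairs. It assumes a hypothetical feature vector (a one-hot for the producing engine, a length ratio, and a placeholder `context_match` score standing in for context-based word choice) and uses a simple pairwise perceptron; this learner is a swapped-in example only, and SVMs, memory-based learners or other methods could take its place.

```python
ENGINES = ["SMT", "PB-SMT", "EBMT", "RBMT"]


def features(engine, src_len, tgt_len, context_match):
    """Hypothetical features: engine one-hot, target/source length ratio, context score."""
    onehot = [1.0 if engine == e else 0.0 for e in ENGINES]
    return onehot + [tgt_len / max(src_len, 1), context_match]


def score(w, x):
    """Linear score of a candidate feature vector x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(pairs, dim, epochs=10):
    """pairs: list of (better_x, worse_x) derived from human judgements."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            if score(w, better) <= score(w, worse):
                # Misranked pair: move the weights towards the preferred candidate.
                w = [wi + b - c for wi, b, c in zip(w, better, worse)]
    return w


def choose(w, candidates):
    """Return the index of the highest-scoring candidate phrase."""
    return max(range(len(candidates)), key=lambda i: score(w, candidates[i]))
```

In this toy setup, if annotators consistently prefer candidates with a higher context score regardless of engine, the learned weight on `context_match` becomes positive and the engine weights cancel out, so `choose` selects by context fit rather than by provenance.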
We will evaluate the resulting Hybrid MT models against (i) state-of-the-art phrase-based SMT systems, (ii) current syntax-enhanced SMT systems (e.g. Hiero, Joshua or Moses with factored models), (iii) phrase-based SMT models that use phrases derived from different types of MT and (iv) system combinations (both MEMT and SPMT), and we will integrate the novel Hybrid Systems into system combinations. We will participate in shared tasks and open evaluation campaigns, in particular those organised by the annual Workshop on Statistical Machine Translation (WMT).
In addition, we will evaluate our approach at approx. M23 using a simple phrase-level combination SMT approach, and at M35 we will carry out a final evaluation of the ML-based approach, measuring how integrating the hybrid system into multi-engine MT/system combinations affects translation quality. The results will be published. Technical work will be completed in M33, leaving time for wrap-up and reporting.
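One common way to realise a simple phrase-level combination of the kind mentioned above is to merge phrase pairs produced by several MT engines into a single phrase table and attach one binary provenance feature per engine, so that the SMT decoder's weight tuning can learn how far to trust each source. The sketch below assumes a hypothetical in-memory table layout and keeps the maximum translation probability when engines propose the same pair; both choices are simplifying assumptions, not the project's specified method.

```python
from collections import defaultdict

ENGINES = ["PB-SMT", "EBMT", "RBMT"]  # hypothetical set of contributing engines


def merge_phrase_tables(tables):
    """tables: engine -> list of (src, tgt, prob).

    Returns (src, tgt) -> [prob, one binary indicator per engine in ENGINES].
    """
    merged = defaultdict(lambda: {"prob": 0.0, "from": set()})
    for engine, pairs in tables.items():
        for src, tgt, prob in pairs:
            entry = merged[(src, tgt)]
            # Simplification: keep the highest probability any engine assigns.
            entry["prob"] = max(entry["prob"], prob)
            entry["from"].add(engine)
    return {
        key: [e["prob"]] + [1.0 if eng in e["from"] else 0.0 for eng in ENGINES]
        for key, e in merged.items()
    }
```

A phrase pair proposed by both the phrase-based SMT and the RBMT system thus carries indicators for both engines, which lets the tuned decoder reward agreement between approaches.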
We will organise or co-organise three workshops: one project-internal workshop (on Data Preparation, around M9) and two joint workshops (ML/MT, around M21 and M33) with representatives from both the machine learning and the machine translation communities, reflecting the main ML approaches, including memory-based learning (MBL) and support vector machines (SVM), and the main MT approaches.
No. | Title | Due Date | Type |
---|---|---|---|
D2.1 | Annotated Hybrid Sample MT Corpus | M12 | Software |
D2.2 | Optimal Choice Selection in MT | M24 | Report |
D2.3 | Evaluation Report | M36 | Report |
D2.4.1-3 | Workshops | M8, M21, M33 | Other |