Workshop Purpose and Theme

The workshop and its associated shared task aim to trigger a systematic investigation into improving state-of-the-art hybrid machine translation using advanced machine-learning (ML) methodologies. It follows the ML4HMT-11 workshop, which took place last November in Barcelona. That first workshop also road-tested a shared task (and associated data set) and laid the basis for a broader reach in 2012.

Organisational Details

You can download the workshop programme (PDF format, 90 KB) here.

The full proceedings (PDF format, 1.4 MB) and the corresponding reference (BibTex format) are available as well.

9:00 Josef van Genabith
Welcome and introductory remarks
Slides (PDF format, 439 KB)
9:15 Vassilina Nikoulina, Agnes Sandor, Marc Dymetman
Hybrid Adaptation of Named Entity Recognition for Statistical Machine Translation
Paper (PDF format, 193 KB) — Reference (BibTex format) — Slides (PDF format, 562 KB)
9:40 Maoxi Li, Mingwen Wang, presented by Feifei Zhai
Confusion Network Based System Combination for Chinese Translation Output: Word-Level or Character-Level?
Paper (PDF format, 359 KB) — Reference (BibTex format) — Slides (PDF format, 1.1 MB)
10:05 Kartik Asooja, Jorge Gracia, Nitish Aggarwal, Asunción Gómez Pérez, presented by Mihael Arcan
Using Cross-Lingual Explicit Semantic Analysis for Improving Ontology Translation
Paper (PDF format, 139 KB) — Reference (BibTex format) — Slides (PDF format, 945 KB)
10:30 Xiaofeng Wu, Tsuyoshi Okita, Josef van Genabith, Qun Liu
System Combination with Extra Alignment Information
Paper (PDF format, 149 KB) — Reference (BibTex format) — Slides (PDF format, 372 KB)
10:50 Tsuyoshi Okita, Antonio Toral, Josef van Genabith
Topic Modeling-based Domain Adaptation for System Combination
Paper (PDF format, 127 KB) — Reference (BibTex format) — Slides (PDF format, 920 KB)
11:10 Tsuyoshi Okita, Raphaël Rubino, Josef van Genabith, presented by John Judge
Sentence-Level Quality Estimation for MT System Combination
Paper (PDF format, 133 KB) — Reference (BibTex format) — Slides (PDF format, 2.4 MB)
11:30 Tea break
11:45 Tsuyoshi Okita
Neural Probabilistic Language Model for System Combination
Paper (PDF format, 143 KB) — Reference (BibTex format) — Slides (PDF format, 419 KB)
12:05 Christian Federmann
System Combination Using Joint, Binarised Feature Vectors
Paper (PDF format, 428 KB) — Reference (BibTex format) — Slides (PDF format, 800 KB)
12:25 Christian Federmann, Tsuyoshi Okita, Maite Melero, Marta Ruiz Costa-Jussà, Toni Badia, Josef van Genabith
Results of the ML4HMT-12 Shared Task
Paper (PDF format, 111 KB) — Reference (BibTex format) — Slides (PDF format, 434 KB)

Discussion Panel

Panelists: Jan Hajič, Qun Liu, Hans Uszkoreit, Josef van Genabith

Topics include:

  1. The Future of Hybrid MT: is there a single-paradigm winner?
  2. Will we see increasing usage of additional, potentially highly sparse, features?
  3. Will research efforts in Machine Translation and Machine Learning converge?
  4. How do we evaluate progress in terms of translation quality for Hybrid MT?
  5. What are the baselines? Can Human Judgment be integrated?
Slides (PDF format, 333 KB)
12:50 Jan Hajič · Institute of Formal and Applied Linguistics · Charles University in Prague
Invited talk: Deep Linguistic Information in Hybrid Machine Translation
Slides (PowerPoint format, 3.4 MB)

Regular Papers ML4HMT-12

We are soliciting original papers on hybrid MT, including (but not limited to):

  • use of machine learning methods in hybrid MT;
  • system combination: parallel in multi-engine MT (MEMT) or sequential in statistical post-editing (SPMT);
  • combining phrases and translation units from different types of MT;
  • syntactic pre-/re-ordering;
  • using richer linguistic information in phrase-based or in hierarchical SMT;
  • learning resources (e.g., transfer rules, transduction grammars) for probabilistic rule-based MT.

Shared Task ML4HMT-12

Participants are invited to build hybrid machine translation systems and/or system combinations using the output of several MT systems of different types, as provided by the organisers in the ML4HMT corpus.


