Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-2011)

Workshop Purpose and Theme

The workshop will explore how sophisticated machine-learning techniques can best support the design of Hybrid MT systems. A further important objective is to build bridges between the MT and ML communities, so that they can systematically and jointly explore the choice space for Hybrid MT.

Workshop Programme

The workshop will feature an invited talk by Alon Lavie, from Carnegie Mellon University, followed by a challenge or shared task session, and will conclude with a discussion panel.

All contributions will be published in the workshop proceedings.

Important Dates

May 20th Release of data for the challenge
October 10th Paper Submissions due / Challenge results due
October 18th Author notification / Release of challenge results
October 25th Final version due
November 19th Workshop

Acknowledgments

The ML4HMT workshop is supported by

Shared Task Description

The "Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT" is an effort to trigger systematic investigation into improving state-of-the-art Hybrid MT using advanced machine-learning (ML) methodologies. Participants are requested to build Hybrid/System Combination systems that combine the output of several MT systems of different types. This output is provided by the organizers.

The main focus of the shared task is trying to answer the following question:

Could Hybrid/System Combination MT techniques benefit from extra information (linguistically motivated, decoding and runtime) from the different systems involved?

Data (available from github.com/cfedermann/ML4HMT-2011): Participants are given a bilingual development set, data/ML4HMT-2011-data-1.0.tgz (13 MB), aligned at the sentence level. Each "bilingual sentence" contains:
- the source sentence,
- the target (reference) sentence and
- the corresponding output translations from 5 different systems, based on different MT approaches (Apertium, Ramirez-Sanchez, 2006; Joshua, Zhifei Li et al., 2009; Lucy, Alonso and Thurmair, 2003; Matrex, Penkale et al., 2010; Metis, Vandeghinste et al., 2006). The output has been annotated with system-internal information derived from the translation process of each system (ML4HMT-data-description.pdf, 104 KB).
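As a rough illustration of the record structure described above, a "bilingual sentence" could be held in memory as sketched below. This is only a sketch under stated assumptions: the class and field names are hypothetical, the annotations are modelled as a plain dictionary, and the example text is invented. The actual file format and annotation schema are those documented in ML4HMT-data-description.pdf.

```python
from dataclasses import dataclass, field

# System names as listed in the task description.
EXPECTED_SYSTEMS = {"Apertium", "Joshua", "Lucy", "Matrex", "Metis"}


@dataclass
class SystemOutput:
    """One system's translation plus its system-internal annotations."""
    system: str
    translation: str
    # Hypothetical container; the real annotation schema is in the PDF.
    annotations: dict = field(default_factory=dict)


@dataclass
class BilingualSentence:
    """A source sentence, its reference, and the candidate translations."""
    source: str
    reference: str
    outputs: list = field(default_factory=list)

    def missing_systems(self):
        """Return the expected systems that have no output for this sentence."""
        return EXPECTED_SYSTEMS - {o.system for o in self.outputs}


# Toy example with invented text, for illustration only.
sentence = BilingualSentence(
    source="El gato duerme.",
    reference="The cat sleeps.",
    outputs=[SystemOutput(name, "The cat sleeps.") for name in sorted(EXPECTED_SYSTEMS)],
)
print(sentence.missing_systems())  # empty set when all five systems are present
```

A check such as missing_systems() may be useful when loading the development set, since a combination system typically expects one candidate per input system.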
Baseline: As a baseline we consider state-of-the-art open-source system-combination systems, such as MANY (Barrault, 2010) and CMU-MEMT (Heafield & Lavie, 2010).
Challenge: Participants are challenged to build an MT mechanism that improves on the baseline by making effective use of the system-specific MT output. They can either provide solutions based on an open-source system or develop their own mechanisms. A suggested approach is given below.
- The language direction is Spanish-English.
- The development set can be used for tuning the systems during the development phase. Final submissions have to include translation output on a test set, which will be available one week before the submission deadline.
- If you need language/reordering models, they can be built from the WMT News Commentary data (http://www.statmt.org/wmt11/).
- Participants can also make use of additional linguistic analysis tools if their systems require them, but they must declare this explicitly upon submission so that they are judged as "unconstrained" systems.
Evaluation: The system output will be judged via peer-based human evaluation. During the evaluation phase, participants will be requested to rank system outputs of other participants through a web-based interface (Appraise; Federmann, 2010). Automatic metrics (BLEU; Papineni et al., 2002) will additionally be used.
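For participants who want a quick sanity check during development, the BLEU metric mentioned above can be approximated with a few lines of code. The sketch below is a minimal corpus-level BLEU with one reference per segment, assuming whitespace tokenisation and no smoothing; the function names are illustrative and this is not the scoring script used by the organizers, so absolute scores may differ from the official ones.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (single reference, whitespace tokens, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


score = corpus_bleu(["the cat sat on the"], ["the cat sat on the mat"])
print(round(score, 4))
```

Because no smoothing is applied, the score is zero whenever any n-gram order has no matches, so the function is only meaningful on a reasonably sized set of sentences rather than on single short segments.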
System description: Shared task participants will be invited to submit short papers (4-6 pages) describing their systems or their evaluation metrics (see instructions in Submissions).

Organization

Chair: Toni Badia (Pompeu Fabra University, Spain)

Co-chairs: Christian Federmann (German Research Center for Artificial Intelligence, Germany), Josef van Genabith (Dublin City University, Ireland)

Committee members

Maite Melero (Barcelona Media Innovation Center, Spain), Marta R. Costa-jussa (Barcelona Media Innovation Center, Spain), Pavel Pecina (Dublin City University, Ireland), Eleftherios Avramidis (German Research Center for Artificial Intelligence, Germany)

Program Committee

Eleftherios Avramidis (German Research Center for Artificial Intelligence, Germany)
Rafael Banchs (Institute for Infocomm Research - I2R, Singapore)
Loic Barrault (LIUM - University of Le Mans, France)
Chris Callison-Burch (Johns Hopkins University, MD, USA)
Jinhua Du (Faculty of Automation and Information Engineering, Xi'an University of Technology, Xi'an, China)
Andreas Eisele (Directorate-General for Translation (DGT), Luxembourg)
Cristina Espana-Bonet (Technical University of Catalonia, TALP, Barcelona)
Patrick Lambert (LIUM - University of Le Mans, France)
Maite Melero (Barcelona Media Innovation Center, Spain)
Pavel Pecina (Dublin City University, Ireland)
Marta R. Costa-jussa (Barcelona Media Innovation Center, Spain)
David Vilar (German Research Center for Artificial Intelligence, Germany)

Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-2011)

Barcelona (Spain) · Saturday, November 19th, 2011

In conjunction with International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT-2011)