The workshop and its associated shared task aim to trigger a systematic investigation into improving state-of-the-art hybrid machine translation (MT) using advanced machine-learning (ML) methodologies. They follow the ML4HMT-11 workshop, which took place last November in Barcelona. The first workshop also road-tested a shared task (and an associated data set) and laid the groundwork for a broader reach in 2012.
We are soliciting original papers on hybrid MT, including (but not limited to):
Full papers should be anonymised and follow the COLING full paper format.
The main focus of the Shared Task is to address the question:
“Can Hybrid MT and System Combination techniques benefit from extra information (linguistically motivated, decoding, runtime, confidence scores, or other meta-data) from the systems involved?”
Participants are invited to build hybrid MT systems and/or system combinations using the output of several MT systems of different types, as provided by the organisers. Participants are encouraged to use machine-learning techniques to exploit the additional meta-data. However, other general improvements to hybrid and combination-based MT are also welcome in the challenge. For systems that exploit the additional meta-data, the difficulty is that this meta-data is highly heterogeneous and specific to each individual system.
Shared task participants are invited to submit system description papers (7 pages, not anonymised), which should follow the COLING short paper format.
The ML4HMT-12 Shared Task involves Spanish-English (ES-EN) and Chinese-English (ZH-EN) data sets, in each case translating into English.
data/tuningDataESEN.tar.bz2 (57.1 MB)
data/ml4hmt-12.testDataESEN.tar.bz2 (6.1 MB)
Participants are challenged to build an MT mechanism that, where possible, makes effective use of the system-specific MT meta-data output. They can base their solutions on open-source systems or develop their own mechanisms. The development set can be used to tune the systems during the development phase. Final submissions must include translation output for a test set, which will be made available one week after the training data is released. Data will be provided to build language and reordering models, possibly re-using existing resources from MT research.
Participants can also use additional tools (e.g. for linguistic analysis or confidence estimation) if their systems require them. However, they must explicitly declare this upon submission so that their systems are judged as "unconstrained". This will allow for a fairer comparison between participating systems.
Shared task results should be submitted as an email attachment. Please compress your results into a .zip or .gz archive and use "ML4HMT-12 Shared Task Submission" as the mail subject. The deadline for shared task results has been extended to October 28th.
System output will be judged through both peer-based human evaluation and automatic evaluation. During the evaluation phase, participants will be asked to rank the system outputs of other participants using a web-based interface (Appraise; Federmann, 2010). Automatic metrics include BLEU (Papineni et al., 2002), TER (Snover et al., 2006) and METEOR (Lavie, 2005).
Results from the automatic evaluation of submitted shared task output will be made available to participants on November 12th (updated date), so that they can be referred to in system description papers. As the manual evaluation will take longer, its results will be presented and published at the workshop.
Papers must be written in English and can be either 1) research papers or 2) shared task system description papers. Please indicate the type of your submission by choosing the correct "Submission Type". Note that research papers need to be properly anonymised, while system description papers may include author information. The submission deadline for research papers has been extended to October 22nd, while system description papers are due by November 12th (updated date).
If you are interested in our workshop and intend to participate, we would appreciate it if you could inform us of your intent beforehand by email, so that we can better plan the workshop. At least one author of each accepted paper is required to participate in the workshop.