The METEOR Automatic Machine Translation Evaluation System
Alon Lavie
Abhaya Agarwal
Carnegie Mellon University
Pittsburgh, PA, USA
Download METEOR
- Current version: v0.6, May 23, 2007 (tar.gz; see the changelog). If you downloaded it before May 23, please download it again: a small bug in the wn_stem module, pointed out by Inho Kang, has been fixed.
- Older versions of METEOR are also available.
Please send any questions and bug reports to Abhaya Agarwal at abhayaa AT cs DOT cmu DOT edu.
News
- Starting with version 0.6, METEOR supports French, German and Spanish
in addition to English. Find out the details here.
- The parameters inside METEOR have changed. Please refer to
[Lavie & Agarwal, 2007] for details.
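To show what these parameters control, here is a minimal sketch of the parameterized sentence score, based on our reading of [Lavie & Agarwal, 2007]. It is not METEOR's Perl code. Precision P and recall R over the m matched unigrams are combined into Fmean = P·R / (α·P + (1−α)·R). A fragmentation penalty Pen = γ·(ch/m)^β is computed from the number of chunks ch. The score is (1 − Pen)·Fmean. The function name `meteor_score` and its argument names are hypothetical, and no default parameter values are assumed.

```python
def meteor_score(matches, chunks, hyp_len, ref_len, alpha, beta, gamma):
    """Hypothetical helper: parameterized METEOR-style sentence score.

    matches          -- number of matched unigrams (m)
    chunks           -- number of contiguous, identically ordered chunks (ch)
    hyp_len, ref_len -- unigram counts of the hypothesis and the reference
    alpha, beta, gamma -- tunable parameters (no defaults assumed here)
    """
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # Weighted harmonic mean of precision and recall, controlled by alpha.
    fmean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: fewer, longer chunks give a smaller penalty.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean
```

Setting α = 0.9, β = 3 and γ = 0.5 reduces this to the formulation in [Banerjee & Lavie, 2005]. In that case Fmean becomes 10PR / (R + 9P) and the penalty becomes 0.5·(ch/m)^3.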
About METEOR
METEOR is a system that automatically evaluates the output of machine
translation engines by comparing it to one or more reference
translations. For a given pair of hypothesis and reference strings,
the evaluation proceeds in a sequence of stages, with different
criteria used at each stage to find and score unigram
matches. By default, the first stage detects all exact matches
between the two strings. In the second stage, the words not
matched in the first stage are stemmed using the Porter stemmer, and
matches are then found between these stemmed words. For further details,
please refer to [Banerjee & Lavie, 2005].
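The staged matching described above can be sketched as follows. This is a simplified illustration, not METEOR's Perl implementation. `crude_stem` is a hypothetical toy suffix stripper standing in for the Porter stemmer. Each stage also greedily takes the first available reference word, whereas METEOR may choose among competing alignments differently.

```python
def crude_stem(word):
    """Hypothetical stand-in for the Porter stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def exact(word):
    """Stage 1 key: the word itself (case-folded)."""
    return word.lower()


def stemmed(word):
    """Stage 2 key: the (toy) stem of the word."""
    return crude_stem(word.lower())


def staged_match(hyp, ref, stages):
    """Greedily align unigrams stage by stage.

    Each stage maps a word to the key it is compared on. Words matched in
    an earlier stage are not reconsidered later. Returns {hyp_index: ref_index}.
    """
    alignment = {}
    used_ref = set()
    for key in stages:
        for i, h in enumerate(hyp):
            if i in alignment:
                continue
            for j, r in enumerate(ref):
                if j not in used_ref and key(h) == key(r):
                    alignment[i] = j
                    used_ref.add(j)
                    break
    return alignment


hyp = "the cat sat on the mats".split()
ref = "the cats sat on the mat".split()
print(sorted(staged_match(hyp, ref, [exact, stemmed]).items()))
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
```

In this example, "the", "sat" and "on" are matched in the exact stage. "cat"/"cats" and "mats"/"mat" are matched only in the stemmed stage.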
The matching system is written in Perl, and each matching stage
is implemented as a separate Perl module. In addition to the
two default matching modules (exact matching and stemmed matching), this
distribution also provides a WordNet-based stemmed matching module and a
WordNet-based synonym matching module. METEOR can be run with the default
modules, or the user can override the defaults and use one or more of the
provided modules in any order of preference. Users can also write their own
matching modules and plug them into the generic matching system.
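The idea of ordered, pluggable matching modules can be illustrated with the sketch below. The module interface here (a name paired with a word-pair predicate) is hypothetical and is not the API of METEOR's Perl modules. The small `SYNONYMS` table is made-up data standing in for WordNet synsets.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a "module" is a name plus a predicate over two words.
# This is NOT METEOR's Perl module API; it only illustrates the plug-in idea.
Module = Tuple[str, Callable[[str, str], bool]]


def exact_module(h: str, r: str) -> bool:
    return h.lower() == r.lower()


# Made-up synonym groups standing in for WordNet synsets.
SYNONYMS = [{"car", "automobile"}, {"big", "large"}]


def synonym_module(h: str, r: str) -> bool:
    h, r = h.lower(), r.lower()
    return any(h in group and r in group for group in SYNONYMS)


def run_modules(hyp: List[str], ref: List[str],
                modules: List[Module]) -> Dict[int, Tuple[int, str]]:
    """Apply modules in the given order of preference.

    Each module only sees words left unmatched by the modules before it.
    Returns {hyp_index: (ref_index, module_name)}.
    """
    result: Dict[int, Tuple[int, str]] = {}
    used_ref = set()
    for name, matches in modules:
        for i, h in enumerate(hyp):
            if i in result:
                continue
            for j, r in enumerate(ref):
                if j not in used_ref and matches(h, r):
                    result[i] = (j, name)
                    used_ref.add(j)
                    break
    return result


hyp = "a big automobile".split()
ref = "a large car".split()
print(run_modules(hyp, ref, [("exact", exact_module), ("synonym", synonym_module)]))
# {0: (0, 'exact'), 1: (1, 'synonym'), 2: (2, 'synonym')}
```

A user-written module only needs to follow the same interface to be added to the list. Reordering the list changes which module gets credit when several modules could match the same pair.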
METEOR's input file format is exactly the same as that of the BLEU and
NIST machine translation evaluation systems. Thus, all translation
data that can be evaluated using BLEU (such as the TIDES data) can
also be evaluated directly using METEOR.
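The file format itself is not described on this page. The sketch below assumes the SGML layout commonly used with NIST's evaluation scripts: `<doc>` elements carrying `docid` and `sysid` attributes, each containing `<seg>` elements. It also assumes attribute values are double-quoted. The helper `read_segments` and the sample text are hypothetical, for illustration only.

```python
import re
from collections import defaultdict

# Assumed layout (hypothetical sketch): <doc docid="..." sysid="..."> elements
# containing <seg id="...">...</seg> lines, with double-quoted attribute values.
DOC_RE = re.compile(r'<doc\b([^>]*)>(.*?)</doc>', re.IGNORECASE | re.DOTALL)
SEG_RE = re.compile(r'<seg\b([^>]*)>(.*?)</seg>', re.IGNORECASE | re.DOTALL)
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def read_segments(sgml_text):
    """Return {(sysid, docid): [segment strings in file order]}."""
    segments = defaultdict(list)
    for doc_attrs, body in DOC_RE.findall(sgml_text):
        attrs = dict(ATTR_RE.findall(doc_attrs))
        key = (attrs.get("sysid"), attrs.get("docid"))
        for _seg_attrs, text in SEG_RE.findall(body):
            # Normalize internal whitespace in each segment.
            segments[key].append(" ".join(text.split()))
    return dict(segments)


sample = '''<tstset setid="demo" srclang="Chinese" trglang="English">
<doc docid="d1" sysid="sysA">
<seg id="1"> the cat sat on the mat </seg>
<seg id="2">it was big</seg>
</doc>
</tstset>'''
print(read_segments(sample))
# {('sysA', 'd1'): ['the cat sat on the mat', 'it was big']}
```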
Starting with version 0.5, METEOR can also take n-best lists as input
and score them. You can read more details about the current version of
METEOR here.
References
- [Lavie & Agarwal, 2007] Lavie, A. and A. Agarwal. 2007. "METEOR: An Automatic Metric for MT
Evaluation with High Levels of Correlation with Human Judgments."
To appear in Proceedings of the Workshop on Statistical Machine Translation
at the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007),
Prague, June 2007.
- [Banerjee & Lavie, 2005] Banerjee, S. and A. Lavie. 2005. "METEOR: An Automatic Metric for MT
Evaluation with Improved Correlation with Human Judgments."
Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures
for MT and/or Summarization at the 43rd Annual Meeting of
the Association for Computational Linguistics (ACL-2005),
Ann Arbor, Michigan, June 2005.
This page was last modified on May 8, 2007.