Multidimensional Quality Metrics (MQM)

An open, standardized framework that categorizes translation errors by type and severity to produce consistent, comparable quality scores across any language or project.

Before MQM, the localization industry had no shared standard for measuring translation quality. Different companies, tools, and clients used incompatible scoring systems: what counted as a major error in one framework was minor in another, making quality scores meaningless outside the context in which they were produced. MQM was developed to solve this problem: a common vocabulary for describing translation errors that any team, tool, or LSP can adopt and compare results against.

MQM was originally developed through the EU-funded QTLaunchPad project and has since been adopted and maintained by the localization community through GALA and the W3C MQM Community Group. It applies to human translation, machine translation, and AI-generated translation, making it one of the few frameworks designed to work across all translation types in a single scoring model.

📐 How MQM works #️⃣

MQM is an analytic evaluation framework: errors are identified and annotated at the segment level and associated with specific words or phrases in the translated text, rather than assessed holistically at the document level.

The process follows three stages:

  1. Define the Metric. The team selects specific “issue types” relevant to the project (e.g., a UI project might prioritize “Locale Convention,” while a legal doc prioritizes “Accuracy”).
  2. Annotate Errors. A reviewer classifies each error into one of four severity levels: Neutral (0), Minor (1), Major (5), and Critical (25, or an automatic fail).
  3. Calculate the Score. The severity weights are summed into total penalty points, which are then either subtracted from a perfect score (usually 100) or normalized by word count to produce a Quality Score (QS); see the sketch after this list.
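
To make the scoring step concrete, here is a minimal sketch in TypeScript. It uses the severity weights above and a simple per-word normalization; the type and field names (`ErrorAnnotation`, `span`, and so on) are illustrative assumptions, since MQM defines the taxonomy and weights, not a data format. Treat it as one plausible implementation, not the official one.

```typescript
// Severity levels and default weights as described in step 2.
type Severity = "neutral" | "minor" | "major" | "critical";

const SEVERITY_WEIGHTS: Record<Severity, number> = {
  neutral: 0,
  minor: 1,
  major: 5,
  critical: 25, // may also be treated as an automatic fail
};

// One annotated error, tied to a specific span of the translated text
// (this shape is an assumption; MQM does not prescribe a file format).
interface ErrorAnnotation {
  segmentId: string;      // segment in which the error occurs
  span: [number, number]; // character offsets of the affected words
  issueType: string;      // e.g. "Accuracy/Mistranslation"
  severity: Severity;
}

// Subtractive scoring: deduct the per-word penalty rate from a perfect 100,
// so documents of different lengths remain comparable.
function qualityScore(
  errors: ErrorAnnotation[],
  wordCount: number,
  weights: Record<Severity, number> = SEVERITY_WEIGHTS,
): number {
  const penalty = errors.reduce((sum, e) => sum + weights[e.severity], 0);
  return 100 * (1 - penalty / wordCount);
}

// Example: one major and one minor error in a 500-word text.
const errors: ErrorAnnotation[] = [
  { segmentId: "s1", span: [10, 24], issueType: "Accuracy/Mistranslation", severity: "major" },
  { segmentId: "s3", span: [0, 5],   issueType: "Fluency/Spelling",        severity: "minor" },
];
console.log(qualityScore(errors, 500).toFixed(2)); // "98.80"
```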

🗂️ MQM error dimensions #️⃣

MQM organizes errors into eight major dimensions, each covering a distinct aspect of translation quality:

  • Accuracy — how faithfully the target text reflects the meaning of the source
  • Fluency — linguistic well-formedness of the target text, regardless of whether it is a translation
  • Terminology — correct use of domain-specific or project-specific terms
  • Style — adherence to style guides and expected register
  • Locale convention — compliance with locale-specific formats for dates, numbers, currency, and similar elements
  • Verity — correspondence between the text and real-world facts or context
  • Design — formatting and layout issues in the translated output
  • Internationalization — issues related to how well the source content was prepared for translation

Each dimension contains more granular issue types (MQM defines over 100 in total), but teams typically select only the dimensions and issue types relevant to their project type and content.
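
To illustrate such a subset, the sketch below defines a metric configuration for a software-UI project, reusing the `Severity` type from the scoring sketch above. The `MetricConfig` shape, field names, and threshold value are assumptions for illustration; MQM defines the taxonomy, not a configuration format.

```typescript
// Hypothetical project-specific metric: only the dimensions and issue types
// that matter for UI strings, plus the weights and pass threshold in use.
interface MetricConfig {
  dimensions: string[];                      // MQM dimensions in scope
  issueTypes: string[];                      // granular issue types reviewers may assign
  severityWeights: Record<Severity, number>; // weights used by qualityScore()
  passThreshold: number;                     // minimum Quality Score to pass LQA
}

const uiProjectMetric: MetricConfig = {
  dimensions: ["Accuracy", "Terminology", "Locale convention"],
  issueTypes: [
    "Accuracy/Mistranslation",
    "Terminology/Inconsistent terminology",
    "Locale convention/Date format",
  ],
  severityWeights: { neutral: 0, minor: 1, major: 5, critical: 25 },
  passThreshold: 95,
};
```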

🔍 Key points about MQM #️⃣

  • MQM is open and extensible. You don’t have to use all 100+ categories; most software teams create a “customized subset” that focuses only on UI and technical accuracy.
  • It superseded earlier frameworks like the LISA QA Model and SAE J2450, which were either too rigid or too narrow to cover the full range of localization content types.
  • MQM scores are only comparable when the same metric configuration, severity weights, and threshold values are used. A score of 95 from one team does not automatically mean the same as a 95 from another if the parameters differ (see the short example after this list).
  • Most modern Translation Management Systems (TMS) have MQM-based LQA (Linguistic Quality Assurance) workflows built in.
  • MQM-annotated data is the primary “gold standard” used to train Quality Estimation (QE) models and LLMs to recognize “good” translations.
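
The comparability caveat above is easy to demonstrate with the scoring sketch: the same annotations yield different scores once the severity weights change. The stricter weights here are invented purely for illustration.

```typescript
// Hypothetical stricter weight configuration used by a different team.
const strictWeights: Record<Severity, number> = {
  neutral: 0,
  minor: 2,
  major: 10,
  critical: 50,
};

// Identical errors, identical word count, different parameters:
console.log(qualityScore(errors, 500).toFixed(2));                // "98.80"
console.log(qualityScore(errors, 500, strictWeights).toFixed(2)); // "97.60"
```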

🔄 MQM and the broader quality workflow #️⃣

MQM sits within the quality evaluation stage of a localization workflow: it is a retrospective tool applied after translation is complete. It complements quality estimation (QE), which predicts quality before human review, and linguistic QA (LQA), the broader process of checking translations for errors. MQM provides the structured scoring model that makes LQA results objective, consistent, and actionable. For a practical look at using this framework to validate AI output, see this piece published on Substack by our Lead AI Researcher, David Václavek.

Read more about the framework on the official website: https://themqm.org/

Curious about software localization beyond the terminology?

⚡ Manage your translations with Localazy! 🌍