Quality Evaluation (QE)

A human-led assessment of completed translations measured against reference translations or a structured error framework to verify accuracy, fluency, and consistency.

Quality evaluation is a retrospective process. It happens after translation is complete, when a human reviewer (typically a linguist, editor, or subject matter expert) examines the translated content and scores it against a defined set of criteria. The result is an objective, documented assessment of translation quality that informs decisions about whether content is ready to publish, needs revision, or reveals systemic issues with a translation workflow.

This distinguishes it from quality estimation (QE), which is automated and predictive, generating scores before or during translation without human input. Quality evaluation is slower and more resource-intensive, but it captures what automated tools cannot: nuance, cultural appropriateness, brand voice, and contextual accuracy.

🔍 How quality evaluation works in practice #️⃣

Most structured quality evaluation follows an established error framework. The most widely adopted in the localization industry is MQM (Multidimensional Quality Metrics), which categorizes errors by type (accuracy, fluency, terminology, style, locale conventions) and assigns one of four severity levels: neutral, minor, major, or critical. Each error type and severity carries a numerical weight, and the weighted penalties produce a score that indicates overall translation quality against a defined threshold.
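To make the arithmetic concrete, here is a minimal Python sketch of penalty-based scoring in the spirit of MQM. The severity weights and the per-100-words normalization are illustrative assumptions (real programs calibrate their own weights), and the function and variable names are hypothetical:

```python
from collections import Counter

# Severity levels mirror MQM's neutral/minor/major/critical.
# The numeric weights are illustrative assumptions, not an official calibration.
SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 10}

def mqm_style_score(errors, word_count, per_words=100):
    """Penalty-based quality score over a reviewed sample.

    errors: list of (category, severity) tuples logged by the reviewer.
    word_count: number of source words in the evaluated sample.
    per_words: normalization window (penalty points per N words).
    """
    penalty = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    # Normalize by sample size so short and long samples are comparable.
    score = max(0.0, 100.0 - penalty * per_words / word_count)
    return score, Counter(category for category, _ in errors)

errors = [
    ("accuracy", "major"),     # mistranslated clause
    ("terminology", "minor"),  # glossary term not applied
    ("fluency", "minor"),      # awkward word order
]
score, by_category = mqm_style_score(errors, word_count=250)
print(f"{score:.1f}", dict(by_category))
# -> 97.2 {'accuracy': 1, 'terminology': 1, 'fluency': 1}
```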

A common workflow: a sample of translated segments is selected (typically around 10% of the total project) and reviewed by a qualified linguist, who logs, categorizes, and scores each error. The resulting score is compared against a pass threshold, such as 85 out of 100, to determine whether the translation passes or requires rework. Results are documented in a quality report that can inform future projects, translator feedback, and workflow improvements.
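A rough sketch of that sampling-and-threshold step might look like the following. The 10% sample and the 85-point threshold come from the workflow above; the reproducible seed, function names, and example scores are assumptions for illustration:

```python
import random

def sample_for_review(segments, fraction=0.10, seed=7):
    """Draw a ~10% random sample of segments for human review.
    A fixed seed keeps the sample reproducible for audit purposes."""
    rng = random.Random(seed)
    size = max(1, round(len(segments) * fraction))
    return rng.sample(segments, size)

def project_verdict(segment_scores, threshold=85.0):
    """Average the reviewer's per-segment scores and compare them to
    the pass threshold (85/100, as in the example above)."""
    average = sum(segment_scores) / len(segment_scores)
    return average, ("pass" if average >= threshold else "rework")

segments = [f"seg-{i:03d}" for i in range(500)]
sample = sample_for_review(segments)   # 50 of 500 segments go to review
scores = [96.0, 88.5, 72.0]            # e.g. outputs of mqm_style_score
print(project_verdict(scores))         # (85.5, 'pass')
```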

📊 Key points about quality evaluation #️⃣

  • Quality evaluation typically relies on reference translations: human-verified translations of the source text against which the evaluated output is measured. This is one reason it is resource-intensive and usually applied to samples rather than entire projects.
  • The MQM framework is the current industry standard for structured quality evaluation. It superseded earlier frameworks such as the LISA QA Model and SAE J2450, and it is now the most widely used benchmark across LSPs and enterprise localization teams.
  • Quality evaluation results are most valuable when tracked over time. Recurring error patterns across languages, translators, or content types reveal systemic weaknesses that can be addressed through training, glossary updates, or workflow changes (see the sketch after this list).
  • Quality evaluation applies to all translation types (human translation, raw MT output, and machine translation post-editing, or MTPE), making it a universal quality check regardless of how the translation was produced.
  • Sampling is standard practice. Reviewing 100% of translated content at the quality evaluation stage is rarely practical at scale. Representative sampling, when done consistently, provides a reliable signal about overall project quality.
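As an example of the trend-tracking mentioned above, a sketch like this could aggregate a hypothetical error log along different dimensions to surface recurring patterns; the log format, field positions, and threshold are all assumptions:

```python
from collections import Counter

# Hypothetical error log aggregated across evaluated projects.
# Each record: (language, translator, category, severity).
error_log = [
    ("de", "translator-a", "terminology", "minor"),
    ("de", "translator-a", "terminology", "major"),
    ("fr", "translator-c", "terminology", "minor"),
    ("de", "translator-b", "fluency", "minor"),
    ("fr", "translator-c", "accuracy", "major"),
]

def recurring(log, field, min_count=2):
    """Count errors along one dimension (0=language, 1=translator,
    2=category) and keep values that recur at least min_count times."""
    counts = Counter(record[field] for record in log)
    return {key: n for key, n in counts.items() if n >= min_count}

# Terminology errors recurring across languages and translators point to
# a glossary gap rather than an individual performance problem.
print(recurring(error_log, field=2))  # {'terminology': 3}
print(recurring(error_log, field=0))  # {'de': 3, 'fr': 2}
```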

🔄 Quality evaluation vs. quality estimation #️⃣

These two terms share an abbreviation (QE) and are closely related but serve different purposes in the workflow:

| | Quality evaluation | Quality estimation |
|---|---|---|
| When | After translation | Before or during translation |
| How | Human review | Automated / ML-based |
| Requires reference | Yes | No |
| Speed | Slower, resource-intensive | Fast, scalable |
| Best for | Final quality assessment | Triage and routing decisions |

In practice, the two are used together: quality estimation routes segments efficiently during production, and quality evaluation validates the final output and drives continuous improvement of the translation workflow.
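A simplified sketch of that division of labor, with assumed score scales and routing thresholds (the 90/70 cutoffs are arbitrary examples, not industry standards):

```python
def route_segment(estimation_score):
    """Production-time triage driven by an automated quality-estimation
    score (the 0-100 scale and thresholds are illustrative assumptions)."""
    if estimation_score >= 90:
        return "publish_as_is"   # high confidence, skip human touch
    if estimation_score >= 70:
        return "post_edit"       # route to light human post-editing
    return "retranslate"         # route to full human translation

def validate_release(sampled_scores, threshold=85.0):
    """Retrospective quality evaluation on a sample of the final output;
    a failing average triggers rework and feeds back into routing rules."""
    return sum(sampled_scores) / len(sampled_scores) >= threshold

print(route_segment(93), route_segment(78), route_segment(41))
# publish_as_is post_edit retranslate
print(validate_release([96.0, 88.5, 72.0]))  # True (85.5 >= 85.0)
```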

Curious about software localization beyond the terminology?

⚡ Manage your translations with Localazy! 🌍