An automated method of assessing machine translation quality that predicts how good MT output is without requiring a human reference translation.
Commonly written as QE and often referred to as MTQE (Machine Translation Quality Estimation), quality estimation uses machine learning models to score translated segments in real time. Unlike evaluation metrics such as BLEU or COMET, which compare MT output against a human reference, QE works on new content where no reference exists yet. The model analyzes the source and target text together and produces a score indicating how likely the translation is to be accurate, fluent, and ready for use.
Scores are typically expressed on a scale of 0 to 100, though some systems return categorical labels such as Good, Fair, or Poor. The underlying models are trained on large datasets of machine-translated content that has been reviewed and corrected by human translators, so the estimations reflect patterns learned from real post-editing behavior rather than abstract linguistic rules.
The primary use case is intelligent routing, sometimes called hybrid post-editing. Instead of sending every MT segment to a human reviewer regardless of quality, teams set score thresholds that determine what happens to each segment automatically:

- Segments scoring above the upper threshold are approved as-is, with no human review.
- Segments in the middle range are routed to human post-editing.
- Segments below the lower threshold are flagged for full human translation or retranslation.
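As a minimal sketch, the threshold-based routing described above might look like the following. The thresholds (90 and 60) and the routing labels are illustrative assumptions, not values prescribed by any particular QE tool.

```python
def route_segment(qe_score: float) -> str:
    """Route an MT segment based on a QE score on a 0-100 scale.

    Thresholds here are hypothetical; real teams tune them per
    content type, language pair, and risk tolerance.
    """
    if qe_score >= 90:
        return "publish"      # high confidence: skip human review
    elif qe_score >= 60:
        return "post-edit"    # medium: send for light post-editing
    else:
        return "retranslate"  # low: full human translation


# Example: a batch of segments fans out to different queues.
scores = [95.2, 71.0, 42.8]
routes = [route_segment(s) for s in scores]
```

In practice the thresholds are calibrated empirically, for example by measuring how often segments above a candidate cutoff still required edits during a pilot period.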
This approach concentrates human effort where it adds the most value. Routine, predictable content moves through without review, while complex or uncertain segments get the attention they need. Teams that have implemented QE-driven workflows have reported significant reductions in post-editing volume and cost.
QE also helps with MT engine selection. Running candidate engines on a sample of real project content and comparing QE scores across segments provides a more practical signal than generic benchmark comparisons.
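A simple sketch of this comparison, assuming QE scores have already been obtained for each candidate engine on the same sample segments (the engine names and score values below are invented for illustration):

```python
from statistics import mean


def rank_engines(qe_scores: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank candidate MT engines by mean QE score over a shared sample.

    Each engine must be scored on the same source segments so the
    averages are comparable.
    """
    return sorted(
        ((name, mean(scores)) for name, scores in qe_scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical per-segment QE scores for two candidate engines.
sample_scores = {
    "engine_a": [88.0, 92.0, 84.0],
    "engine_b": [81.0, 85.0, 90.0],
}
ranking = rank_engines(sample_scores)
```

Averages alone can hide variance, so teams often also inspect the distribution of scores or the count of segments below the post-editing threshold before committing to an engine.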
QE scores can be skewed when the MT engine and the QE model are trained on the same data. In that case, the estimator tends to rate the engine’s output more favorably than an independent model would, potentially allowing errors to slip through with high confidence scores. Teams should treat unusually high average scores as a signal to audit, not as confirmation of quality.
QE also cannot catch errors that require domain knowledge, cultural context, or knowing the brand voice. A sentence can be grammatically correct, semantically close to the source, and still be wrong for a specific audience. Automated scoring is a triage tool, not a replacement for human review on high-stakes content.
BLEU and COMET are evaluation metrics: they measure quality after the fact by comparing MT output to human references. QE is a prediction mechanism: it estimates quality before any human review takes place. In practice, teams use all three at different stages: BLEU and COMET for MT engine benchmarking, QE for live production workflows.