String normalization

A text processing practice that standardizes source strings so translation tools can reliably match and reuse existing translations.

String normalization is the practice of converting source text into a consistent format before or during the translation workflow. It removes differences in whitespace, quotation styles, punctuation, or line breaks that would otherwise prevent translation memory systems from matching strings with the same content. This improves translation memory reuse and avoids duplicated translation work.

When content comes from multiple authors, content systems, or legacy code, small formatting differences can cause the same text to be treated as separate strings.

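For example, two copies of “Click here” that differ only in an invisible detail, such as an extra space between the words, would be treated as separate strings and translated twice without normalization.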
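The mismatch can be illustrated with a minimal sketch. It assumes a translation memory that behaves like an exact-match dictionary lookup; `translation_memory` and `collapse_whitespace` are hypothetical names used only for illustration, and real translation memory systems are more sophisticated.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical translation memory: source string -> stored translation.
translation_memory = {"Click here": "Klicken Sie hier"}

incoming = "Click  here"  # two spaces between the words

# An exact lookup misses, because the keys differ by one character.
exact_hit = translation_memory.get(incoming)

# After normalization the lookup finds the existing translation.
normalized_hit = translation_memory.get(collapse_whitespace(incoming))
```

Here `exact_hit` is `None`, while `normalized_hit` returns the stored translation, so the string does not need to be translated again.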

🔎 Common string normalization practices:

  • Standardizing whitespace by removing extra spaces, tabs, or line breaks
  • Converting between straight quotes and smart quotes consistently
  • Normalizing Unicode characters like different dash types, apostrophes, or special symbols
  • Removing or standardizing inline formatting tags and markup
  • Handling capitalization patterns consistently across source content
  • Converting special characters or entities to standard representations
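Several of the practices above could be combined into one helper, roughly as sketched below. This is an illustration using only the Python standard library, not the behavior of any particular localization tool. The function name `normalize_source`, the character mapping, and the list of inline tags are assumptions that a real project would choose for itself.

```python
import html
import re
import unicodedata

# Illustrative mapping; which characters to fold is a project decision.
CHAR_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2018": "'", "\u2019": "'",  # curly single quotes and apostrophe
    "\u2013": "-", "\u2014": "-",  # en dash and em dash
    "\u00a0": " ",                 # non-breaking space
})

# Assumed set of inline formatting tags that are safe to drop.
TAG_RE = re.compile(r"</?(?:b|i|em|strong|span)\b[^>]*>")


def normalize_source(text: str, strip_tags: bool = False) -> str:
    """Return a normalized copy of a source string (hypothetical helper)."""
    text = html.unescape(text)                 # entities: "&amp;" becomes "&"
    text = unicodedata.normalize("NFC", text)  # compose combining sequences
    text = text.translate(CHAR_MAP)            # fold quotes, dashes, NBSP
    if strip_tags:
        text = TAG_RE.sub("", text)            # drop selected inline tags
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def match_key(text: str) -> str:
    """Case-insensitive key for lookups; the stored source keeps its case."""
    return normalize_source(text).casefold()
```

Keeping capitalization out of `normalize_source` and using a separate `match_key` for comparison is one possible design choice: strings that differ only in case can be found as candidates without rewriting the source text itself.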

🚦 Limits of string normalization

There is no single standard for what should be normalized. Rules vary by content type, tooling, and quality needs. Formatting that is safe to normalize in marketing copy may change meaning in technical or UI content where whitespace or symbols matter. Teams should therefore define normalization rules carefully, so that they improve translation memory matching without altering meaning.
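One way to express content-specific rules is a per-content-type profile, sketched below. The profile names, the fields, and the `normalize_for` function are hypothetical and only show the idea of switching rules on and off; they do not reflect a standard or a specific tool's configuration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizationProfile:
    """Which rules apply to a content type (illustrative fields only)."""
    collapse_whitespace: bool
    straighten_quotes: bool


# Assumed content types; a real project would define its own.
PROFILES = {
    "marketing": NormalizationProfile(collapse_whitespace=True, straighten_quotes=True),
    "code_sample": NormalizationProfile(collapse_whitespace=False, straighten_quotes=False),
}

QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def normalize_for(content_type: str, text: str) -> str:
    """Apply only the rules enabled for the given content type."""
    profile = PROFILES[content_type]
    if profile.straighten_quotes:
        text = text.translate(QUOTES)
    if profile.collapse_whitespace:
        text = re.sub(r"\s+", " ", text).strip()
    return text
```

With these profiles, marketing copy has its spacing and quotes cleaned up, while a code sample passes through unchanged because its indentation and line breaks carry meaning.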

📚 Read more about Translation Memory in Localazy and how source consistency affects reuse
