Single/Double-Byte Character Set (SBCS/DBCS)

A character encoding classification that determines how many bytes represent a single character, directly affecting how software stores, processes, and displays text across different languages and scripts.

A single-byte character set (SBCS) uses one byte (eight bits) to represent each character. With one byte, a maximum of 256 unique characters can be encoded. This is sufficient for Western European languages that use the Latin alphabet, where the total number of required characters, including punctuation and numerals, fits comfortably within that limit. ASCII, the most widely known SBCS, is a 7-bit code that defines 128 characters and forms the foundation of most Western text encoding; 8-bit sets such as ISO 8859-1 and Windows-1252 extend it to the full 256.

A double-byte character set (DBCS) uses two bytes for characters outside the basic single-byte range, giving a theoretical ceiling of 65,536 code values. This capacity is essential for East Asian languages (Chinese, Japanese, and Korean, collectively referred to as CJK), which require thousands of unique characters that cannot fit within a single-byte range. Common DBCS encodings include Shift-JIS for Japanese and Big5 for Traditional Chinese. In practice, both keep ASCII characters as single bytes, so real-world DBCS text contains a mix of one-byte and two-byte characters.
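As a rough illustration of the two definitions above, the Python sketch below counts how many bytes each character takes in a few legacy encodings. The helper name `bytes_per_char` is made up for this example; `cp1252`, `shift_jis`, and `big5` are the standard-library codec names for Windows-1252, Shift-JIS, and Big5. Encoding one character at a time is only valid for stateless encodings such as these.

```python
def bytes_per_char(text: str, encoding: str) -> list[int]:
    """Return the number of bytes each character occupies in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]


# SBCS: every character, including the accented one, is a single byte.
print(bytes_per_char("caf\u00e9", "cp1252"))   # [1, 1, 1, 1]

# DBCS: ASCII letters stay at one byte, CJK characters take two.
print(bytes_per_char("A日本", "shift_jis"))     # [1, 2, 2]
print(bytes_per_char("A中文", "big5"))          # [1, 2, 2]

# An SBCS has no slot for a CJK character, so encoding fails outright.
try:
    "日".encode("cp1252")
except UnicodeEncodeError:
    print("cp1252 cannot represent this character")
```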

🌏 Why this matters for localization

The distinction between SBCS and DBCS has direct consequences for software internationalization. A system designed only for single-byte encoding will misinterpret or corrupt double-byte characters. String length calculations break down: what a developer assumes is a one-character string may occupy two bytes, causing truncation, buffer overflows, or display errors. UI layouts built around SBCS text widths often fail to accommodate the wider display footprint of DBCS characters.
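The length mismatch is easy to reproduce. The following is a minimal sketch, assuming a fixed-size field measured in bytes (the five-byte limit and the helper `truncate_to_bytes` are hypothetical): cutting the raw bytes at the limit splits a two-byte character, while cutting on character boundaries does not.

```python
def truncate_to_bytes(text: str, encoding: str, max_bytes: int) -> str:
    """Return the longest prefix of `text` whose encoded form fits in `max_bytes`."""
    kept = []
    used = 0
    for ch in text:
        size = len(ch.encode(encoding))
        if used + size > max_bytes:
            break
        kept.append(ch)
        used += size
    return "".join(kept)


text = "日本語"
raw = text.encode("shift_jis")
print(len(text), len(raw))  # 3 6  (three characters, six bytes)

# Naive byte-level cut: the fifth byte is the first half of the last character.
try:
    raw[:5].decode("shift_jis")
except UnicodeDecodeError as exc:
    print("naive cut failed:", exc.reason)

# Character-aware cut keeps only whole characters.
print(truncate_to_bytes(text, "shift_jis", 5))  # 日本
```

Walking the string one character at a time is slower than slicing bytes, but it never leaves a dangling lead byte in the output.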

For localization engineers, SBCS/DBCS awareness is especially relevant when working on legacy systems, mainframe environments, or software originally built for Western markets that is being adapted for CJK audiences.

🔤 Key points about single and double-byte character sets

  • Many systems use mixed encoding: standard alphanumeric characters are stored as single bytes, while CJK characters occupy two bytes in the same data stream. This is sometimes called MBCS (multi-byte character set).
  • DBCS should not be confused with Unicode. Unicode is a separate, modern standard designed to encode all the world’s writing systems in a consistent way. UTF-8, the most common Unicode encoding, is variable-length (one to four bytes per character) and has largely replaced both SBCS and DBCS in modern software development.
  • Legacy applications still using SBCS or DBCS encoding require careful handling during localization. File format converters, string parsers, and display components all need to be encoding-aware.
  • DBCS characters typically display at twice the width of SBCS characters on legacy terminals, which affects UI layout during localization, especially text expansion calculations for CJK languages.
  • Testing with actual CJK content is essential when localizing software for DBCS languages. English-only testing will not surface encoding errors that only appear when double-byte characters are present.
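The double-width point can be approximated in code. This is a sketch under a simplifying assumption: characters that Unicode classifies as East Asian Wide or Fullwidth take two terminal columns and everything else takes one. The function name `display_columns` is hypothetical, and the estimate ignores combining marks and ambiguous-width characters.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Estimate how many fixed-width terminal columns `text` occupies."""
    columns = 0
    for ch in text:
        # "W" (Wide) and "F" (Fullwidth) characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            columns += 1
    return columns


print(display_columns("ABC"))     # 3
print(display_columns("日本語"))   # 6
print(display_columns("A日"))     # 3
```

A layout check built on something like this can flag labels that fit in English but overflow once CJK strings are loaded, which is the kind of defect English-only testing misses.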

🔄 SBCS, DBCS, and the move to Unicode

Modern software development has largely moved away from SBCS and DBCS in favor of Unicode, which provides a unified encoding space for all languages. UTF-8 in particular has become the default encoding for web content and most software platforms. However, understanding SBCS and DBCS remains relevant for localization professionals working with legacy systems, older file formats, and codebases that predate widespread Unicode adoption.

Type            | Bytes per Char | Max Characters | Best For
SBCS            | 1              | 256            | English, Spanish, French
DBCS            | 1 or 2         | ~65,000        | Legacy CJK (Chinese, Japanese, Korean)
Unicode (UTF-8) | 1 to 4         | 1,114,112      | Everything (Modern Standard)
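The "1 to 4" bytes figure for UTF-8 can be checked with a few sample characters. The sketch below uses Python's built-in `utf-8` codec; the sample characters and the helper name `utf8_length` are arbitrary choices for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes a single character needs in UTF-8."""
    return len(ch.encode("utf-8"))


samples = {
    "A": "Latin letter (ASCII)",
    "\u00e9": "Latin letter with accent",
    "\u65e5": "CJK ideograph",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {utf8_length(ch)} byte(s)")
```

The four samples come out at 1, 2, 3, and 4 bytes respectively, so byte length still differs from character count in UTF-8; what changes compared with SBCS and DBCS is that one encoding covers every script.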

🛠️ How Localazy handles character encoding

Localazy removes the “encoding headache” by standardizing on UTF-8 across its entire infrastructure. When you upload legacy files (like old Java .properties files or Windows-1252 strings), Localazy’s Format Conversions automatically handle the transformation to a modern Unicode space.

For localization engineers, this means you don’t have to worry about buffer overflows or “mojibake” (corrupted text) when moving between Western and CJK markets. If you are exporting back to a legacy environment that requires a specific SBCS or DBCS encoding, Localazy’s CLI and API allow you to define the output encoding, ensuring the characters are re-mapped correctly without breaking the target system’s logic.
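For readers who want to see what such a conversion involves in general terms, here is a generic Python sketch. It is not Localazy's implementation or API; the function name `transcode` and the sample string are assumptions made for the example. It decodes bytes from a legacy encoding, re-encodes them as UTF-8, and converts back, using strict error handling so that unmappable characters raise an error instead of being silently corrupted.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode `data` from `source` and re-encode it in `target`, failing loudly on errors."""
    return data.decode(source, errors="strict").encode(target, errors="strict")


legacy = "日本語".encode("shift_jis")

# Legacy DBCS bytes to UTF-8, then back again without loss.
utf8 = transcode(legacy, "shift_jis", "utf-8")
restored = transcode(utf8, "utf-8", "shift_jis")
print(restored == legacy)  # True

# Decoding with the wrong codec does not fail; it produces mojibake.
print(legacy.decode("cp1252") == "日本語")  # False

# Exporting to an encoding that lacks the characters raises an error.
try:
    transcode(utf8, "utf-8", "cp1252")
except UnicodeEncodeError:
    print("target encoding cannot represent this text")
```

Strict mode is a deliberate choice here: a failed export is easier to diagnose than a file full of question marks.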

Learn more about supported file formats and CLI configuration in the Localazy docs.
