A character encoding classification that determines how many bytes represent a single character, directly affecting how software stores, processes, and displays text across different languages and scripts.
A single-byte character set (SBCS) uses one byte (eight bits) to represent each character. With one byte, a maximum of 256 unique characters can be encoded. This is sufficient for Western European languages that use the Latin alphabet, where the total number of required characters, including punctuation and numerals, fits comfortably within that limit. ASCII, the most widely known SBCS, covers 128 characters and forms the foundation of most Western text encoding.
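As a minimal sketch of the one-byte-per-character property, the Python snippet below uses Windows-1252 (`cp1252`), a common Western European SBCS picked here purely for illustration. The sample strings are assumptions, not taken from any particular system.

```python
# Windows-1252 (cp1252) is a single-byte character set: every character
# it supports encodes to exactly one byte.
text = "caf\u00e9"  # "café"
encoded = text.encode("cp1252")

char_count = len(text)     # number of characters
byte_count = len(encoded)  # number of bytes after encoding

# A character outside the 256-slot table cannot be encoded at all.
try:
    "日".encode("cp1252")
    cjk_fits = True
except UnicodeEncodeError:
    cjk_fits = False

print(char_count, byte_count, cjk_fits)
```

In an SBCS the character count and the byte count always match, which is exactly the assumption that stops holding once CJK text enters the picture.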
A double-byte character set (DBCS) uses two bytes for most characters, which gives a theoretical ceiling of 65,536 code values. In practice, DBCS encodings are mixed-width: ASCII-range characters still take one byte, while CJK characters generally take two. This capacity is essential for East Asian languages (Chinese, Japanese, and Korean, collectively referred to as CJK), which require thousands of distinct characters, far more than a single byte can index. Common DBCS encodings include Shift-JIS for Japanese and Big5 for Traditional Chinese.
The distinction between SBCS and DBCS has direct consequences for software internationalization. A system designed only for single-byte encoding will misinterpret or corrupt double-byte characters. String length calculations break down: what a developer assumes is a one-character string may occupy two bytes, causing truncation, buffer overflows, or display errors. UI layouts built around SBCS text widths often fail to accommodate the wider display footprint of DBCS characters.
For localization engineers, SBCS/DBCS awareness is especially relevant when working on legacy systems, mainframe environments, or software originally built for Western markets that is being adapted for CJK audiences.
Modern software development has largely moved away from SBCS and DBCS in favor of Unicode, which provides a unified encoding space for all languages. UTF-8 in particular has become the default encoding for web content and most software platforms. However, understanding SBCS and DBCS remains relevant for localization professionals working with legacy systems, older file formats, and codebases that predate widespread Unicode adoption.
| Type | Bytes per Char | Max Characters | Best For |
| --- | --- | --- | --- |
| SBCS | 1 | 256 | English, Spanish, French |
| DBCS | 1 or 2 | ~65,000 | Legacy CJK (Chinese, Japanese, Korean) |
| Unicode (UTF-8) | 1 to 4 | 1,114,112 | Everything (Modern Standard) |
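The "Bytes per Char" column can be spot-checked with a short script. This sketch uses `cp1252` as a stand-in for SBCS and `shift_jis` as a stand-in for DBCS; both choices, the helper `width`, and the sample characters are assumptions made for illustration.

```python
def width(ch: str, encoding: str):
    """Bytes needed for one character, or None if the encoding lacks it."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


chars = ["A", "\u00e9", "日", "\U0001F600"]  # A, é, 日, and an emoji
encodings = ["cp1252", "shift_jis", "utf-8"]

# Build a character -> {encoding: byte width} lookup and print it.
table = {ch: {enc: width(ch, enc) for enc in encodings} for ch in chars}
for ch, row in table.items():
    print(repr(ch), row)
```

Only UTF-8 can represent all four sample characters, at a cost that ranges from one byte for "A" to four bytes for the emoji.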
Localazy removes the “encoding headache” by standardizing on UTF-8 across its entire infrastructure. When you upload legacy files (like old Java .properties files or Windows-1252 strings), Localazy’s Format Conversions automatically handle the transformation to a modern Unicode space.
For localization engineers, this means you don’t have to worry about buffer overflows or “mojibake” (corrupted text) when moving between Western and CJK markets. If you are exporting back to a legacy environment that requires a specific SBCS or DBCS encoding, Localazy’s CLI and API allow you to define the output encoding, ensuring the characters are re-mapped correctly without breaking the target system’s logic.
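To make the idea of re-mapping between encodings concrete, here is a generic sketch using only Python's standard library. It is not Localazy's implementation and does not use the Localazy CLI or API; `transcode` is a hypothetical helper, and the encodings and sample strings are assumptions chosen for illustration.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode `data` from `source` and re-encode it as `target`.

    Strict error handling makes unrepresentable characters fail loudly
    instead of being silently replaced.
    """
    return data.decode(source, errors="strict").encode(target, errors="strict")


# Import direction: legacy Windows-1252 bytes to UTF-8.
legacy = "caf\u00e9".encode("cp1252")  # "café"
as_utf8 = transcode(legacy, "cp1252", "utf-8")

# Export direction: UTF-8 back to a legacy DBCS target.
japanese_utf8 = "日本語".encode("utf-8")
as_sjis = transcode(japanese_utf8, "utf-8", "shift_jis")

# Exporting to an encoding that lacks the characters raises an error.
try:
    transcode(japanese_utf8, "utf-8", "cp1252")
    export_ok = True
except UnicodeEncodeError:
    export_ok = False

# Decoding with the wrong source encoding is what produces mojibake.
mojibake = as_sjis.decode("latin-1")

print(as_utf8, as_sjis, export_ok, repr(mojibake))
```

Failing on an unrepresentable character, rather than substituting a placeholder, is a deliberate choice in this sketch: it surfaces a mismatch between content and target encoding before corrupted text reaches the legacy system.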
Learn more about supported file formats and CLI configuration in the Localazy docs.