Can a Java char variable store a Chinese character, and why?

Published Time : 2025-10-17

One-sentence summary

A char can store a Chinese character, but only one from the Basic Multilingual Plane, which covers the commonly used characters; rare characters outside it must use a String. File encoding only affects how bytes are read and written, not what a char can hold.

Interview Template

Can a char hold a Chinese character?

Commonly used Chinese characters: yes. Nearly all Chinese characters in everyday use (such as '你' and '汉') lie in Unicode's Basic Multilingual Plane (BMP), so char c = '你'; is enough.

Rare characters: no. A small number of Chinese characters (such as '𠀀') lie outside the BMP and take two char values forming a surrogate pair, so they must be held in a String.

File encoding: UTF-8 or GBK only affects reading and writing. As long as the bytes are decoded correctly (for example new String(bytes, "UTF-8")), a char can hold the Chinese character; if the encoding does not match, the text is garbled, but that is not a limitation of char.

Summary:

Daily development: storing a common Chinese character directly in a char is safe.

Handling rare characters: prefer String.

File operations: always specify the encoding, otherwise the text may be garbled.
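The advice on file operations can be sketched as a small round trip. This is only an illustration under stated assumptions: the class name FileEncodingDemo, the method roundTrip, and the temporary file are hypothetical, and UTF-8 is chosen simply because it is always available. The text is written with \u escapes so the source file itself does not depend on any encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileEncodingDemo {
    // Writes the text as UTF-8 bytes and reads it back with the same explicit charset.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("char-demo", ".txt"); // hypothetical scratch file
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // U+4F60 U+597D, two common Chinese characters
        System.out.println(roundTrip(text).equals(text)); // true
    }
}
```

Naming the charset on both the write and the read keeps the result independent of the platform default encoding.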

Core principles

1. The essence of char

A char is a 16-bit unsigned integer that holds one UTF-16 code unit.

It can directly store any character in the Unicode Basic Multilingual Plane (BMP, U+0000~U+FFFF):

Common Chinese character range: U+4E00~U+9FFF (for example char c = '你'; ✅).

Rare character range: U+10000 and above (for example String s = "𠀀"; ✅, but char c = '𠀀'; ❌ compilation error).
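The difference between the two ranges can be checked with a short sketch. The class name CharCapacityDemo and the helper fitsInOneChar are hypothetical names for illustration; the code uses U+4F60 (你) as the common character and U+20000 (𠀀) as the rare one, written as numeric code points so the source compiles under any file encoding.

```java
class CharCapacityDemo {
    // True when the code point lies in the BMP and therefore fits in one 16-bit char.
    static boolean fitsInOneChar(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    public static void main(String[] args) {
        char common = '\u4F60';                      // U+4F60, a common BMP character
        System.out.println((int) common);            // 20320, the value of the code unit
        System.out.println(fitsInOneChar(0x4F60));   // true

        int rare = 0x20000;                          // U+20000, outside the BMP
        System.out.println(fitsInOneChar(rare));     // false

        // A String stores the rare character as two chars: a surrogate pair.
        String s = new String(Character.toChars(rare));
        System.out.println(s.length());                              // 2 code units
        System.out.println(s.codePointCount(0, s.length()));         // 1 character
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));   // true
    }
}
```

Note that length() counts char code units, so code that must count or iterate over rare characters should work with code points rather than individual char values.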

2. The role of file encoding (UTF-8/GBK)

Only affects file read/write:

Correct reading: decode with the matching encoding (for example new String(bytes, "UTF-8")); the resulting char values hold Unicode characters.

Incorrect reading: an encoding mismatch produces garbled text (such as æ ²), but this is a data error, not a limit on what char can hold.
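A minimal sketch of the two cases follows. The class DecodeDemo and its decode helper are hypothetical, and ISO-8859-1 stands in for "the wrong encoding" only because StandardCharsets guarantees it on every JVM; a UTF-8/GBK mismatch would garble the text in the same way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes raw bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u4F60\u597D"; // U+4F60 U+597D, two BMP characters
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // 6 bytes, 3 per character

        // Matching charset: the original characters come back.
        String ok = decode(utf8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(original));      // true
        System.out.println(ok.length());              // 2

        // Mismatched charset: every byte is turned into its own, unrelated char.
        String garbled = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original)); // false
        System.out.println(garbled.length());         // 6
    }
}
```

In both runs each char still holds exactly what the decoder produced; the garbling comes from the decoding step, not from the char type.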

3. Conversion from byte stream to char

The bytes must be decoded to Unicode through a charset:

Wrong: merging the raw bytes directly (for example forcing GBK's 0xC4 0xE3 into a char) ❌ garbled.

Correct: new String(bytes, charset) → toCharArray() ✅ safe.
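The contrast between the two approaches can be sketched as follows. The class BytesToCharDemo and both helper methods are hypothetical. The wrong path uses the GBK bytes 0xC4 0xE3 quoted above; the correct path is shown with the UTF-8 bytes of the same character (E4 BD A0), an assumption made only so the example relies on a charset every JVM is guaranteed to provide.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesToCharDemo {
    // Wrong: glue two raw bytes into one 16-bit value and treat it as a char.
    static char mergeBytes(byte hi, byte lo) {
        return (char) (((hi & 0xFF) << 8) | (lo & 0xFF));
    }

    // Correct: let a charset decoder map the bytes to Unicode, then take the chars.
    static char[] decodeToChars(byte[] bytes, Charset charset) {
        return new String(bytes, charset).toCharArray();
    }

    public static void main(String[] args) {
        // The merged value is just the two bytes side by side, not U+4F60.
        char merged = mergeBytes((byte) 0xC4, (byte) 0xE3);
        System.out.println(Integer.toHexString(merged));   // c4e3

        // Decoding the UTF-8 form of the same character yields the real code unit.
        byte[] utf8 = {(byte) 0xE4, (byte) 0xBD, (byte) 0xA0};
        char[] chars = decodeToChars(utf8, StandardCharsets.UTF_8);
        System.out.println(chars.length);                   // 1
        System.out.println(Integer.toHexString(chars[0]));  // 4f60
    }
}
```

The decoder is what knows how a byte sequence in a given encoding corresponds to a Unicode code point; bit shifting alone has no such mapping.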