A char can hold a Chinese character, but only a commonly used one; rare characters need a String. File encoding only affects how text is read and written, not what a char can hold.
Can a char hold a Chinese character?
Common Chinese characters: yes. Virtually all Chinese characters in everyday use (such as '你' and '汉') lie in Unicode's Basic Multilingual Plane (BMP), so char c = '你'; is enough.
Rare characters: no. A small number of Chinese characters (such as '𠀀', U+20000) lie outside the BMP and need two char values forming a surrogate pair, so they have to be held in a String.
File encoding: UTF-8 or GBK only affects reading and writing. As long as the bytes are decoded correctly (for example new String(bytes, "UTF-8")), a char can hold the Chinese character; if the encoding does not match, the text is garbled, but that is not a limitation of char.
Summary:
Everyday development: storing a common Chinese character directly in a char is safe.
Handling rare characters: prefer String.
File operations: always specify the encoding, otherwise the text may come out garbled.
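To illustrate the advice about rare characters, here is a minimal sketch that walks a String by Unicode code point rather than by char. The class name CodePointDemo and the helper splitByCodePoint are made up for this example; the characters are written as escapes so the file compiles whatever its own source encoding is.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDemo {
    // Hypothetical helper: one String per Unicode code point, so a
    // surrogate pair stays together instead of being cut into two chars.
    static List<String> splitByCodePoint(String text) {
        List<String> parts = new ArrayList<>();
        text.codePoints().forEach(cp -> parts.add(new String(Character.toChars(cp))));
        return parts;
    }

    public static void main(String[] args) {
        // U+4F60 (你) is in the BMP; U+20000 (𠀀) needs the pair D840 DC00.
        String text = "\u4F60\uD840\uDC00";
        System.out.println(text.length());                         // 3 chars (UTF-16 code units)
        System.out.println(text.codePointCount(0, text.length())); // 2 characters
        System.out.println(splitByCodePoint(text).size());         // 2
    }
}
```

Indexing with charAt(i) on such a String can land in the middle of a surrogate pair, which is why code-point-based iteration is the safer choice once rare characters are possible.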
Core principles
1. The nature of char
A 16-bit unsigned integer that holds one UTF-16 code unit.
It can directly hold any character in the Unicode Basic Multilingual Plane (BMP, U+0000~U+FFFF):
Common Chinese characters: U+4E00~U+9FFF (for example char c = '你'; ✅).
Rare characters: U+10000 and above (for example String s = "𠀀"; ✅, but char c = '𠀀'; ❌ compilation error).
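The 16-bit limit can be checked directly with the standard Character methods. The following is only a sketch; the class name CharRangeDemo and the helper charsNeeded are invented for illustration.

```java
class CharRangeDemo {
    // Hypothetical helper: number of char values (UTF-16 code units)
    // needed to represent one code point.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        char c = '\u4F60';                                       // 你, inside the BMP
        System.out.println((int) c);                             // 20320, i.e. 0x4F60
        System.out.println((int) Character.MAX_VALUE);           // 65535, the largest char value

        int rare = 0x20000;                                      // 𠀀, outside the BMP
        System.out.println(Character.isBmpCodePoint(rare));      // false
        System.out.println(charsNeeded(rare));                   // 2

        char[] pair = Character.toChars(rare);
        System.out.println(Character.isHighSurrogate(pair[0]));  // true
        System.out.println(Character.isLowSurrogate(pair[1]));   // true
    }
}
```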
2. The role of file encoding (UTF-8/GBK)
Only affects file read/write:
Correct reading: decode with the matching encoding (for example new String(bytes, "UTF-8")); each char then holds a UTF-16 code unit of the decoded Unicode text.
Incorrect reading: an encoding mismatch produces garbled text (such as æ ²), but this is a data error, not a limit on what char can hold.
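The difference between a matching and a mismatched decode can be reproduced in a few lines. This sketch uses ISO-8859-1 as the wrong charset only because it is guaranteed to exist on every Java runtime; a GBK/UTF-8 mix-up garbles text in the same way. DecodeDemo and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Hypothetical helper: encode text with one charset, decode with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u4F60\u597D"; // 你好
        String ok = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(ok.equals(original));      // true
        System.out.println(garbled.equals(original)); // false
        System.out.println(garbled.length());         // 6, one char per UTF-8 byte
    }
}
```

In the garbled case every char still holds a perfectly valid value; it is simply the wrong value, which is the sense in which the problem lies in the data and not in char.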
3. Conversion from byte stream to char
The bytes must be decoded to Unicode through a charset:
Wrong: combining the raw bytes directly (for example casting GBK's 0xC4 0xE3 straight to char) ❌ garbled.
Correct: new String(bytes, charset) followed by toCharArray() ✅ safe.
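The two paths can be compared side by side. This sketch uses the UTF-8 bytes of 你 (E4 BD A0) instead of the GBK bytes, on the assumption that not every runtime ships the GBK charset; with Charset.forName("GBK") and the bytes 0xC4 0xE3 the correct path should behave the same way. BytesToCharsDemo, decode and castEachByte are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesToCharsDemo {
    // Correct path: decode the bytes with their real charset, then take the chars.
    static char[] decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset).toCharArray();
    }

    // Wrong path: treat every byte as if it were already a char.
    static char[] castEachByte(byte[] bytes) {
        char[] out = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = (char) (bytes[i] & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xE4, (byte) 0xBD, (byte) 0xA0}; // 你 in UTF-8
        char[] good = decode(utf8, StandardCharsets.UTF_8);
        char[] bad = castEachByte(utf8);

        System.out.println(good.length);          // 1
        System.out.println(good[0] == '\u4F60');  // true
        System.out.println(bad.length);           // 3 unrelated chars
    }
}
```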