One Char is Two Bytes in Java?
In Java, one char is two bytes.
So why does the following code print 2 rather than 4?
public static void main(String[] args) {
    byte[] b = new String(new char[] { 'H', 'I' }).getBytes();
    System.out.println(b.length);
}
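getBytes() encodes the string's Unicode characters using the JVM's default Charset, which is usually ISO-8859-1 or UTF-8 (UTF-8 has been the default since Java 18). Both encode 'H' and 'I' as a single byte each, so the resulting array has length 2.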
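To avoid depending on the platform default, the charset can be passed explicitly. The following is a minimal sketch of that idea; the class name EncodedLength and the helper byteCount are hypothetical names chosen for illustration, not part of the original question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLength {
    // Hypothetical helper: number of bytes produced when text is
    // encoded with the given charset.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "HI";
        // The charset that the no-argument getBytes() would use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("chars:      " + s.length());
        System.out.println("ISO-8859-1: " + byteCount(s, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + byteCount(s, StandardCharsets.UTF_8));
        System.out.println("UTF-16LE:   " + byteCount(s, StandardCharsets.UTF_16LE));
        // Plain "UTF-16" prepends a 2-byte byte order mark when encoding.
        System.out.println("UTF-16:     " + byteCount(s, StandardCharsets.UTF_16));
    }
}
```

Only the UTF-16 variants store 'H' and 'I' in two bytes each, which matches the in-memory size of a char; the byte count returned by getBytes() always depends on the charset used for encoding.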
This code should help illustrate what is happening:
public static void main(String[] args) throws Exception {
    // ASCII letters
    test("ISO-8859-1", new char[] { 'H', 'I' });
    test("UTF-8"     , new char[] { 'H', 'I' });
    test("UTF-16LE"  , new char[] { 'H', 'I' });
    test("UTF-32LE"  , new char[] { 'H', 'I' });
    // Non-Latin-1 characters that still fit in a single char
    test("ISO-8859-1", new char[] { '⅓', '⅔' });
    test("UTF-8"     , new char[] { '⅓', '⅔' });
    test("UTF-16LE"  , new char[] { '⅓', '⅔' });
    test("UTF-32LE"  , new char[] { '⅓', '⅔' });
    // Supplementary characters: each needs two chars, so they are passed as Strings
    test("UTF-8"     , "😀👍");
    test("UTF-16LE"  , "😀👍");
    test("UTF-32LE"  , "😀👍");
}

static void test(String charsetName, char[] chars) throws Exception {
    test(charsetName, new String(chars));
}

// Encodes the input, decodes it back, and prints the text followed by its bytes in hex.
static void test(String charsetName, String input) throws Exception {
    byte[] bytes = input.getBytes(charsetName);
    System.out.printf("%-12s %-6s", charsetName, new String(bytes, charsetName));
    for (byte b : bytes)
        System.out.printf(" %02x", b);
    System.out.println();
}
Output:
ISO-8859-1   HI      48 49
UTF-8        HI      48 49
UTF-16LE     HI      48 00 49 00
UTF-32LE     HI      48 00 00 00 49 00 00 00
ISO-8859-1   ??      3f 3f
UTF-8        ⅓⅔      e2 85 93 e2 85 94
UTF-16LE     ⅓⅔      53 21 54 21
UTF-32LE     ⅓⅔      53 21 00 00 54 21 00 00
UTF-8        😀👍    f0 9f 98 80 f0 9f 91 8d
UTF-16LE     😀👍    3d d8 00 de 3d d8 4d dc
UTF-32LE     😀👍    00 f6 01 00 4d f4 01 00
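The UTF-16LE rows show why "one char is two bytes" does not mean "one character is two bytes": a Java char is a 16-bit UTF-16 code unit, and characters outside the Basic Multilingual Plane take two of them (a surrogate pair). The sketch below illustrates the difference between code units and code points; the class name CharVsCodePoint and its helper methods are hypothetical names used only for this example.

```java
import java.nio.charset.StandardCharsets;

class CharVsCodePoint {
    // Number of 16-bit UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (what a reader would call characters).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String fraction = "\u2153";                            // U+2153, fits in one char
        String emoji = new String(Character.toChars(0x1F600)); // U+1F600, needs a surrogate pair

        for (String s : new String[] { "H", fraction, emoji }) {
            System.out.printf("chars=%d codePoints=%d utf8Bytes=%d utf16Bytes=%d%n",
                    codeUnits(s),
                    codePoints(s),
                    s.getBytes(StandardCharsets.UTF_8).length,
                    s.getBytes(StandardCharsets.UTF_16LE).length);
        }
    }
}
```

In UTF-16LE the byte count is always twice the number of chars, while the UTF-8 byte count varies from one to four bytes per code point.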