How to convert between UTF-8 and native String in Java?
As in the picture, I'd like to convert between the UTF-8-encoded string and the native String in Java.
Does anyone have any suggestions? Thanks a lot!
P.S. For example:
    String a = "&#x8FD9;&#x662F;&#x4E00;&#x4E2A;&#x4F8B;&#x5B50;,this is a example";
    String b = null;
    // block A: processing a, and let b = "这是一个例子,this is a example"
How can I implement "block A"?
You can use a Charset; see the java.nio.charset.Charset documentation:
Charset.forName("UTF-8").encode(text)
Or you can use the getBytes() method of the java.lang.String class:
text.getBytes(Charset.forName("UTF-8"));
Documentation:
public byte[] getBytes(Charset charset)
Encodes this String into a sequence of bytes using the given charset, storing the result into a new byte array.
This method always replaces malformed-input and unmappable-character
sequences with this charset's default replacement byte array. The
CharsetEncoder class should be used when more control over the
encoding process is required.
Parameters: charset - The Charset to be used to encode the String
Returns: The resultant byte array
Since: 1.6
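The behaviour quoted above can be sketched with the JDK alone. This is a minimal illustration, not part of the original answer: the class name Utf8RoundTrip and the helpers roundTrip and isStrictlyAscii are hypothetical, and US-ASCII is used for the strict case only because UTF-8 can encode every well-formed String, so the replacement behaviour would not otherwise be visible.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8RoundTrip {

    // Encode a String to UTF-8 bytes, then decode those bytes back to a String.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Use a CharsetEncoder that reports unmappable characters instead of
    // silently replacing them, which is what getBytes() would do.
    static boolean isStrictlyAscii(String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u8FD9\u662F\u4E00\u4E2A\u4F8B\u5B50,this is a example";
        // The round trip through UTF-8 bytes gives back an equal String.
        System.out.println(roundTrip(text).equals(text));
        // Six CJK characters at three bytes each: 18 bytes.
        System.out.println("\u8FD9\u662F\u4E00\u4E2A\u4F8B\u5B50"
                .getBytes(StandardCharsets.UTF_8).length);
        // The strict encoder rejects text that US-ASCII cannot represent.
        System.out.println(isStrictlyAscii(text));
    }
}
```

Note that this converts between a String and its UTF-8 bytes; it does not decode text such as &#x8FD9; into characters.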
Apache Commons Lang's StringEscapeUtils.unescapeXml(...) is what you want. Depending on where the original string comes from, one of the HTML variants may be more appropriate.
Use it like this:
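    String a = "&#x8FD9;&#x662F;&#x4E00;&#x4E2A;&#x4F8B;&#x5B50;,this is a example";
    String b = StringEscapeUtils.unescapeXml(a);
    // block A: processing a, and let b = "这是一个例子,this is a example"
    System.out.println(a);
    System.out.println(b);
Output:
    &#x8FD9;&#x662F;&#x4E00;&#x4E2A;&#x4F8B;&#x5B50;,this is a example
    这是一个例子,this is a example
There are also methods for converting in the other direction.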
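The right-hand side consists of hexadecimal numeric HTML entities.
The Apache Commons library has a StringEscapeUtils class that can convert from such entities to a String, but the reverse direction is less obvious (it should be tried out; it might produce named entities).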
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.stream.IntStream;

    public static void main(String[] args) {
        String a = "&#x8FD9;&#x662F;&#x4E00;&#x4E2A;&#x4F8B;&#x5B50;,this is a example";
        String b = fromHtmlEntities(a);
        System.out.println(b);
        String a2 = toHtmlEntities(b);
        System.out.println(a2.equals(a));
        System.out.println(a);
        System.out.println(a2);
    }

    // Replaces hexadecimal numeric entities such as &#x8FD9; with the characters they denote.
    public static String fromHtmlEntities(String s) {
        Pattern numericEntityPattern = Pattern.compile("&#[Xx]([0-9A-Fa-f]{1,6});");
        Matcher m = numericEntityPattern.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String replacement = new String(new int[] { codePoint }, 0, 1);
            // quoteReplacement keeps a decoded '$' or '\' from being read as a group reference.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Uses Java 8. Leaves ASCII as is and writes every other code point as &#x...;
    public static String toHtmlEntities(String s) {
        int[] codePoints = s.codePoints().flatMap(
                (cp) -> cp < 128 // ASCII?
                    ? IntStream.of(cp)
                    : String.format("&#x%X;", cp).codePoints())
                .toArray();
        return new String(codePoints, 0, codePoints.length);
    }
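The fromHtmlEntities method above only recognises the hexadecimal form (&#x...;). If the input may also contain decimal numeric entities such as &#36825;, a variant along these lines could handle both. This is only a sketch: the class name NumericEntityDecoder and the method decode are hypothetical, and the choice to leave out-of-range values untouched is an assumption rather than anything from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class NumericEntityDecoder {

    // Group 1 holds hex digits (after "&#x"), group 2 holds decimal digits (after "&#").
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    static String decode(String s) {
        Matcher m = NUMERIC_ENTITY.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            // Assumption: values beyond U+10FFFF are left in the text unchanged.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            // quoteReplacement prevents '$' and '\' from being treated specially.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // One hexadecimal and one decimal entity, followed by plain text.
        System.out.println(decode("&#x8FD9;&#26159;,this is a example"));
    }
}
```

Named entities such as &amp; are deliberately not handled here; a library such as the StringEscapeUtils class mentioned above is the better fit for those.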