Use Unicode in Android
While working on an Android application, I observed that Unicode escape sequences are part of the API response:
{
"msg_txt":"Laurent Ruquier et l'\u00e9quipe"
}
On the Android side, I have a simple TextView, and I have to display the actual text by converting the Unicode escapes to text.
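Since the response already contains Unicode escape sequences, you need to parse them. I'm not a friend of reinventing the wheel, so you would usually use something like Apache Commons unescapeJava. If you decide to add it to your Gradle build as a dependency, make sure to correctly configure your R8 shrinking (see here); otherwise you will add a huge number of methods and classes to your release build. It will also slow down debug builds considerably, so be careful.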
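As a rough illustration of the build setup described above, here is a minimal sketch of a module-level `build.gradle.kts` fragment. It assumes the dependency is Apache Commons Text (whose `org.apache.commons.text.StringEscapeUtils` provides `unescapeJava`), that the Android Gradle Plugin is already applied, and that a `proguard-rules.pro` file exists in the module; the version is deliberately left as a placeholder.

```kotlin
// app/build.gradle.kts (fragment, not a complete build script)
android {
    buildTypes {
        getByName("release") {
            // Turn on code shrinking (R8) so unused library classes and
            // methods are removed from the release build.
            isMinifyEnabled = true
            proguardFiles(
                getDefaultProguardFile("proguard-android-optimize.txt"),
                "proguard-rules.pro"
            )
        }
    }
}

dependencies {
    // Assumption: Commons Text. Replace <version> with a release you have verified.
    implementation("org.apache.commons:commons-text:<version>")
}
```

With this in place, the call itself would be a one-liner along the lines of `StringEscapeUtils.unescapeJava(rawText)`.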
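While the above method actually does more than just replace escaped Unicode characters, if you only need this functionality and don't want to add the Apache dependency for it, we can have a look at their implementation here and here. I've extracted the relevant parts and converted them to Kotlin, just for fun: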
import java.io.StringWriter
import java.io.Writer

// Sample input; in the app this string would come from the API response.
val inputString = "Laurent Ruquier et l'\u00e9quipe"
val unescaped = translate(inputString)

// This is the main method that does the actual conversion on a
// character-by-character basis. It returns the number of chars consumed,
// or 0 if there is no unicode escape at the given index.
fun translate(input: CharSequence, index: Int, out: Writer): Int {
    if (input[index] == '\\' && index + 1 < input.length && input[index + 1] == 'u') {
        // consume optional additional 'u' chars
        var i = 2
        while (index + i < input.length && input[index + i] == 'u') {
            i++
        }
        // skip an optional '+' before the hex digits
        if (index + i < input.length && input[index + i] == '+') {
            i++
        }
        if (index + i + 4 <= input.length) {
            // Get 4 hex digits
            val unicode = input.subSequence(index + i, index + i + 4)
            try {
                val value = unicode.toString().toInt(16)
                out.write(value)
            } catch (nfe: NumberFormatException) {
                throw IllegalArgumentException("Unable to parse unicode value: $unicode", nfe)
            }
            return i + 4
        }
        throw IllegalArgumentException(
            "Less than 4 hex digits in unicode value: '" +
                input.subSequence(index, input.length) +
                "' due to end of CharSequence"
        )
    }
    return 0
}

// helper method for working directly with strings
fun translate(input: CharSequence): String {
    val writer = StringWriter(input.length * 2)
    translate(input, writer)
    return writer.toString()
}

// this goes through the actual char sequence and passes every
// single char to the unicode transformer and swallows consumed chars
fun translate(input: CharSequence, out: Writer) {
    var pos = 0
    val len = input.length
    while (pos < len) {
        val consumed = translate(input, pos, out)
        if (consumed == 0) {
            // inlined implementation of Character.toChars(Character.codePointAt(input, pos))
            // avoids allocating temp char arrays and duplicate checks
            val c1 = input[pos]
            out.write(c1.code)
            pos++
            if (Character.isHighSurrogate(c1) && pos < len) {
                val c2 = input[pos]
                if (Character.isLowSurrogate(c2)) {
                    out.write(c2.code)
                    pos++
                }
            }
            continue
        }
        // contract with translators is that they have to understand codepoints
        // and they just took care of a surrogate pair
        repeat(consumed) {
            pos += Character.charCount(Character.codePointAt(input, pos))
        }
    }
}
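Note that this is really only the bare bones and may need some tweaking for actual production use. If, for example, your input contains two backslashes (as in val inputString = "Laurent Ruquier et l'\\u00e9quipe"), you would need to modify the method a bit.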
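One possible reading of that last remark is that the text at runtime holds two literal backslashes in front of the `u`, meaning an escaped backslash rather than a Unicode escape. The following is a minimal sketch under that assumption, not the answer's own code: `unescapeUnicodeLenient` and `isFourHexDigits` are hypothetical names, and the choice to copy malformed escapes through unchanged (instead of throwing) is a simplification made here.

```kotlin
// Hypothetical helper: decodes \uXXXX escapes, but treats a doubled
// backslash as one escaped, literal backslash.
fun unescapeUnicodeLenient(input: String): String {
    val out = StringBuilder(input.length)
    var pos = 0
    while (pos < input.length) {
        val c = input[pos]
        val next = input.getOrNull(pos + 1)
        when {
            // Escaped backslash: emit a single backslash and skip both chars,
            // so a following 'u' is not mistaken for the start of an escape.
            c == '\\' && next == '\\' -> {
                out.append('\\')
                pos += 2
            }
            // Well-formed escape: backslash, 'u', then exactly four hex digits.
            c == '\\' && next == 'u' && isFourHexDigits(input, pos + 2) -> {
                out.append(input.substring(pos + 2, pos + 6).toInt(16).toChar())
                pos += 6
            }
            // Anything else (including malformed escapes) is copied as-is.
            else -> {
                out.append(c)
                pos++
            }
        }
    }
    return out.toString()
}

// True if s has four hexadecimal digits starting at index start.
private fun isFourHexDigits(s: String, start: Int): Boolean =
    start + 4 <= s.length &&
        (start until start + 4).all {
            s[it] in '0'..'9' || s[it] in 'a'..'f' || s[it] in 'A'..'F'
        }
```

When trying this out, a Kotlin raw string such as `"""l'\u00e9quipe"""` is a convenient way to get the backslash into the input literally, because the compiler decodes `\u00e9` by itself inside a regular string literal. Given that raw input the function returns `l'équipe`, while `"""l'\\u00e9quipe"""` comes back as `l'\u00e9quipe` with a single backslash and no decoding.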