Use Unicode in Android

While working on an Android application, I noticed that the API response contains Unicode escape sequences:

{
"msg_txt":"Laurent Ruquier et l'\u00e9quipe"
}

On the Android side I have a simple TextView, and I need to display the actual text by converting the Unicode escapes into characters.
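To make clear what actually needs converting, here is a small illustration (the two names are only for this example). In Kotlin source code the compiler already decodes \u00e9 inside a string literal. A raw API response that has not been run through a JSON parser instead contains the six characters \, u, 0, 0, e, 9, which in Kotlin source has to be written with a doubled backslash:

```kotlin
// Decoded by the Kotlin compiler: this string already contains 'é'.
val decodedAtCompileTime = "l'\u00e9quipe"

// Literal backslash-u sequence, i.e. what the unparsed API text contains
// and what has to be unescaped at runtime.
val escapedLikeTheApi = "l'\\u00e9quipe"
```

The escaped version is five characters longer, because the six-character escape stands in for a single é.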

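Since the response contains escaped Unicode characters, you need to unescape them. I'm not a fan of reinventing the wheel, so you would usually use something like Apache Commons unescapeJava. If you decide to add it to your gradle build as a dependency, make sure to correctly configure your R8 shrinking (see here); otherwise you will add a large number of methods and classes to your release build. It will also slow down debug builds considerably, so be careful.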
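As a rough sketch of the R8 side, assuming the Gradle Kotlin DSL (build.gradle.kts) in the app module and the stock ProGuard files. This only turns shrinking on for release builds; whatever keep rules your app needs would go into proguard-rules.pro, the conventional file name assumed here:

```kotlin
android {
    buildTypes {
        getByName("release") {
            // Let R8 remove unused classes and methods, e.g. the parts of a
            // large dependency such as Apache Commons that you never call.
            isMinifyEnabled = true
            proguardFiles(
                getDefaultProguardFile("proguard-android-optimize.txt"),
                "proguard-rules.pro"
            )
        }
    }
}
```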

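The method above actually does more than replace escaped Unicode characters, but if this is all you need and you don't want to add an Apache dependency just for it, we can look at their implementation here and here. I extracted the relevant parts and converted them to Kotlin, just for fun: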

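import java.io.StringWriter
import java.io.Writer

// "\\u00e9" keeps the literal backslash-u sequence as it arrives from the API;
// a plain "\u00e9" would already be decoded to 'é' by the Kotlin compiler.
val inputString = "Laurent Ruquier et l'\\u00e9quipe"
val unescaped = translate(inputString)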


//this is the main method that does the actual conversion on a character
// by character basis.
fun translate(input: CharSequence, index: Int, out: Writer): Int {
    if (input[index] == '\\' && index + 1 < input.length && input[index + 1] == 'u') {
        // consume optional additional 'u' chars
        var i = 2
        while (index + i < input.length && input[index + i] == 'u') {
            i++
        }
        if (index + i < input.length && input[index + i] == '+') {
            i++
        }
        if (index + i + 4 <= input.length) {
            // Get 4 hex digits
            val unicode = input.subSequence(index + i, index + i + 4)
            try {
                val value = unicode.toString().toInt(16)
                out.write(value)
            } catch (nfe: NumberFormatException) {
                throw IllegalArgumentException("Unable to parse unicode value: $unicode", nfe)
            }
            return i + 4
        }
        throw IllegalArgumentException(
            "Less than 4 hex digits in unicode value: '"
                    + input.subSequence(index, input.length)
                    + "' due to end of CharSequence"
        )
    }
    return 0
}

//helper method for working directly with strings
fun translate(input: CharSequence): String {
    val writer = StringWriter(input.length * 2)
    translate(input, writer)
    return writer.toString()
}

// Walks the char sequence, passes each position to the unicode translator
// above and skips over the chars it consumed
fun translate(input: CharSequence, out: Writer) {
    var pos = 0
    val len = input.length
    while (pos < len) {
        val consumed = translate(input, pos, out)
        if (consumed == 0) {
            // inlined implementation of Character.toChars(Character.codePointAt(input, pos))
            // avoids allocating temp char arrays and duplicate checks
            val c1 = input[pos]
            out.write(c1.code)
            pos++
            if (Character.isHighSurrogate(c1) && pos < len) {
                val c2 = input[pos]
                if (Character.isLowSurrogate(c2)) {
                    out.write(c2.code)
                    pos++
                }
            }
            continue
        }
        // contract with translators is that they have to understand codepoints
        // and they just took care of a surrogate pair
        for (pt in 0 until consumed) {
            pos += Character.charCount(Character.codePointAt(input, pos))
        }
    }
}
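If you only ever receive plain \uXXXX escapes with exactly four hex digits, a regular expression is a shorter alternative. This is a swapped-in technique, not Apache's code, and unescapeUnicode is a hypothetical name. Unlike the port above it does not accept the extra 'u' characters or the '+' form, and it leaves malformed escapes untouched instead of throwing:

```kotlin
// Hypothetical helper (not from Apache Commons): matches a backslash, a 'u'
// and exactly four hex digits.
val UNICODE_ESCAPE = Regex("""\\u([0-9a-fA-F]{4})""")

fun unescapeUnicode(input: String): String =
    UNICODE_ESCAPE.replace(input) { match ->
        // Each escape encodes one UTF-16 code unit; a surrogate pair such as
        // \ud83d\ude00 is rebuilt from two consecutive replacements.
        Char(match.groupValues[1].toInt(16)).toString()
    }
```

For example, unescapeUnicode("Laurent Ruquier et l'\\u00e9quipe") returns "Laurent Ruquier et l'équipe", which you can then pass to the TextView.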

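Note that this is really just a skeleton and may need some tweaks for real production use. For example, it does not treat an escaped backslash specially: if your raw input contains two backslash characters before the u (written as val inputString = "Laurent Ruquier et l'\\\\u00e9quipe" in Kotlin source), the escape is still decoded and a stray backslash is left in front of the é, so you would need to modify the method a bit.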
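One possible modification, as a sketch only: a hypothetical unescapeJavaLike that treats "\\" as an escaped backslash, so a \u that follows it stays literal text. It is deliberately simpler than the port above (no extra 'u' characters, no '+' form, malformed escapes are copied through instead of throwing), and note that it also collapses every doubled backslash into a single one, which may or may not be what your data needs:

```kotlin
// Hypothetical variant (not Apache's code): "\\" becomes a single '\',
// "\uXXXX" with four ASCII hex digits becomes the encoded character,
// everything else is copied unchanged.
fun unescapeJavaLike(input: String): String {
    val out = StringBuilder(input.length)
    var i = 0
    while (i < input.length) {
        val c = input[i]
        if (c == '\\' && i + 1 < input.length) {
            val next = input[i + 1]
            if (next == '\\') {
                // Escaped backslash: emit one '\' and skip both characters
                out.append('\\')
                i += 2
                continue
            }
            if (next == 'u' && i + 6 <= input.length) {
                val hex = input.substring(i + 2, i + 6)
                // Only decode when all four characters are ASCII hex digits
                if (hex.all { it in '0'..'9' || it in 'a'..'f' || it in 'A'..'F' }) {
                    out.append(Char(hex.toInt(16)))
                    i += 6
                    continue
                }
            }
        }
        // Any other character (including an unmatched '\') is copied as-is
        out.append(c)
        i++
    }
    return out.toString()
}
```

With this version the raw text l'\u00e9quipe (one backslash) becomes l'équipe, while l'\\u00e9quipe (two backslashes) becomes l'\u00e9quipe with no stray character.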