Inconsistently handled emoji sequences on iOS?

On iOS and macOS, sequences of regional indicator symbols are rendered as flag emoji, and if a sequence is invalid, the actual symbols are displayed instead:

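However, if the sequence happens to contain a pair of regional indicator symbols that does not map to a flag emoji, the potential flags are rendered on a first-match basis: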

iOS/macOS rendering of the symbols: F F I S E S.

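In Swift 3, consecutive regional indicator symbols were all lumped into a single Character, meaning that one Character object could contain a theoretically unlimited number of UnicodeScalar objects, as long as they were all regional indicator symbols. In essence, Swift 3 did not break up regional indicator symbols at all.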

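In Swift 4, on the other hand, a Character object contains at most two regional indicator symbols in its Unicode scalar representation. Understandably, the validity of the sequence is not taken into account, so a sequence of regional indicator symbols is simply split every two scalars, and each pair is treated as a Character. Iterating over the same string as above and printing each character now produces the following: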

Swift 4 string containing the symbols: F F I S E S.

This brings us to the real question: is the problem in how iOS and macOS render the sequence, or in how Swift 4 constructs the Character representations in a string?

I am curious which party would be the most appropriate one to report this anomaly to.


Here is a minimal reproducible snippet of the behavior in Swift 4:

// Regional indicator symbols F F I S E S
let string = "\u{1f1eb}\u{1f1eb}\u{1f1ee}\u{1f1f8}\u{1f1ea}\u{1f1f8}"

for character in string {
    print(character)
}
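
Because the printed result depends on how the terminal or UI renders each pair, it can help to inspect the grouping without any rendering involved. The following is a minimal sketch, not part of the original snippet: `latinLetter(for:)` and `letterGroups(in:)` are hypothetical helper names, and the code assumes a toolchain of Swift 4.1 or later (for `compactMap`).

```swift
// Hypothetical helper (not from the original post): maps a regional
// indicator scalar (U+1F1E6...U+1F1FF) back to its Latin letter A...Z.
func latinLetter(for scalar: Unicode.Scalar) -> Character? {
    let base: UInt32 = 0x1F1E6 // REGIONAL INDICATOR SYMBOL LETTER A
    guard (base...base + 25).contains(scalar.value),
          let letter = Unicode.Scalar(scalar.value - base + 0x41) else {
        return nil
    }
    return Character(letter)
}

// Hypothetical helper: for each Character in the string, collects the
// Latin letters of the regional indicator scalars it is made of.
func letterGroups(in string: String) -> [String] {
    return string.map { character in
        String(String(character).unicodeScalars.compactMap(latinLetter(for:)))
    }
}

// Regional indicator symbols F F I S E S
let sample = "\u{1f1eb}\u{1f1eb}\u{1f1ee}\u{1f1f8}\u{1f1ea}\u{1f1f8}"
print(letterGroups(in: sample))
// With Swift 4 or later this prints ["FF", "IS", "ES"]:
// three Characters, each holding exactly two scalars.
```

Printing letters instead of the symbols themselves separates the question of how Swift segments the string from the question of how the platform draws it.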

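After some investigation, it appears that neither side is wrong, although the approach implemented in Swift 4 is more in line with the recommendations.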

According to the Unicode Standard (emphasis mine):

The representative glyph for a single regional indicator symbol is just a dotted box containing a capital Latin letter. The Unicode Standard does not prescribe how the pairs of regional indicator symbols should be rendered. However, current industry practice widely interprets pairs of regional indicator symbols as representing a flag associated with the corresponding ISO 3166 region code.

– The Unicode Standard, Version 10.0 – Core Specification, page 836.

Then, on the following page:

Conformance to the Unicode Standard does not require conformance to UTS #51. However, the interpretation and display of pairs of regional indicator symbols as specified in UTS #51 is now widely deployed, so in practice it is not advisable to attempt to interpret pairs of regional indicator symbols as representing anything other than an emoji flag.

– The Unicode Standard, Version 10.0 – Core Specification, page 837.

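As far as I can tell, although the standard does not set any rules for how flags should be rendered, the path chosen in iOS and macOS for rendering invalid flag sequences is inadvisable. A renderer should always treat two consecutive regional indicator symbols as a flag, even if a valid flag exists further along in the sequence.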
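
To make the difference between the two strategies concrete, here is a toy model that works on plain Latin letters standing in for regional indicator symbols. It is only a sketch of one reading of the first-match behavior described in the question, not Apple's actual implementation. The function names and the `supportedRegions` set are assumptions made for illustration.

```swift
// Assumption: a tiny stand-in for the set of region codes that have a flag.
let supportedRegions: Set<String> = ["FI", "SE", "IS", "ES"]

// First-match grouping: at each position, form a flag only if the next
// two letters are a supported region; otherwise emit a single letter
// and retry from the following position.
func firstMatchGroups(_ letters: String, supported: Set<String>) -> [String] {
    let chars = Array(letters)
    var groups: [String] = []
    var index = 0
    while index < chars.count {
        if index + 1 < chars.count {
            let pair = String(chars[index...index + 1])
            if supported.contains(pair) {
                groups.append(pair)
                index += 2
                continue
            }
        }
        groups.append(String(chars[index]))
        index += 1
    }
    return groups
}

// Fixed pairing: always split every two letters, regardless of validity.
func fixedPairGroups(_ letters: String) -> [String] {
    let chars = Array(letters)
    return stride(from: 0, to: chars.count, by: 2).map { start in
        String(chars[start..<min(start + 2, chars.count)])
    }
}

print(firstMatchGroups("FFISES", supported: supportedRegions)) // ["F", "FI", "SE", "S"]
print(fixedPairGroups("FFISES"))                               // ["FF", "IS", "ES"]
```

Under this model, one unsupported pair at the start shifts every later boundary in the first-match strategy, whereas fixed pairing keeps the boundaries stable and independent of which flags a system happens to support.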

Finally, taking a look at UTS #51, or "the emoji specification":

Options for presenting an emoji_flag_sequence for which a system does not have a specific flag or other glyph include:

  • Displaying each REGIONAL INDICATOR symbol separately as a letter in a dotted square, as shown in the Unicode charts. This provides information about the specific region indicated, but may be mystifying to some users.

  • For all unsupported REGIONAL INDICATOR pairs, displaying the same “missing flag” glyph, such as the image shown below. This would indicate that the supported pair was intended to represent the flag of some region, without indicating which one.

– Unicode Technical Standard #51, revision 12, Annex B.

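So, in summary, the best practice is to represent an invalid flag sequence either as a pair of regional indicator symbols, exactly as is the case with Character objects in Swift 4 strings, or as a generic missing-flag glyph.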
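
The two fallback options can be sketched as follows, again using Latin letters in place of the actual symbols. Everything here is hypothetical and for illustration only: the enum, the `present` function, and the `flag(...)` placeholder strings that stand in for real glyphs.

```swift
// Hypothetical model of the two UTS #51 fallback options for a pair
// that has no flag glyph.
enum UnsupportedPairFallback {
    case separateLetters   // show each regional indicator on its own
    case missingFlagGlyph  // show one generic "missing flag" glyph
}

// Takes pairs that have already been split every two symbols and
// returns placeholder strings describing what would be drawn.
func present(pairs: [String],
             supported: Set<String>,
             fallback: UnsupportedPairFallback) -> [String] {
    return pairs.flatMap { pair -> [String] in
        if supported.contains(pair) {
            return ["flag(\(pair))"]
        }
        switch fallback {
        case .separateLetters:
            return pair.map { String($0) }
        case .missingFlagGlyph:
            return ["flag(?)"]
        }
    }
}

let pairs = ["FF", "IS", "ES"]
let known: Set<String> = ["IS", "ES"] // assumption: a toy set of supported flags

print(present(pairs: pairs, supported: known, fallback: .separateLetters))
// ["F", "F", "flag(IS)", "flag(ES)"]
print(present(pairs: pairs, supported: known, fallback: .missingFlagGlyph))
// ["flag(?)", "flag(IS)", "flag(ES)"]
```

In both cases the pair boundaries are fixed before any glyph lookup happens, so an unsupported pair never changes how the following symbols are grouped.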