Why does NSRegularExpression not match after a leading quote?

Here is the complete sample that should match:

    import Foundation

    let input = "L’iPhone XR serait un topselling (des prévisions de vente en hausse de 50% avant même sa sortie)"

    // Raw string literal, so \b, \s and \d reach the regex engine unescaped.
    let pattern = #"\b(iPhones?(\s*(se|X((s(\s*Max)?)|r)?|\d(s|c)?(\s*(Plus|Pro))?))?)\b"#

    let regex: NSRegularExpression

    do {
        regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive, .useUnicodeWordBoundaries])
    } catch {
        fatalError("pattern ”\(pattern)” has an issue. \(error.localizedDescription)")
    }

    // NSRange is measured in UTF-16 code units, not in Characters.
    let range = NSRange(input.startIndex..., in: input)
    let matches = regex.matches(in: input, range: range)

Currently the regex does not match at all, so no groups are captured. I expect it to capture "iPhone XR" as the first group.

Here is a test playground: https://regex101.com/r/aHcyPQ/2

.useUnicodeWordBoundaries enables the UREGEX_UWORD option:

Controls the behavior of \b in a pattern. If set, word boundaries are found according to the definitions of word found in Unicode UAX 29, Text Boundaries. By default, word boundaries are identified by means of a simple classification of characters as either “word” or “non-word”, which approximates traditional regular expression behavior. The results obtained with the two options can be quite different in runs of spaces and other non-word characters.

The Unicode UAX 29 document describes these word boundaries in detail and provides some nice illustrations.

The ’ (U+2019) in your input is classified as a MidLetter character:

MidLetter  Any of the following:
                U+0027 (') APOSTROPHE
                U+00B7 (·) MIDDLE DOT
                U+05F4 (״) HEBREW PUNCTUATION GERSHAYIM
                U+2019 (’) RIGHT SINGLE QUOTATION MARK (curly apostrophe)
                U+2027 (‧) HYPHENATION POINT

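A MidLetter character between two letters does not break a word, so under UAX 29 there is no word boundary inside L’iPhone between L’ and i, and the leading \b cannot match there. Remove .useUnicodeWordBoundaries: with the default word/non-word classification, ’ is a non-word character and \b matches right before iPhone.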
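
To see the difference side by side, here is a minimal sketch that runs the same pattern with and without .useUnicodeWordBoundaries. The helper name firstModel and the shortened input string are assumptions made for illustration and are not part of the original code.

```swift
import Foundation

// Hypothetical helper (not from the original post): returns capture group 1
// of the first match, or nil when the pattern does not match at all.
func firstModel(in input: String, options: NSRegularExpression.Options) -> String? {
    // Raw string literal, so \b, \s and \d reach the regex engine unescaped.
    let pattern = #"\b(iPhones?(\s*(se|X((s(\s*Max)?)|r)?|\d(s|c)?(\s*(Plus|Pro))?))?)\b"#
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return nil
    }
    // NSRange counts UTF-16 code units, so derive it from String indices.
    let range = NSRange(input.startIndex..., in: input)
    guard let match = regex.firstMatch(in: input, options: [], range: range),
          let groupRange = Range(match.range(at: 1), in: input) else {
        return nil
    }
    return String(input[groupRange])
}

// Shortened version of the sample sentence, still starting with L’iPhone.
let input = "L’iPhone XR serait un topselling"

// UAX 29 word boundaries: ’ sits between two letters, so \b fails before "iPhone".
let withUnicodeBoundaries = firstModel(in: input, options: [.caseInsensitive, .useUnicodeWordBoundaries])

// Default boundaries: ’ is a non-word character, so \b matches before "iPhone".
let withDefaultBoundaries = firstModel(in: input, options: [.caseInsensitive])

print(withUnicodeBoundaries ?? "no match")
print(withDefaultBoundaries ?? "no match")
```

If you need UAX 29 boundaries elsewhere in the pattern, another option (also only a suggestion) is to keep the flag and replace the leading \b with a lookbehind such as (?<![\p{L}\p{N}]), which does not depend on the word-break rules.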