Swift 正则表达式特殊字符

Question

我目前正在使用正则表达式查找我的应用程序之前创建的一些文件，但我在使用德语 "Umlauten" 时遇到问题，例如 ö,ä,ü。如果字符串中有 "Umlaute"，我的表达式不匹配。我想这与语言环境有关，但我不知道要设置什么语言环境（已经尝试过 nil）。

这是一些代码：

// Building the regex
var regex = somePrefix + "_("
for string in stringArray{     
     regex += string + "|"  // string can contain öäü
}
regex.remove(at: regex.index(before: regex.endIndex))
regex += ")_w\d_d\d"

// Finding files
let fileManager = FileManager()
let files = fileManager.enumerator(atPath: somePath)
while let file = files?.nextObject() {
   let fileName = file as! String            
   if fileName.range(of: regex, options: .regularExpression, range: nil, locale: Locale.current) != nil {
   print(fileName + " found")
   }
}

// 一些不匹配的例子：

Regex = reis 8_(Ibedir|Drölf )_w\d_d\d
Filename that didnt match = reis 8_Drölf _w0_d0.plist

Answer 1

显然文件名使用了与之前不同的 Unicode 规范化形式给定的字符串。 Unicode Regular Expression Guidelines: 3.2 Canonical Equivalents建议：

Before (or during) processing, translate text (and pattern) into a normalized form. This is the simplest to implement, since there are available code libraries for doing normalization.

这可以通过应用 .decomposedStringWithCanonicalMapping 来实现模式和文件名。

Swift 正则表达式特殊字符

Swift Regular Expression Special Characters

regex

locale

file-handling

swift