componentsSeparatedByString by multiple separators in Swift
So here is the string s:
"Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
I want to split it into an array:
["Hi", "How are you", "I'm fine", "It is 6 p.m", "Thank you", "That's it"]
This means the separators should be ". ", "? " and "! ".
I tried:
let charSet = NSCharacterSet(charactersInString: ".?!")
let array = s.componentsSeparatedByCharactersInSet(charSet)
But it also splits p.m. into two elements. The result:
["Hi", " How are you", " I'm fine", " It is 6 p", "m", " Thank you", " That's it"]
I also tried:
let array = s.componentsSeparatedByString(". ")
Splitting on ". " works well, but if I also want to split on "? " and "! ", it gets messy.
Is there any way I can do this? Thanks!
rmaddy's answer is correct (+1). A Swift 3 implementation is:
var sentences = [String]()
string.enumerateSubstrings(in: string.startIndex ..< string.endIndex, options: .bySentences) { substring, substringRange, enclosingRange, stop in
sentences.append(substring!)
}
You can also use a regular expression with NSRegularExpression, although it is considerably more complicated than rmaddy's .bySentences solution. In Swift 3:
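var sentences = [String]()
// Backslashes must be doubled inside an ordinary Swift string literal.
let regex = try! NSRegularExpression(pattern: "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))")
// NSRange counts UTF-16 code units, so use utf16.count rather than characters.count.
regex.enumerateMatches(in: string, range: NSMakeRange(0, string.utf16.count)) { match, flags, stop in
    sentences.append((string as NSString).substring(with: match!.rangeAt(2)))
}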
Or in Swift 2:
// Backslashes must be doubled inside an ordinary Swift string literal.
let regex = try! NSRegularExpression(pattern: "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))", options: [])
var sentences = [String]()
// NSRange counts UTF-16 code units, so use utf16.count rather than characters.count.
regex.enumerateMatchesInString(string, options: [], range: NSMakeRange(0, string.utf16.count)) { match, flags, stop in
    sentences.append((string as NSString).substringWithRange(match!.rangeAtIndex(2)))
}
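The [.!?] syntax matches any one of those three characters. The | means "or". The ^ matches the start of the string. The $ matches the end of the string. The \s matches a whitespace character. The \w matches a "word" character. The * matches zero or more of the preceding character. The + matches one or more of the preceding character. The (?=) is a look-ahead assertion (i.e. it checks whether something is there, but does not advance past it).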
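I have tried to simplify this a bit, but it is still fairly complicated. Regular expressions offer rich text pattern matching but, admittedly, they are a little dense when you first use them. This rendition does, however, also handle (a) repeated punctuation (e.g. "Thank you!!!"), (b) leading spaces, and (c) trailing spaces.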
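For readers on a newer toolchain, the same pattern can be sketched in Swift 5 syntax. This is only an illustration, not part of the original answer: the function name sentences(in:) and the sample strings are assumptions, and a raw string literal is used so the backslashes do not need doubling.

```swift
import Foundation

// Hypothetical helper: returns capture group 2 of every match, i.e. the
// sentence itself without the leading whitespace matched by group 1.
func sentences(in text: String) -> [String] {
    let pattern = #"(^|\s+)(\w.*?[.!?]+)(?=(\s+|$))"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    // Convert the String range to an NSRange (UTF-16 based) for the regex API.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        Range(match.range(at: 2), in: text).map { String(text[$0]) }
    }
}

let sample = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
let found = sentences(in: sample)
print(found)
// ["Hi!", "How are you?", "I'm fine.", "It is 6 p.m.", "Thank you!", "That's it."]

let repeated = sentences(in: "  Thank you!!!  Bye. ")
print(repeated)
// ["Thank you!!!", "Bye."]
```

Note that the punctuation stays attached to each sentence; "p.m." survives because the period after "p" is not followed by whitespace.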
I also tried to come up with a regex to solve this: (([^.!?]+\s)*\S+(\.|!|\?))
Here is the explanation from regexper, and an example.
Well, I also found a regex from here:
// Backslashes must be doubled inside an ordinary Swift string literal.
let pattern = "(?<=[.?!;…])\\s+(?=[\\p{Lu}\\p{N}])"
let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
let sReplaced = s.stringByReplacingOccurrencesOfString(pattern, withString:"[*-SENTENCE-*]" as String, options:NSStringCompareOptions.RegularExpressionSearch, range:nil)
let array = sReplaced.componentsSeparatedByString("[*-SENTENCE-*]")
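Maybe this is not a good approach, since it has to replace first and then split the string. :)
Update:
For the regex part, if you also want to match Chinese/Japanese punctuation marks (where a space after each mark is not required), you can use the following: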
((?<=[.?!;…])\s+|(?<=[。!?;…])\s*)(?=[\p{L}\p{N}])
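One way to avoid the replace-then-split step is to cut the string at each regex match directly. The following is a rough sketch in Swift 5 syntax, not the answer's own code: the helper name split(_:atMatchesOf:) and the second sample string are assumptions made for illustration. Both patterns from this answer are passed in unchanged.

```swift
import Foundation

// Hypothetical helper: treats every match of `pattern` as a separator and
// returns the text between consecutive matches. Zero-width matches are fine.
func split(_ text: String, atMatchesOf pattern: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [text] }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    var pieces = [String]()
    var cursor = text.startIndex
    for match in regex.matches(in: text, options: [], range: fullRange) {
        guard let gap = Range(match.range, in: text) else { continue }
        pieces.append(String(text[cursor..<gap.lowerBound]))
        cursor = gap.upperBound
    }
    pieces.append(String(text[cursor...]))
    return pieces
}

let latinPattern = #"(?<=[.?!;…])\s+(?=[\p{Lu}\p{N}])"#
let cjkPattern = #"((?<=[.?!;…])\s+|(?<=[。!?;…])\s*)(?=[\p{L}\p{N}])"#

let english = split("Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it.",
                    atMatchesOf: latinPattern)
print(english)
// ["Hi!", "How are you?", "I'm fine.", "It is 6 p.m.", "Thank you!", "That's it."]

let mixed = split("你好!今天天气很好。Thanks.", atMatchesOf: cjkPattern)
print(mixed)
// ["你好!", "今天天气很好。", "Thanks."]
```

Because the second pattern's look-ahead accepts any letter rather than only an uppercase one, it will also split after abbreviations that are followed by a space and a lowercase word.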
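There is a built-in method that lets you enumerate a string. You can do so by words, by sentences, or with other options. No regular expression is needed.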
let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
var sentences = [String]()
s.enumerateSubstringsInRange(s.startIndex..<s.endIndex, options: .BySentences) {
substring, substringRange, enclosingRange, stop in
sentences.append(substring!)
}
print(sentences)
The result is:
["Hi! ", "How are you? ", "I\'m fine. ", "It is 6 p.m. ", "Thank you! ", "That\'s it."]
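Each element above still carries its punctuation and trailing space, whereas the question asked for bare sentences. A possible follow-up, sketched here in Swift 5 syntax, is to trim each piece after enumeration. The helper names trimmedSentence(_:) and trimmedSentences(in:) are hypothetical, and the exact sentence boundaries depend on the platform's Foundation implementation.

```swift
import Foundation

// Hypothetical helper: strips surrounding whitespace and the sentence-ending
// characters . ! ? from both ends of a piece.
func trimmedSentence(_ raw: String) -> String {
    let unwanted = CharacterSet(charactersIn: ".!?").union(.whitespacesAndNewlines)
    return raw.trimmingCharacters(in: unwanted)
}

// Hypothetical helper: enumerates by sentence, then trims every piece.
func trimmedSentences(in text: String) -> [String] {
    var result = [String]()
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: .bySentences) { substring, _, _, _ in
        if let substring = substring {
            result.append(trimmedSentence(substring))
        }
    }
    return result
}
```

Calling trimmedSentences(in: s) on the string from the question would then be expected to give elements such as "Hi" and "It is 6 p.m"; the final period of "p.m." is trimmed as well, which matches the array the question asked for.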
If the basis for splitting is something a bit more esoteric than sentences, this extension can do the job.
extension String {
    // Splits on each separator in turn, then trims whitespace from every piece.
    public func components(separatedBy separators: [String]) -> [String] {
        var output: [String] = [self]
        for separator in separators {
            output = output.flatMap { $0.components(separatedBy: separator) }
        }
        return output.map { $0.trimmingCharacters(in: .whitespaces) }
    }
}
let artists = "Rihanna, featuring Calvin Harris".components(separatedBy: [", with", ", featuring"])
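Applied to the string from the question, the same split-on-each-separator idea might look like the sketch below. It is written as a free function so that it stands alone; the name split(_:onAll:) is an assumption and not part of the answer above.

```swift
import Foundation

// Hypothetical helper: splits on each separator in turn. The separators used
// here include the trailing space, so the period inside "p.m." is left alone.
func split(_ text: String, onAll separators: [String]) -> [String] {
    var pieces = [text]
    for separator in separators {
        pieces = pieces.flatMap { $0.components(separatedBy: separator) }
    }
    return pieces
}

let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
let parts = split(s, onAll: [". ", "? ", "! "])
print(parts)
// ["Hi", "How are you", "I'm fine", "It is 6 p.m", "Thank you", "That's it."]
```

The last element keeps its period because no space follows it; trimming trailing punctuation from the final piece would be a separate step.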