How to replace Perl-style regex backrefs with a MatchData object
I am using the gsub method with a regular expression:
@text.gsub(/(-\n)(\S+)\s/) { "#{$2}\n" }
Sample input data:
"The wolverine is now es-
sentially absent from
the southern end
of its European range."
It should return:
"The wolverine is now essentially
absent from
the southern end
of its European range."
The method works fine, but RuboCop reports an offense:
Avoid the use of Perl-style backrefs.
Any ideas on how to rewrite it using a MatchData object instead of $2?
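You can use a backslash reference without a block. Inside double quotes the backslash itself must be escaped, otherwise "\2" is read as an octal character escape:
@text.gsub /(-\n)(\S+)\s/, "\\2\n"
Also, it is a bit cleaner to use only one group, since the first one above is not needed:
@text.gsub /-\n(\S+)\s/, "\\1\n"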
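The quoting matters here, so a small comparison may help. This is only a sketch: the method names and the sample string are made up for illustration, and `text` stands in for `@text` from the question.

```ruby
# Hypothetical helper names; `text` stands in for @text from the question.

# Double quotes with a single backslash: "\2" is the octal escape for byte 0x02,
# so gsub receives a control character, not a backreference.
def join_broken(text)
  text.gsub(/(-\n)(\S+)\s/, "\2\n")
end

# Escaping the backslash hands the two characters \ and 2 to gsub,
# which then substitutes the second capture group.
def join_escaped(text)
  text.gsub(/(-\n)(\S+)\s/, "\\2\n")
end

# Single quotes keep \1 literal; the newline is appended separately.
def join_single_quoted(text)
  text.gsub(/-\n(\S+)\s/, '\1' + "\n")
end

sample = "now es-\nsentially absent from\nthe southern end"
p join_broken(sample)         # contains a 0x02 byte where the word tail should be
p join_escaped(sample)        # => "now essentially\nabsent from\nthe southern end"
p join_single_quoted(sample)  # same result as join_escaped
```

The escaped and single-quoted forms behave identically; which one to prefer is mostly a matter of style.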
If you want to use Regexp.last_match:
@text.gsub(/(-\n)(\S+)\s/) { Regexp.last_match[2] + "\n" }
Or:
@text.gsub(/-\n(\S+)\s/) { Regexp.last_match[1] + "\n" }
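Regexp.last_match also accepts the group number as an argument, and the MatchData it returns can be indexed by name when the regex uses a named group. A minimal sketch of both forms follows; the method names and the group name `rest` are chosen here for illustration only.

```ruby
# Hypothetical helper names; `text` stands in for @text from the question.

# Regexp.last_match(1) returns the first capture of the most recent match.
def join_with_last_match(text)
  text.gsub(/-\n(\S+)\s/) { "#{Regexp.last_match(1)}\n" }
end

# With a named group, the MatchData object can be indexed by symbol.
def join_with_named_group(text)
  text.gsub(/-\n(?<rest>\S+)\s/) { "#{Regexp.last_match[:rest]}\n" }
end

sample = "now es-\nsentially absent from\nthe southern end"
p join_with_last_match(sample)   # => "now essentially\nabsent from\nthe southern end"
p join_with_named_group(sample)  # same result
```

The named form costs a few more characters but keeps working if groups are later added or reordered.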
Note that the block form of gsub is meant for cases where logic is involved. If there is no logic, setting the second argument to "\\1\n" or '\1' + "\n" is enough.
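As an illustration of "logic" that a plain replacement string cannot express, the sketch below keeps the hyphen for words on a small allow-list and drops it otherwise. The list, its contents, and the method name are assumptions invented for this example; they are not part of the question.

```ruby
# Hypothetical allow-list of compounds that keep their hyphen when rejoined.
KEEP_HYPHEN = %w[well-known].freeze

# Rejoins a word split across a line break, deciding per match
# whether the hyphen belongs to the word.
def join_lines(text)
  text.gsub(/(\S+)-\n(\S+)\s/) do
    head, tail = Regexp.last_match.captures
    compound = "#{head}-#{tail}"
    word = KEEP_HYPHEN.include?(compound) ? compound : head + tail
    "#{word}\n"
  end
end

p join_lines("a well-\nknown fact now es-\nsentially absent")
# => "a well-known\nfact now essentially\nabsent"
```

Because the decision depends on the captured text, it has to run inside the block.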
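This solution also handles stray whitespace before the line break, and split words that end a sentence or the string. It uses String#gsub with a block and no capture groups.
Code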
R = /
  [[:alpha:]]\-  # match a letter followed by a hyphen
  \s*\n          # match a newline possibly preceded by whitespace
  [[:alpha:]]+   # match one or more letters
  [.?!]?         # possibly match a sentence terminator
  \n?            # possibly match a newline
  \s*            # match zero or more whitespaces
/x               # free-spacing regex definition mode

def remove_hyphens(str)
  str.gsub(R) { |s| s.gsub(/[\n\s-]/, '') << "\n" }
end
Examples
str =<<_
The wolverine is now es-
sentially absent from
the south-
ern end of its
European range.
_
puts remove_hyphens(str)
The wolverine is now essentially
absent from
the southern
end of its
European range.
puts remove_hyphens("now es- \nsentially\nabsent")
now essentially
absent
puts remove_hyphens("now es-\nsentially.\nabsent")
now essentially.
absent
remove_hyphens("now es-\nsentially?\n")
#=> "now essentially?\n" (no extra \n at end)
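To see which substring the regex hands to the block, String#scan can be run with the same pattern. This is a sketch for inspection only: the constant name HYPHEN_BREAK and the method name are chosen here, and the pattern is a copy of R above.

```ruby
# Same pattern as R above, under a hypothetical name so the block is self-contained.
HYPHEN_BREAK = /
  [[:alpha:]]-   # a letter followed by a hyphen
  \s*\n          # optional whitespace, then the line break
  [[:alpha:]]+   # the rest of the split word
  [.?!]?         # optional sentence terminator
  \n?            # optional newline
  \s*            # trailing whitespace
/x

# Returns every substring the pattern would pass to the gsub block.
def hyphen_breaks(str)
  str.scan(HYPHEN_BREAK)
end

p hyphen_breaks("now es- \nsentially\nabsent")  # => ["s- \nsentially\n"]
p hyphen_breaks("the south-\nern end")          # => ["h-\nern "]
```

Each match starts at the last letter before the hyphen and runs through the whitespace after the rejoined word, which is why the block can strip newlines, whitespace, and the hyphen in one pass and then append a single "\n".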