How to replace Perl-style regex backrefs with a MatchData object

Question

I am using the gsub method with a regular expression:

@text.gsub(/(-\n)(\S+)\s/) { "#{$2}\n" }

Sample input data:

"The wolverine is now es-
sentially absent from 
the southern end
of its European range."

It should return:

"The wolverine is now essentially
absent from  
the southern end
of its European range."

The method works fine, but RuboCop reports an offense:

Avoid the use of Perl-style backrefs.

Any ideas on how to rewrite it using a MatchData object instead of the Perl-style backref?

Answer 1

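You can use a backslash reference without a block. In a double-quoted string the backslash must be doubled so that gsub receives \2 rather than an octal escape:

@text.gsub(/(-\n)(\S+)\s/, "\\2\n")

Also, it is a bit cleaner to use a single group, since the first one above is not needed:

@text.gsub(/-\n(\S+)\s/, "\\1\n")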
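
As a quick check of the quoting rules above, here is a minimal sketch that runs both replacements on the sample text from the question. The method names join_with_two_groups and join_with_one_group and the local variable sample are made up for illustration; they do not appear in the original code.

```ruby
# Hypothetical helper: two capture groups, double-quoted replacement.
# The backslash is doubled so gsub sees the two characters \2.
def join_with_two_groups(text)
  text.gsub(/(-\n)(\S+)\s/, "\\2\n")
end

# Hypothetical helper: one capture group. A single-quoted '\1' keeps its
# backslash as written, so no doubling is needed there.
def join_with_one_group(text)
  text.gsub(/-\n(\S+)\s/, '\1' + "\n")
end

# Sample input taken from the question.
sample = "The wolverine is now es-\nsentially absent from \n" \
         "the southern end\nof its European range."

puts join_with_two_groups(sample)
puts join_with_two_groups(sample) == join_with_one_group(sample)
```

Both helpers should produce the same string, which is why the second group-free pattern is the simpler choice.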

Answer 2

If you want to use Regexp.last_match:

@text.gsub(/(-\n)(\S+)\s/) { Regexp.last_match[2] + "\n" }

Or:

@text.gsub(/-\n(\S+)\s/) { Regexp.last_match[1] + "\n" }

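Note that the block form of gsub is meant for cases where the replacement involves logic. Without logic, passing "\\1\n" or '\1' + "\n" as the second argument works just fine.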
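
For illustration, the block form can also be written with Regexp.last_match(n) or with a named capture group. The sketch below is one possible way to do it, not the answerer's code; the method names join_by_index and join_by_name and the group name tail are assumptions introduced here.

```ruby
# Hypothetical helper: fetch group 1 with Regexp.last_match(1),
# which avoids the Perl-style $1 inside the block.
def join_by_index(text)
  text.gsub(/-\n(\S+)\s/) { "#{Regexp.last_match(1)}\n" }
end

# Hypothetical helper: the same idea with a named group (`tail` is an
# illustrative name), read from the MatchData object via MatchData#[].
def join_by_name(text)
  text.gsub(/-\n(?<tail>\S+)\s/) do
    match = Regexp.last_match # MatchData for the current match
    "#{match[:tail]}\n"
  end
end

puts join_by_index("now es-\nsentially absent")
puts join_by_name("now es-\nsentially absent")
```

The named-group variant is a little longer but keeps the block readable if more groups are added later.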

Answer 3

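This solution accounts for stray spaces before the newline and for split words that end a sentence or the string. It uses String#gsub with a block and no capture groups.

Code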

R = /
    [[:alpha:]]\- # match a letter followed by a hyphen
    \s*\n         # match a newline possibly preceded by whitespace
    [[:alpha:]]+  # match one or more letters
    [.?!]?        # possibly match a sentence terminator
    \n?           # possibly match a newline 
    \s*           # match zero or more whitespaces
    /x            # free-spacing regex definition mode

def remove_hyphens(str)
  str.gsub(R) { |s| s.gsub(/[\n\s-]/, '') << "\n" }
end

Example

str = <<_
The wolverine is now es-
sentially absent from
the south-
ern end of its
European range.
_

puts remove_hyphens(str)
The wolverine is now essentially
absent from
the southern
end of its
European range.

puts remove_hyphens("now es-  \nsentially\nabsent")
now essentially
absent

puts remove_hyphens("now es-\nsentially.\nabsent")
now essentially.
absent

remove_hyphens("now es-\nsentially?\n")
  #=> "now essentially?\n" (no extra \n at end)

ruby

regex

rubocop