How to use gsub to simplify regular expressions

Question

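I want to escape # with \ when it appears inside a \href command.

Normally I would write a regex such as s/(\href\{.*?)#(.*?)\}/\#/g, but I suspect gsub would be a good fit here: first extract the \href content, then replace # with \#.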

Here is some text with a \href{./file.pdf#section.1.5}{link} to section 1.5.

A single line can contain several links.

Question

Can gsub simplify this kind of problem?

Answer 1

You can use two gsub calls: one with a single argument and a block (to match each href{...}), and one with two arguments (to replace # with \#):

text = %q(Here is some text with a \href{./file.pdf#section.1.5}{link} to section 1.5.)
puts text.gsub(/href{[^}]+}/){ |href| href.gsub('#', '\#') }
#=> Here is some text with a \href{./file.pdf\#section.1.5}{link} to section 1.5.
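
Since a line may contain several links, the same two-step idea can be sketched as a small method. The method name escape_href_hashes and the sample line are illustrative assumptions, not part of the answer above. The pattern here is also anchored on the literal \href{ so that a # outside a link is left alone.

```ruby
# Hypothetical helper (the name is an assumption): escape every '#'
# found inside the first argument of a \href{...} command.
def escape_href_hashes(text)
  # Outer gsub: the block receives each \href{...} match in turn.
  text.gsub(/\\href\{[^}]+\}/) do |href|
    # Inner gsub: replace '#' with a backslash followed by '#'.
    href.gsub('#', '\#')
  end
end

line = 'See \href{a.pdf#s1}{one} and \href{b.pdf#s2}{two}, issue #3.'
puts escape_href_hashes(line)
#=> See \href{a.pdf\#s1}{one} and \href{b.pdf\#s2}{two}, issue #3.
```

The # in "issue #3" is untouched because the inner gsub only ever sees the text matched by the outer pattern.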

If you want to run it from the terminal with ruby -e on a file named test.txt, you can use:

ruby -pe '$_.gsub!(/href{[^}]+}/){ |href| href.gsub(%q|#|, %q|\#|) }' test.txt
# Here is some text with a \href{./file.pdf\#section.1.5}{link} to section 1.5.
# Here is some text with a \href{./file.pdf\#section.1.6}{link} to section 1.6.
# Here is some text with a \href{./file.pdf\#section.1.7}{link} to section 1.7.

or

ruby -e 'puts ARGF.read.gsub(/href{[^}]+}/){ |href| href.gsub(%q|#|, %q|\#|) }' test.txt
# Here is some text with a \href{./file.pdf\#section.1.5}{link} to section 1.5.
# Here is some text with a \href{./file.pdf\#section.1.6}{link} to section 1.6.
# Here is some text with a \href{./file.pdf\#section.1.7}{link} to section 1.7.

Don't mix ruby -pe with ARGF.read: the -p loop reads only the first line of the file, and ARGF.read then consumes the rest!

Answer 2

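Unless the password part of one or more URLs enclosed in \href{..} is written between quotes, as in http://username:"sdkfj#lkn#"@domainname.org/path/file.ext, the only possible position for the character # in a URL is near the end, where it delimits the fragment part: ./path/path/file.rb?val=toto#thefragmentpart.

In other words, if I'm not mistaken, there is at most one # to escape per href{...}. So you can simply do: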

text.gsub(/\\href{[^#}]*\K#/, '\#')

The character class [^#}] forbids the character } and so ensures that you always stay between the curly brackets.
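
As a rough illustration of that single-pass idea and of its stated limitation, the sketch below applies a \K-based pattern to one link with a single # and to one with two. The method name escape_first_hash and the second URL are made up for the example. The replacement is given as a block so that the returned string is used literally.

```ruby
# Hypothetical helper (the name is an assumption): escape the first '#'
# inside each \href{...} in a single pass.
def escape_first_hash(text)
  # \K drops everything matched so far from the match, so only the '#'
  # itself is replaced. [^#}]* cannot cross a closing brace.
  text.gsub(/\\href\{[^#}]*\K#/) { '\#' }
end

puts escape_first_hash('\href{./file.pdf#section.1.5}{link}')
#=> \href{./file.pdf\#section.1.5}{link}

# Made-up URL with a quoted password containing two '#' characters:
puts escape_first_hash('\href{http://u:"a#b#"@example.org/f}{link}')
#=> \href{http://u:"a\#b#"@example.org/f}{link}
```

The second call shows why the quoted-password case is excluded: only the first # in each link is escaped, so inputs with several # per link need the two-gsub approach instead.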
