Sed regex to capture up to one word before the pattern string and after the string

Question

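The text below is just a template example. I want a generalised regex.

Before wiki run, this template serves as a portal by helping develop database "query", it does so by searching link, and can be for sharing such discoveries as well. This template can be used also for "learn",string "regular",string "expression",string syntax having this version Cirrus Search.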

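Regex 1:

Capture groups up to the pattern string.

For example: pattern = query

Using sed, capture one group up to "query" and one group after "query".

[Note: the comma after "query" is optional; it may or may not be present. Capture group 1 should contain everything up to "query", and capture group 2 should contain everything after it.]

Tried: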

sed 's/^(.*?)"query"(.*)//g'

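The above works, but capture group 2 includes the comma, which I don't want. The comma is optional, so it needs to be handled with something like []. I need help with this.

Regex 2:

Capture groups up to one word before the pattern.

Example: pattern: "regular"

So capture group 1 should contain all text up to "learn", and capture group 2 should contain everything from the string "expression" onwards.

[Meaning: do not capture ,string "regular",]

Tried: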

sed 's/^(.*?)"\w"[^\"]*"regular"([^"]*)(.*)//g' -rE

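But it did not work. I used "\w" to make the regex general, which is what I want.

Regex 3: question about capture groups in sed

Is there a way to search within a captured group, or to edit the captured group itself?

For example: sed -r '/(someword)(.*)/ s/\1/something/g', something like that, or any other possibility.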

Answer 1

Is there a way to search for captured group or edit the capture group itself

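You can split the pattern space into chunks, save it to the hold space, extract only the part you are interested in, edit it, then append the hold space back and shuffle the pattern space back into the original line order.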

s/\(someword\)\(.*\)/\n\2\n\1/   # split pattern space into chunks: prefix, rest, match
h                                # save it to hold space
s/.*\n//                         # keep only the part of interest (the last chunk)
s/.*/something/                  # edit it
G                                # append hold space back
s/\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\).*/\2\1\3/    # shuffle: prefix, edited part, rest

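Above I used a newline as the chunk separator. Note that using \n in the replacement part of the s command is generally an extension to POSIX, but I think it is supported pretty much everywhere.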
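To make the steps easier to follow, here is a minimal sketch that traces the pattern space after each stage with the `l` command, which prints embedded newlines as `\n`. The input line `aaa someword bbb` and the variable name `trace` are made up for the demonstration, and the script assumes GNU sed (for `\n` in the replacement and inside bracket expressions).

```shell
#!/bin/bash
# Trace the pattern space after each step; -n suppresses automatic printing.
trace=$(printf '%s\n' 'aaa someword bbb' | sed -n '
  s/\(someword\)\(.*\)/\n\2\n\1/
  l
  h
  s/.*\n//
  s/.*/something/
  l
  G
  l
  s/\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\).*/\2\1\3/
  p
')
printf '%s\n' "$trace"
```

The three `l` lines show the split chunks, the edited chunk, and the combined buffer after `G`; the last line printed is the reassembled result, `aaa something bbb`.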

Wanted a generalised regex

Use a real programming language: tokenize the input, process the tokens, and then output them. Python, Perl, and AWK are waiting for you.
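As one possible illustration of that advice, the sketch below uses AWK from a shell script to split a line around a literal pattern and drop the optional comma that follows it. The sample line, the variable names `line`, `parts`, and `pat`, and the bracketed output format are all assumptions made for the example, not something from the question.

```shell
#!/bin/bash
# Hypothetical sample line; the comma after "query" may or may not be present.
line='develop database "query", it does so by searching'

# Split the line around the first literal occurrence of pat (no regex involved),
# then drop one optional leading comma from the trailing part.
parts=$(printf '%s\n' "$line" | awk -v pat='"query"' '{
  i = index($0, pat)                      # position of the pattern, 0 if absent
  if (i == 0) { print; next }             # no match: pass the line through
  before = substr($0, 1, i - 1)           # text before the pattern
  after  = substr($0, i + length(pat))    # text after the pattern
  sub(/^,/, "", after)                    # remove the optional comma
  print "before=[" before "]"
  print "after=[" after "]"
}')
printf '%s\n' "$parts"
```

Because `index` does a plain substring search, the pattern can be changed through `pat` without worrying about regex escaping, which is one way to keep the approach general.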

Captured group has "word,word2" and i want to remove ,

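The following script:

#!/bin/bash
sed '
  # split into chunks: prefix, rest, match
  s/\(word,word\)\(.*\)/\n\2\n\1/
  h
  # keep only the matched chunk and remove its comma
  s/.*\n//
  s/,//
  G
  # reassemble: prefix, edited match, rest
  s/\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\).*/\2\1\3/
' <<<'stuff1, word,word ,stuff2'

outputs: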

stuff1, wordword ,stuff2
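For the optional comma in the first part of the question, a simpler route may be to leave the comma outside both groups and make it optional with `,?`. The following is only a sketch under assumptions: `strip_query` is a hypothetical helper name, the input lines are invented, and the `|` in the replacement is there purely to make the group boundary visible. sed regexes have no lazy `.*?`, so the greedy `.*` here matches up to the last "query" on the line.

```shell
#!/bin/bash
# strip_query is a hypothetical helper: it removes "query" plus one optional
# trailing comma and joins the two captured groups with "|" to show the boundary.
strip_query() {
  sed -E 's/^(.*)"query",?(.*)$/\1|\2/'
}

with_comma=$(printf '%s\n' 'develop database "query", it does so' | strip_query)
without_comma=$(printf '%s\n' 'develop database "query" it does so' | strip_query)
printf '%s\n' "$with_comma" "$without_comma"
```

Both inputs give the same result, `develop database | it does so`, so the second group no longer carries the comma whether or not it was present.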
