Why does this working regex not work with sed?

I have text like this:

Song of Solomon 1:1: The song of songs, which is Solomon’s.
John 3:16:For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
III John 1:8: We therefore ought to receive such, that we might be fellowhelpers to the truth.

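I'm trying to strip the verse reference (the metadata, if you like) and keep only the plain text. The sample shows three different kinds of reference (multi-word, single-word, and Roman numeral + word), so I figured it would be easiest to match from the start of each line up to "number:number:" and replace that with "" (the empty string).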

I tested a regex that seems to work (as described):

  1. First, match everything up to, but not including, "number:number:" [i.e. .+?(?=(\s+)(\d+)(:)(\d+)(:))],
  2. Then match the "number:number:" pattern itself [i.e. (\s+)(\d+)(:)(\d+)(:)].

This gives the following regex:

.+?(?=(\s+)(\d+)(:)(\d+)(:))(\s+)(\d+)(:)(\d+)(:)

The regex seems to work fine (you can try it here); the problem is that it doesn't work when I use it with sed:

$ sed 's/.+?(?=(\s+)(\d+)(:)(\d+)(:))(\s+)(\d+)(:)(\d+)(:)//g' testcase.txt

It outputs the same text as the input, when it should produce:

 The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
 We therefore ought to receive such, that we might be fellowhelpers to the truth.

Any help?

Thanks a lot!

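sed uses POSIX regular expressions (basic by default, extended with -E/-r), which have no lazy quantifiers like +?, no lookaheads like (?=...) and no \d shorthand, so your PCRE-style pattern is read quite differently and matches nothing. You can use the following sed command instead: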

sed  's/.*[0-9]\+:[0-9]\+: *//' file.txt

If only basic POSIX regular expressions are available, you need to use this command instead:

sed 's/.*[0-9]\{1,\}:[0-9]\{1,\}: \{0,\}//' file.txt

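I need \{1,\} here because \+ is a GNU extension, not part of the POSIX basic regular expression specification. (Plain * is part of POSIX BRE, so \{0,\} could just as well be written as *.)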


By the way, if you have the GNU goodies, you can also use grep:

grep -oP  '.*([0-9]+:){2} *\K.*' file.txt

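I use \K here. \K discards everything matched up to that point from the reported match, so it can be used like a lookbehind assertion, but one that allows variable length.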
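A small sketch of how \K behaves with grep -o, assuming GNU grep built with PCRE support (-P). The helper name after_ref is hypothetical; it wraps the grep command from this answer:

```shell
# \K resets the start of the reported match: the text before it must
# still match, but grep -o prints only what comes after \K.
printf 'key=value\n' | grep -oP 'key=\K.*'    # prints: value

# The answer's pattern: greedy .* up to "N:N:", optional spaces, then \K.
after_ref() {
  grep -oP '.*([0-9]+:){2} *\K.*'
}

printf 'John 3:16:For God so loved the world\n' | after_ref
```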

This awk should do it:

awk -F": *" '{print $3}' file
The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
We therefore ought to receive such, that we might be fellowhelpers to the truth.

To make the number:number: match more robust, use:

awk -F"[0-9]+:[0-9]+: *" '{print $2}' file
The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
We therefore ought to receive such, that we might be fellowhelpers to the truth.

This also avoids problems when the text itself contains a :.
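To illustrate the point about colons, a minimal sketch using a made-up line (not from the question's data) whose text contains a colon:

```shell
# Hypothetical input line whose verse text contains a colon.
line='Example 1:2: He said: go forth.'

# FS ": *" splits on every colon, so $3 stops at the colon inside the text.
printf '%s\n' "$line" | awk -F': *' '{print $3}'
# prints: He said

# FS "[0-9]+:[0-9]+: *" only splits on the "N:N: " reference, so $2 keeps it.
printf '%s\n' "$line" | awk -F'[0-9]+:[0-9]+: *' '{print $2}'
# prints: He said: go forth.
```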

Using Adams's regex, we can shorten it a bit:

awk -F"([0-9]+:){2} ?" '{print $2}' file

awk -F"([0-9]+:){2} ?" '{$0=$2}1' file

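This works too (-r turns on extended regular expressions in GNU sed; -E is the spelling that both GNU and BSD sed accept):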

sed  -r 's/.*([0-9]+:){2} ?//' testcase.txt

This is exactly what cut was made for:

$ cut -d: -f3- file
 The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
 We therefore ought to receive such, that we might be fellowhelpers to the truth.
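One detail worth noting: the trailing - in -f3- means "field 3 through the last field", so colons inside the verse text survive. A short sketch with a made-up line (not from the question's data):

```shell
# -d: splits on every colon; -f3- prints fields 3..end joined by ":" again,
# so the colon inside the text is kept, as is the space after "N:N:".
printf 'Example 1:2: He said: go forth.\n' | cut -d: -f3-
# prints " He said: go forth." (with a leading space)

# Optional: also drop that single leading space.
printf 'Example 1:2: He said: go forth.\n' | cut -d: -f3- | sed 's/^ //'
```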