Replace "\n" depending on how the line started

Question

I'm trying to replace line endings ('\n') in a file with multiple characters ("<br>\n") (hence not using tr), but only on some lines, depending on how they start.

What I want:

Input document:

# Title

paragraph,
with text over multiple lines

- list item
- other list item
 - sublist item
 - sublist item 2

Output:

# Title

paragraph,<br>
with text over multiple lines<br>

- list item
- other list item
 - sublist item
 - sublist item 2

As you might have guessed, I'm trying to force line breaks at single newlines (inside paragraphs) for when I later convert my markdown document to HTML.

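What I tried/know

I've looked up regex syntax and the basics of the 'sed' command, so my understanding is that I need something that uses a negative lookbehind to not match specific line beginnings, then probably a non-capturing group or a positive lookbehind for the content of the line, then the actual match on \n and what I want to replace it with.

To take an example where I only exclude lines starting with (one or more '#' followed by a space) or ([possibly spaces], then [a dash], then [one space]), I'm currently using: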

#DOESN'T WORK (and ~/Documents/test/foo contains exactly the example I put above)
sed -z 's/(?<!#+\s|\s*-\s)(?:[^\n]*)\n/<br>\n/g' ~/Documents/test/foo

My understanding of the command

Maybe my understanding of sed/regexes in a bash context is wrong, so I'll explain how I understand what I wrote:

sed -z     # -z flag to treat the document as one big string, apparently good when
           # dealing with newlines replacement (https://linuxhint.com/newline_replace_sed/)
           # string with the command
'          #'
s/         # s(ubstitute) command in sed
(?<!#+\s|\s*-\s) # negative lookback ignoring '#+\s' (one or more '#' followed by a
                 # space) and '\s*-\s' (0 or more spaces, a dash then a space)
(?:[^\n]*) # non-capturing group matching the content of the line (literally anything
           # but newline, 0 or more times) because there is something between the
           # beginning I want to ignore and the ending I want to replace (hence group),
           # and I do not want this to be replaced (hence non-capturing).
\n         # the thing I'm matching on, and replacing
/          # separator to announce the replacement for the match
<br>\n     # replacement
/g         # g flag because I want to replace all matching occurrences
'          #'
           # end of the command string

 ~/Documents/test/foo # my input/source

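I tried using grep -o to display the matches (and to try to correct my regex), but the '!' of the negative lookbehind always messes things up, and the -F flag doesn't seem to solve the problem.

Any help is appreciated. As long as you can provide an example that ignores "# Title\n" or " - list item\n" when replacing the line ending, I'll figure out how to extend it.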

PS

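Yes, I know that leaving a "<br>" on the last line of a paragraph looks bad, but I'll deal with that later; let's not make the regex overly long for this small example.
Although my example is indeed run on the command line, this is for a bash script (hence the tag), so answers should be bash-compatible or explain why they aren't (I'm not very familiar with the differences, but I've read here that some standards aren't shared).
My environment is Pop!_OS 21.10 (but that shouldn't matter, right?)

Thanks in advance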

Answer 1

Your regex uses PCRE constructs (e.g. lookbehinds), but sed doesn't support PCREs. It only supports the POSIX regex standards: BREs by default, or EREs if you invoke GNU or BSD sed with -E.
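To make the BRE/ERE difference concrete, here is a small sketch using the `#+` fragment from the question. The variable names `bre` and `ere` are only for illustration, and it assumes a sed that accepts -E:

```shell
# BRE (sed's default): "+" is an ordinary character, so "#+ " only matches a literal "#+ "
bre=$(printf '## Title\n' | sed 's/^#+ //')

# ERE (sed -E): "+" means "one or more of the preceding item"
ere=$(printf '## Title\n' | sed -E 's/^#+ //')

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
# BRE: ## Title
# ERE: Title
```

Neither flavor has lookbehinds, so the "does this line start with X" decision has to be made some other way.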

For simplicity and portability, I'd use awk. The following will work using any POSIX awk:

$ awk '{print $0 (/^[[:space:]]*([-#]|$)/ ? "" : "<br>")}' file
# Title

paragraph,<br>
with text over multiple lines<br>

- list item
- other list item
 - sublist item
 - sublist item 2
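If you'd rather stay with sed, the same per-line test can probably be written without any lookbehind by negating an address: lines that match the "heading, list item, or blank" pattern are skipped, and every other line gets "<br>" appended at its end. This is only a sketch alongside the awk command above; `add_br` is a hypothetical helper name, and it assumes a sed that accepts -E (GNU or BSD):

```shell
# Append <br> to every line that does NOT start with optional
# whitespace followed by "-" or "#", and that is not blank.
add_br() {
  sed -E '/^[[:space:]]*([-#]|$)/!s/$/<br>/' "$@"
}

printf '# Title\n\nparagraph,\nwith text\n\n- item\n - sub\n' | add_br
```

Because sed already works line by line, -z isn't needed here: replacing the empty match at `$` appends the text just before the newline that sed puts back itself.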

Original answer:

$ sed 's/^[^-#].*/&<br>/' file
# Title

paragraph,<br>
with text over multiple lines<br>

- list item
- other list item

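If that's not all you need, then please edit your question to provide a more truly representative example that includes cases where the above doesn't work.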
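As for the PS about the trailing "<br>" on a paragraph's last line, one possible extension of the awk approach is to delay printing by one line, so "<br>" is only added when the following line is also ordinary paragraph text. This is a sketch rather than part of the answer above; the names `add_br_inner`, `plain`, and `prev` are made up for the example:

```shell
add_br_inner() {
  awk '
    # A "plain" line is not blank and is not a heading or list item.
    function plain(s) { return s !~ /^[[:space:]]*([-#]|$)/ }

    # Print the previous line; add <br> only if it and the current line are both plain.
    NR > 1 { print prev ((plain(prev) && plain($0)) ? "<br>" : "") }
    { prev = $0 }

    # The last line never has a following line, so it gets no <br>.
    END { if (NR) print prev }
  ' "$@"
}

printf '# Title\n\nparagraph,\nwith text\n\n- item\n - sub\n' | add_br_inner
```

In a bash script the function can be fed a file name instead of a pipe, e.g. `add_br_inner file > file.new`.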
