Matching end of line position using m flag with different line ending styles

I'm trying to wrap every line that starts with "## " in tags, aiming for a GitHub/Stack Overflow-like syntax for text formatting.

This is what I've got:

$value = preg_replace('/^## (.*)$/m', '<p>$1</p>', $value);

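After googling for a while, this seems to be the right solution, but it either doesn't work as expected or I'm misunderstanding something.

Sample text: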

## Some header 1

Some text that doesn't need to be altered

## Some header 2

This is the result:

<p>Some header 1
</p>

Some text that doesn't need to be altered

<p>Some header 2</p>

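As you can see, the second header is processed correctly because it sits at the end of the text. The first header, however, gets an extra line break before the closing tag. How do I get rid of it?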

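It seems that with your current PCRE settings the dot matches every character except LF (\n, line feed), so it also matches CR (\r, carriage return), which is a line break character too.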
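The behaviour can be reproduced with a short sketch. It assumes the sample text uses Windows (CRLF) line endings and that the replacement contains a `$1` backreference, as the output in the question suggests. The `(*LF)` verb is added here only to pin the newline convention that the question's build appears to use by default, so the result does not depend on how the local PCRE was compiled.

```php
<?php
// Assumption: the sample text from the question, saved with CRLF line endings.
$crlfText = "## Some header 1\r\n\r\nSome text that doesn't need to be altered\r\n\r\n## Some header 2";

// (*LF) makes "\n" the only line break, so "." is free to consume "\r".
$broken = preg_replace('/(*LF)^## (.*)$/m', '<p>$1</p>', $crlfText);

// Make the invisible characters visible before printing.
echo str_replace(["\r", "\n"], ['\r', '\n'], $broken), PHP_EOL;
```

The stray `\r` ends up inside the captured group, just before `</p>`, which is what shows up as the extra line break.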

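PCRE supports overriding the default newline convention (and therefore the behavior of the $ anchor). To make . match every character except CR and LF, enable the corresponding verb at the start of the pattern: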

'/(*ANYCRLF)^## (.*)$/m'
  ^^^^^^^^^^

$ will then assert the end of the line before \r\n.
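As a rough check of the fix, the sketch below runs the `(*ANYCRLF)` pattern over the sample text with CRLF line endings and again with plain LF line endings. The input strings and variable names are illustrative assumptions, not taken from the original post.

```php
<?php
// Assumption: the same sample text, once with CRLF and once with LF line endings.
$crlfInput = "## Some header 1\r\n\r\nSome text that doesn't need to be altered\r\n\r\n## Some header 2";
$lfInput   = str_replace("\r\n", "\n", $crlfInput);

// With (*ANYCRLF), CR, LF and CRLF all count as line breaks:
// "." stops before "\r" and "$" asserts the position before "\r\n".
$pattern = '/(*ANYCRLF)^## (.*)$/m';

$fixedCrlf = preg_replace($pattern, '<p>$1</p>', $crlfInput);
$fixedLf   = preg_replace($pattern, '<p>$1</p>', $lfInput);

// Make the invisible characters visible before printing.
echo str_replace(["\r", "\n"], ['\r', '\n'], $fixedCrlf), PHP_EOL;
echo str_replace(["\r", "\n"], ['\r', '\n'], $fixedLf), PHP_EOL;
```

Both headers are wrapped without a stray carriage return, and the original line endings of the text are left untouched.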

See more about this and other verbs at rexegg.com:

By default, when PCRE is compiled, you tell it what to consider to be a line break when encountering a . (as the dot it doesn't match line breaks unless in dotall mode), as well the ^ and $ anchors' behavior in multiline mode. You can override this default with the following modifiers:

(*CR) Only a carriage return is considered to be a line break
(*LF) Only a line feed is considered to be a line break (as on Unix)
(*CRLF) Only a carriage return followed by a line feed is considered to be a line break (as on Windows)
(*ANYCRLF) Any of the above three is considered to be a line break
(*ANY) Any Unicode newline sequence is considered to be a line break

For instance, (*CR)\w+.\w+ matches Line1\nLine2 because the dot is able to match the \n, which is not considered to be a line break.
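The quoted `(*CR)` example can be tried directly in PHP. This is a minimal sketch; the `(*LF)` comparison is added here for contrast and is not part of the quoted text.

```php
<?php
$subject = "Line1\nLine2";

// Under (*CR) only "\r" is a line break, so "." may match "\n".
preg_match('/(*CR)\w+.\w+/', $subject, $withCr);

// Under (*LF) the dot cannot cross "\n", so the match stays on the first line.
preg_match('/(*LF)\w+.\w+/', $subject, $withLf);

var_dump($withCr[0] === "Line1\nLine2"); // bool(true)
var_dump($withLf[0] === "Line1");        // bool(true)
```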