Matching end of line position using m flag with different line ending styles

Question

I'm trying to wrap every line that starts with "##" in tags, in order to implement GitHub/Stack Overflow-like syntax for text formatting.

This is what I've got:

$value = preg_replace('/^## (.*)$/m', '<p>$1</p>', $value);

After googling for a while, this seems to be the right solution, but it doesn't work as expected, or I just don't understand something.

Example text:

## Some header 1

Some text that doesn't need to be altered

## Some header 2

This is the result:

<p>Some header 1
</p>

Some text that doesn't need to be altered

<p>Some header 2</p>

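As you can see, the second header is processed fine because it is at the end of the text. The first header, however, gets an extra new line before the closing tag. How do I get rid of it?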

Answer 1

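It seems that in your current PCRE settings, the dot matches every character except LF (\n, line feed), so it also matches CR (\r, carriage return), which is a line break character too.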
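A minimal sketch of that diagnosis, assuming the input uses Windows-style \r\n line endings (the sample string below is hypothetical). The (*LF) verb is added only to pin the newline convention that is presumably the default in your build, so the result does not depend on how PCRE was compiled:

```php
<?php
// Hypothetical sample with CRLF line endings.
$value = "## Some header 1\r\n\r\nSome text\r\n\r\n## Some header 2";

// (*LF): only \n counts as a line break, so the dot is free to match \r.
preg_match_all('/(*LF)^## (.*)$/m', $value, $matches);

// json_encode makes control characters visible in the output.
echo json_encode($matches[1]), "\n";
```

The first captured header carries a trailing \r, which is what shows up as the extra line break before the closing tag.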

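PCRE lets you override the default newline convention (and therefore the behavior of the $ anchor). To make . match every character except CR and LF, put the corresponding verb at the start of the pattern: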

'/(*ANYCRLF)^## (.*)$/m'
  ^^^^^^^^^^

$ will then assert the end of the line before \r\n.
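As a rough illustration, here is the pattern above applied to the sample text from the question, assuming that text arrives with \r\n line endings (that is an assumption; the question does not state it):

```php
<?php
// The question's sample text, assumed here to use CRLF line endings.
$value = "## Some header 1\r\n\r\n"
       . "Some text that doesn't need to be altered\r\n\r\n"
       . "## Some header 2";

// (*ANYCRLF): \r, \n and \r\n are all treated as line breaks,
// so the dot stops before \r and $ asserts before \r\n.
$fixed = preg_replace('/(*ANYCRLF)^## (.*)$/m', '<p>$1</p>', $value);

// json_encode makes the remaining line endings visible.
echo json_encode($fixed), "\n";
```

The \r\n sequences stay in place after each closing tag; only the captured header text moves inside the tags.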

See more about this and other verbs at rexegg.com:

By default, when PCRE is compiled, you tell it what to consider to be a line break when encountering a . (as the dot it doesn't match line breaks unless in dotall mode), as well the ^ and $ anchors' behavior in multiline mode. You can override this default with the following modifiers:

✽ (*CR) Only a carriage return is considered to be a line break
✽ (*LF) Only a line feed is considered to be a line break (as on Unix)
✽ (*CRLF) Only a carriage return followed by a line feed is considered to be a line break (as on Windows)
✽ (*ANYCRLF) Any of the above three is considered to be a line break
✽ (*ANY) Any Unicode newline sequence is considered to be a line break

For instance, (*CR)\w+.\w+ matches Line1\nLine2 because the dot is able to match the \n, which is not considered to be a line break. See demo.
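The quoted (*CR) example can be tried from PHP along these lines. This is only a sketch; the (*LF) variant is added here for contrast and is not part of the quote:

```php
<?php
$subject = "Line1\nLine2";

// (*CR): only \r is a line break, so the dot may match the \n.
preg_match('/(*CR)\w+.\w+/', $subject, $withCr);

// (*LF): \n is a line break, so the dot cannot cross it and the
// match has to fit inside the first line.
preg_match('/(*LF)\w+.\w+/', $subject, $withLf);

echo json_encode([$withCr[0], $withLf[0]]), "\n";
```

Under (*CR) the match spans both lines, while under (*LF) it stays within "Line1".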

php

regex

preg-replace