Matching end of line position using m flag with different line ending styles
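I'm trying to wrap every line that starts with "##" in tags, as part of implementing a GitHub/Stack Overflow-like syntax for text formatting.
This is what I've got:
$value = preg_replace('/^## (.*)$/m', '<p>$1</p>', $value);
After googling for a while, this seems to be the right solution, but it doesn't work as expected, or I just don't understand it.
Sample text: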
## Some header 1
Some text that doesn't need to be altered
## Some header 2
Here's the result:
<p>Some header 1
</p>
Some text that doesn't need to be altered
<p>Some header 2</p>
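As you can see, the second header is processed fine because it sits at the end of the text. The first header, however, gets an extra newline before the closing tag. How do I get rid of it?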
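It seems that with your current PCRE settings, the dot matches every character except LF (\n, line feed), so it also matches CR (\r, carriage return), which is a line break character too.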
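A minimal sketch of the effect, assuming the sample text arrives with Windows-style CRLF line endings (the question does not say so explicitly) and that the PCRE build uses LF as its default newline. json_encode is used here only to make the invisible \r and \n characters visible.

```php
<?php
// Assumption: the sample text from the question, with CRLF (\r\n) line endings.
$value = "## Some header 1\r\nSome text that doesn't need to be altered\r\n## Some header 2";

// The pattern from the question. With LF as the only newline,
// the dot is free to match the \r that precedes each \n.
$result = preg_replace('/^## (.*)$/m', '<p>$1</p>', $value);

// Print the result with control characters shown as escapes.
echo json_encode($result, JSON_UNESCAPED_SLASHES), PHP_EOL;
```

On such a build the \r is captured by (.*) and lands just before the closing tag of the first header, which would explain the stray line break in the result above. The last header is unaffected because no \r follows it.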
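PCRE supports overriding the default newline convention (and with it the behavior of the $ anchor). To make . match every character except CR and LF, turn on the corresponding flag: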
'/(*ANYCRLF)^## (.*)$/m'
  ^^^^^^^^^^
$ will then assert the end of line before \r\n.
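As a sketch of how the (*ANYCRLF) verb could be applied, the snippet below wraps the pattern in a helper. The name wrapHeaders is hypothetical and used only for illustration, and the CRLF input is an assumption about the original data.

```php
<?php
// wrapHeaders() is a hypothetical helper name, not part of the question.
function wrapHeaders(string $text): string
{
    // (*ANYCRLF) has to appear at the very start of the pattern.
    // It makes CR, LF and CRLF all count as line breaks, so the dot
    // stops before \r and $ asserts the end of line before \r\n.
    return preg_replace('/(*ANYCRLF)^## (.*)$/m', '<p>$1</p>', $text);
}

// Assumed CRLF version of the sample text from the question.
$value = "## Some header 1\r\nSome text that doesn't need to be altered\r\n## Some header 2";

echo json_encode(wrapHeaders($value), JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Because the line breaks are never part of the match, they are left untouched in the output, so the same call should behave consistently for LF-only, CR-only and CRLF input.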
See more about this and other verbs at rexegg.com:
By default, when PCRE is compiled, you tell it what to consider to be a line break when encountering a . (as the dot it doesn't match line breaks unless in dotall mode), as well the ^ and $ anchors' behavior in multiline mode. You can override this default with the following modifiers:
✽ (*CR)
Only a carriage return is considered to be a line break
✽ (*LF)
Only a line feed is considered to be a line break (as on Unix)
✽ (*CRLF)
Only a carriage return followed by a line feed is considered to be a line break (as on Windows)
✽ (*ANYCRLF)
Any of the above three is considered to be a line break
✽ (*ANY)
Any Unicode newline sequence is considered to be a line break
For instance, (*CR)\w+.\w+ matches Line1\nLine2 because the dot is able to match the \n, which is not considered to be a line break. See demo.
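The (*CR) example from the quote can be tried out in PHP roughly as follows. The (*LF) counterpart is added here only for contrast and is not part of the quoted text; the variable names are illustrative.

```php
<?php
$subject = "Line1\nLine2";

// With (*CR), only \r is a line break, so the dot may match \n
// and the pattern can span both lines.
preg_match('/(*CR)\w+.\w+/', $subject, $crMatch);

// With (*LF), \n is the line break and the dot refuses to match it,
// so the engine backtracks and the match stays within the first line.
preg_match('/(*LF)\w+.\w+/', $subject, $lfMatch);

echo json_encode($crMatch[0]), PHP_EOL;
echo json_encode($lfMatch[0]), PHP_EOL;
```

Note that the (*LF) variant still finds a match; it is just shorter, because \w+ gives back characters until the dot and the second \w+ fit inside Line1.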