Adding missing punctuation before a closing HTML tag
My string is an HTML document. I want to add a dot before a closing HTML tag when no punctuation precedes it. The punctuation marks are .,?!: and I want to use preg_replace for this.
<p>Today, not only we have so many breeds that are trained this and that.</p>
<h4><strong>We must add a dot after the closing strong</strong></h4>
<p>Hunting with your dog is a blah blah with each other.</p>
<h2>No need to change this one!</h2>
<p>Hunting with your dog is a blah blah with each other.</p>
My function:
$source = 'the above html';
$source = addMissingPunctuation( $source );
echo $source;
function addMissingPunctuation( $input ) {
    $tags = [ 'h1', 'h2', 'h3', 'h4', 'h5', 'h6' ];
    foreach ($tags as $tag) {
        $input = preg_replace(
            "/[^,.;!?](<\/".$tag.">)/mi",
            ".${0}",
            $input
        );
    }
    return $input;
}
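I tried .$0, .${0}, .$1, .${1}, .\0 and .\1, but nothing worked. At best it swallowed the match without putting anything in its place. The matching part of my pattern seems to work on regex101 and other sites.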
The desired result is:
<p>Today, not only we have so many breeds that are trained this and that.</p>
<h4><strong>We must add a dot after the closing strong</strong>.</h4>
<p>Hunting with your dog is a blah blah with each other.</p>
<h2>No need to change this one!</h2>
<p>Hunting with your dog is a blah blah with each other.</p>
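You don't need to loop over $tags like that. I'd either implode them with | or, as in this case, use a character class for the possible elements, since the rule is the same for all of them.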
$source = '<p>Today, not only we have so many breeds that are trained this and that.</p>
<h4><strong>We must add a dot after the closing strong</strong></h4>
<p>Hunting with your dog is a blah blah with each other.</p>
<h2>No need to change this one!</h2>
<p>Hunting with your dog is a blah blah with each other.</p>';
$source = addMissingPunctuation( $source );
echo $source;
function addMissingPunctuation( $input ) {
    return preg_replace("/[^,.;!?]\K<\/h[1-6]>/mi", ".$0", $input);
}
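You also need to keep the character before the element out of the match; \K does that. ${} is PHP variable syntax, whereas $0 is the reference to the whole match, so writing it as $0 from now on is clearer.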
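For comparison, here is a minimal sketch of the same fix without \K, assuming you would rather capture the preceding character and put it back explicitly. The function name addDotUsingGroups is made up for this illustration. It also shows why the original pattern appeared to swallow text: the character before the tag is part of the match, so it has to be re-emitted in the replacement.

```php
<?php
// Hypothetical helper for illustration only (not part of the original post).
function addDotUsingGroups(string $input): string
{
    // Group 1 is the non-punctuation character, group 2 is the closing tag.
    // Single quotes keep PHP from trying to interpolate anything in the replacement.
    return preg_replace('~([^,.;!?])(</h[1-6]>)~i', '$1.$2', $input) ?? $input;
}

echo addDotUsingGroups('<h4><strong>Needs a dot</strong></h4>');
```

With \K the two groups become unnecessary, because everything matched before \K is dropped from the reported match.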
Regex demo: https://regex101.com/r/xUvvuf/1/
(The example uses $0: https://3v4l.org/jGZal)
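The implode variant mentioned at the top of this answer could be sketched roughly like this, assuming the tag list is not a simple range that a character class can cover. The function name addDotForTags and its parameters are hypothetical, and the use of preg_quote is my own precaution rather than something from the original post.

```php
<?php
// Hypothetical helper: builds one alternation from the tag list instead of looping.
function addDotForTags(string $input, array $tags): string
{
    // preg_quote() escapes regex metacharacters in tag names; '~' is the delimiter used below.
    $alternation = implode('|', array_map(
        fn (string $tag): string => preg_quote($tag, '~'),
        $tags
    ));

    // \K keeps the preceding character out of the match, so $0 is only the closing tag.
    $pattern = '~[^,.;!?]\K</(?:' . $alternation . ')>~i';

    return preg_replace($pattern, '.$0', $input) ?? $input;
}

$html = "<h4><strong>Needs a dot</strong></h4>\n<h2>Fine as is!</h2>\n<p>Not a heading</p>";
echo addDotForTags($html, ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']);
```

This runs a single preg_replace call over the input instead of one call per tag.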
Another approach is to skip every element that already ends with punctuation, which saves the regex engine a few steps.
https://regex101.com/r/xUvvuf/2/
[,.;!?]<\/h[1-6]>(*SKIP)(*FAIL)|<\/h[1-6]>
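Plugged into preg_replace, the pattern above might be used as follows. This is a sketch only: the function name addDotSkipFail is made up, and the ~ delimiter is chosen here so the slashes need no escaping.

```php
<?php
// Hypothetical helper for illustration only.
function addDotSkipFail(string $input): string
{
    // First alternative: a closing heading tag that already follows punctuation is
    // consumed and then discarded via (*SKIP)(*FAIL), so it can never be replaced.
    // Second alternative: any remaining closing heading tag is a real match.
    $pattern = '~[,.;!?]</h[1-6]>(*SKIP)(*FAIL)|</h[1-6]>~i';

    // $0 is the whole match, which here is just the closing tag.
    return preg_replace($pattern, '.$0', $input) ?? $input;
}

$html = "<h4><strong>Needs a dot</strong></h4>\n<h2>No need to change this one!</h2>";
echo addDotSkipFail($html);
```

No \K is needed in this form, because the second alternative never includes the preceding character.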
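You can also change the delimiter, though that is mostly personal preference. If you don't mind escaping the /s you can keep going as you are; if you do, just swap the leading and trailing / for ~.
Demo: https://regex101.com/r/xUvvuf/3/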
preg_replace("~[^,.;!?]\K</h[1-6]>~mi"
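As a complete call, the two delimiter styles could be compared like this. The variable names are illustrative, and the '.$0' replacement is assumed to be the same one used in the function earlier in this answer.

```php
<?php
$html = '<h4><strong>Needs a dot</strong></h4><h2>Fine as is!</h2>';

// Slash delimiter: the / in the closing tag has to be escaped.
$withSlash = preg_replace('/[^,.;!?]\K<\/h[1-6]>/mi', '.$0', $html);

// Tilde delimiter: the / in the closing tag can be written as-is.
$withTilde = preg_replace('~[^,.;!?]\K</h[1-6]>~mi', '.$0', $html);

// Both calls are expected to produce the same string.
var_dump($withSlash === $withTilde);
echo $withTilde;
```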