Exclude a table unit from an HTML string if a selected phrase is found

Question

I'll try to explain the problem with an example.

Suppose I have a large HTML string that contains table units like the following.

<table id="table-1">
    <tbody>
        <tr><td><p>{{Phrase 1}}</p></td></tr>
    </tbody>
</table>
<table id="table-2">
    <tbody>
        <tr><td><p>Sample text 1 goes here..</p></td></tr>
    </tbody>
</table>
<table id="table-3">
    <tbody>
        <tr><td><p>{{Phrase 2}}</p></td></tr>
    </tbody>
</table>
<table id="table-4">
    <tbody>
        <tr><td><p>Sample text 2 goes here..</p></td></tr>
    </tbody>
</table>

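I need a PHP function that removes from the HTML string every complete table containing {{Phrase 1}} or {{Phrase 2}}.

In the example above, that means excluding table-1 and table-3, so the resulting string looks like this: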

<table id="table-2">
    <tbody>
        <tr><td><p>Sample text 1 goes here..</p></td></tr>
    </tbody>
</table>
<table id="table-4">
    <tbody>
        <tr><td><p>Sample text 2 goes here..</p></td></tr>
    </tbody>
</table>

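I tried preg_replace, but it didn't help, because I can only replace the selected text rather than the whole table unit.

Can anyone help me solve this?

Here is the sample code I have so far; I'm still working on it.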

$patterns = array();
$patterns[0] = '{{Phrase 1}}';
$patterns[1] = '{{Phrase 2}}';

$replacements = array();
$replacements[0] = '';
$replacements[1] = '';

$string = '<table id="table-1">
    <tbody>
        <tr><td><p>{{Phrase 1}}</p></td></tr>
    </tbody>
</table>
<table id="table-2">
    <tbody>
        <tr><td><p>Sample text 1 goes here..</p></td></tr>
    </tbody>
</table>
<table id="table-3">
    <tbody>
        <tr><td><p>{{Phrase 2}}</p></td></tr>
    </tbody>
</table>
<table id="table-4">
    <tbody>
        <tr><td><p>Sample text 2 goes here..</p></td></tr>
    </tbody>
</table>';

echo '<pre>';
echo htmlspecialchars(preg_replace($patterns, $replacements, $string));
echo '</pre>';

Answer 1

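A very simple way to do this without the DOM or (God forbid) regular expressions is to strip the tags and explode on three newlines.
strip_tags removes all the HTML and leaves only the whitespace that sat between the tags. Note that this yields the text of the remaining cells, not the table markup.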

$html = '<table id="table-1">
<tbody>
    <tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
    <tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
    <tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
    <tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>';

$arr = explode(PHP_EOL.PHP_EOL.PHP_EOL , strip_tags($html));

// Optional output. The trim() is needed to remove the extra whitespace,
// so some kind of loop is required. The sample text sits at the odd indexes.
for ($i = 1; $i < count($arr); $i += 2) {
    echo trim($arr[$i]) . "<br>\n";
}

https://3v4l.org/gPQZn
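
The loop above relies on phrase tables and text tables strictly alternating. As a minimal sketch of the same strip-and-split idea that filters by phrase instead of by position, something like the following could work. The helper name textWithoutPhrases and its parameters are made up for illustration, and, like the code above, it returns only the cell text, not the table markup.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text chunks
// that contain none of the given phrases.
function textWithoutPhrases(string $html, array $phrases): array
{
    // Normalise line endings so the split does not depend on PHP_EOL.
    $text = str_replace(["\r\n", "\r"], "\n", strip_tags($html));

    $kept = [];
    foreach (explode("\n\n\n", $text) as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') {
            continue; // only whitespace was left in this chunk
        }
        foreach ($phrases as $phrase) {
            if (strpos($chunk, $phrase) !== false) {
                continue 2; // the chunk holds a phrase, so skip it
            }
        }
        $kept[] = $chunk;
    }
    return $kept;
}

$html = '<table id="table-1">
<tbody>
    <tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
    <tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>';

$result = textWithoutPhrases($html, ['{{Phrase 1}}', '{{Phrase 2}}']);
print_r($result);
```

This still assumes each table holds a single block of text and that tables are separated by at least three newlines once the tags are gone.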

Answer 2

If the structure is always the same, you can do this with a simple regular expression:

// This regex matches the current structure, no matter what the number for the table id is 
// and either Phrase 1 or 2.
$regex = '/(<table id="table-[0-9]+">[\s]+<tbody>[\s]+<tr><td><p>\{\{Phrase (1|2)\}\}<\/p><\/td><\/tr>[\s]+<\/tbody>[\s]+<\/table>)/';

$html = '<table id="table-1">
    <tbody>
        <tr><td><p>{{Phrase 1}}</p></td></tr>
    </tbody>
</table>
<table id="table-2">
    <tbody>
        <tr><td><p>Sample text 1 goes here..</p></td></tr>
    </tbody>
</table>
<table id="table-3">
    <tbody>
        <tr><td><p>{{Phrase 2}}</p></td></tr>
    </tbody>
</table>
<table id="table-4">
    <tbody>
        <tr><td><p>Sample text 2 goes here..</p></td></tr>
    </tbody>
</table>';

// Simply perform a replace with an empty string
$clean = preg_replace($regex, '', $html);

Demo: https://3v4l.org/4QHvm
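
If the list of phrases changes, the same pattern can be built from an array. The following is only a sketch under the same fixed-structure assumption; the function name removePhraseTables is hypothetical, and the trailing \s* (which also swallows the newline after a removed table) is an addition of mine, not part of the pattern above.

```php
<?php
// Hypothetical helper: builds the fixed-structure pattern from a phrase list.
function removePhraseTables(string $html, array $phrases): string
{
    if ($phrases === []) {
        return $html; // an empty alternation would match tables with an empty <p>
    }

    // Escape each phrase so that { and } are matched literally.
    $escaped = array_map(function ($phrase) {
        return preg_quote($phrase, '/');
    }, $phrases);

    $regex = '/<table id="table-[0-9]+">\s+<tbody>\s+<tr><td><p>(?:'
           . implode('|', $escaped)
           . ')<\/p><\/td><\/tr>\s+<\/tbody>\s+<\/table>\s*/';

    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($regex, '', $html) ?? $html;
}

$html = <<<'HTML'
<table id="table-1">
    <tbody>
        <tr><td><p>{{Phrase 1}}</p></td></tr>
    </tbody>
</table>
<table id="table-2">
    <tbody>
        <tr><td><p>Sample text 1 goes here..</p></td></tr>
    </tbody>
</table>
<table id="table-3">
    <tbody>
        <tr><td><p>{{Phrase 2}}</p></td></tr>
    </tbody>
</table>
HTML;

$clean = removePhraseTables($html, ['{{Phrase 1}}', '{{Phrase 2}}']);
echo htmlspecialchars($clean);
```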

If you want a more detailed explanation of the pattern assigned to $regex at the top of this answer, you can read more here: https://regex101.com/r/B128DE/1
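
The regex approach depends on the markup staying exactly the same. Where that cannot be guaranteed, PHP's DOMDocument is one alternative, swapped in here in place of the regex. This is a rough sketch, not a drop-in solution: the function name removeTablesContaining is hypothetical, the fragment is wrapped in a throwaway <div> so the parser sees a single root, and non-ASCII text would need extra encoding handling that is ignored here.

```php
<?php
// Hypothetical helper: removes every <table> whose text contains one of the
// phrases, whatever the table's inner structure looks like.
function removeTablesContaining(string $html, array $phrases): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into an array before removing nodes.
    foreach (iterator_to_array($doc->getElementsByTagName('table')) as $table) {
        foreach ($phrases as $phrase) {
            if (strpos($table->textContent, $phrase) !== false) {
                $table->parentNode->removeChild($table);
                break;
            }
        }
    }

    // Serialise the wrapper's children, leaving the wrapper itself out.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return trim($out);
}

$html = '<table id="table-1">
    <tbody>
        <tr><td><p>{{Phrase 1}}</p></td></tr>
    </tbody>
</table>
<table id="table-2">
    <tbody>
        <tr><td><p>Sample text 1 goes here..</p></td></tr>
    </tbody>
</table>';

$clean = removeTablesContaining($html, ['{{Phrase 1}}', '{{Phrase 2}}']);
echo htmlspecialchars($clean);
```

The serialised output may differ from the input in whitespace, since it is regenerated from the parsed tree rather than cut out of the original string.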
