Regex selecting from HTML
I have this kind of text, and I want to extract the following text from it:
Company Name ASSOCIATES LLP
18-20, FLOOR,, BUILDING,
K MARG, NEW - 110001
Delhi
+(91)124-0000000
email@EMAIL.COM
The regex I am using is /Name and address of the Employer(.*)<p>/
but it selects everything up to the last <p>.
This is the HTML:
<p><b>Certificate under Section 203 of the Income-tax Act, 1961 for tax deducted at source on salary
</b></p>
<p><b>Name and address of the Employer
</b></p>
<p>Company Name ASSOCIATES LLP
18-20, FLOOR,, BUILDING,
K MARG, NEW - 110001
Delhi
+(91)124-0000000
email@EMAIL.COM
</p>
<p><b>Name and address of the Employee
</b></p>
<p>EMPLOYEE NAME
EMPLOYEE ADDRESS HERE
</p>
<p><b>PAN of the Deductor
</b></p>
<p>ACHFS9000A
</p>
<p><b>TAN of the Deductor
</b></p>
<p>DELS50000E
</p>
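For reference, the over-matching described above is what a greedy (.*) does: it consumes as much as it can and only then backtracks to the last <p>. A minimal sketch of a lazy variant follows, assuming the markup always has the exact heading / paragraph layout shown in the sample. The pattern, the ~ delimiter and the variable names are illustrative, not part of the original question.

```php
<?php
// Sample input trimmed from the HTML in the question.
$html = <<<HTML
<p><b>Name and address of the Employer
</b></p>
<p>Company Name ASSOCIATES LLP
18-20, FLOOR,, BUILDING,
K MARG, NEW - 110001
Delhi
+(91)124-0000000
email@EMAIL.COM
</p>
<p><b>Name and address of the Employee
</b></p>
<p>EMPLOYEE NAME
EMPLOYEE ADDRESS HERE
</p>
HTML;

// (.*?) is lazy, so it stops at the first </p> after the heading.
// The s modifier lets . match newlines, which the multi-line block needs.
$pattern = '~Name and address of the Employer\s*</b>\s*</p>\s*<p>(.*?)</p>~s';

// $m[1] holds the captured paragraph body; trim() drops the trailing newline.
$employer = preg_match($pattern, $html, $m) ? trim($m[1]) : null;

echo $employer, PHP_EOL;
```

This only holds while the tags around the heading stay exactly as in the sample; any extra attribute or nesting would need a change to the pattern, which is the usual argument for a DOM parser.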
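You can use DOMDocument and DOMXPath to extract the content of the p tag that is the next sibling of the p node whose b child contains the text Name and address of the Employer.
The heading p node is selected with this query: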
$xp->query("//p[contains(./b, 'Name and address of the Employer')]");
See the PHP example code:
<?php
$html = <<<HTML
<p><b>Certificate under Section 203 of the Income-tax Act, 1961 for tax deducted at source on salary
</b></p>
<p><b>Name and address of the Employer
</b></p>
<p>Company Name ASSOCIATES LLP
18-20, FLOOR,, BUILDING,
K MARG, NEW - 110001
Delhi
+(91)124-0000000
email@EMAIL.COM
</p>
<p><b>Name and address of the Employee
</b></p>
<p>EMPLOYEE NAME
EMPLOYEE ADDRESS HERE
</p>
<p><b>PAN of the Deductor
</b></p>
<p>ACHFS9000A
</p>
<p><b>TAN of the Deductor
</b></p>
<p>DELS50000E
</p>
HTML;
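$dom = new DOMDocument;
// Parse the fragment without adding implied <html>/<body> wrappers or a doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
// Select the heading paragraph whose <b> child contains the label.
$headings = $xp->query("//p[contains(./b, 'Name and address of the Employer')]");
foreach ($headings as $heading) {
    // The node right after the heading paragraph holds the address block.
    echo $heading->nextSibling->nodeValue;
}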
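A possible variant of the same idea, sketched here under the assumption that the address is always the first p element after the heading: the step to the following paragraph is moved into the XPath with following-sibling::p[1], so the result does not depend on which node type nextSibling happens to return. The input is loaded without flags, so libxml wraps the fragment in html and body. The variable names are illustrative.

```php
<?php
// Sample input trimmed from the HTML in the question.
$html = <<<HTML
<p><b>Name and address of the Employer
</b></p>
<p>Company Name ASSOCIATES LLP
18-20, FLOOR,, BUILDING,
K MARG, NEW - 110001
Delhi
+(91)124-0000000
email@EMAIL.COM
</p>
<p><b>Name and address of the Employee
</b></p>
<p>EMPLOYEE NAME
EMPLOYEE ADDRESS HERE
</p>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
// No flags: the <p> elements end up as siblings under an implied <body>.
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Heading paragraph first, then the first <p> element that follows it.
$query = "//p[b[contains(., 'Name and address of the Employer')]]"
       . "/following-sibling::p[1]";
$nodes = $xp->query($query);

// Guard against a missing match before reading the text.
$address = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

echo $address, PHP_EOL;
```

The length check means a document without the heading yields null rather than an error on a missing node.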