Get content between HTML tags with an identifier inside
I have these span tags:
<div>
<span style="background: url('/wp-content/themes/minimum-child/img/address.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span>
<span style="background: url('/wp-content/themes/minimum-child/img/email.png') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:post@post.com">CONTENT 2</a></span>
<span style="background: url('/wp-content/themes/minimum-child/img/tel.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span>
</div>
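I need to get the content between the spans, but I need to split it into individual variables such as $address, $email, $phone, $web, and so on. Obviously I can use the name of the background image as the pattern, because the image names stay the same (address.png, email.png, etc.).
At the moment I think I need to use preg_match_all for this. I have tried it, but so far without success.
I tried this (to get the address into the $address variable):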
$url="'/wp-content/themes/minimum-child/img/address.png'";
$tag='span style="background: url('.$url.')';
$matches=array();
$pattern = "/<$tag ?.*>(.*)<\/span>/";
preg_match($pattern, $htmlcontent, $matches);
$address=$matches[1];
Unfortunately, it does not work. Do you know how to achieve this?
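It is often said that parsing HTML with regular expressions is fraught with problems, so I would opt for the simpler approach of using DOMDocument to process the HTML fragment. If needed, you can then use a regular expression to refine some of the results further.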
$html='
<div>
<span style="background: url(\'/wp-content/themes/minimum-child/img/address.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span>
<span style="background: url(\'/wp-content/themes/minimum-child/img/email.png\') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:post@post.com">CONTENT 2</a></span>
<span style="background: url(\'/wp-content/themes/minimum-child/img/tel.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span>
</div>';
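$dom = new DOMDocument;
$dom->loadHTML( $html );
$col = $dom->getElementsByTagName('span');

// One bucket per kind of value collected from the spans
$keep = array(
    'style' => array(),
    'data'  => array(),
    'email' => array()
);

foreach( $col as $node ){
    // Style attribute with the single quotes around the url() stripped
    $keep['style'][] = str_replace( "'", "", $node->getAttribute('style') );
    // Text content of the span, including text inside nested tags
    $keep['data'][] = $node->nodeValue;

    if( $node->hasChildNodes() ){
        foreach( $node->childNodes as $child ){
            // Only element children (e.g. <a>) can carry an href attribute
            if( $child->nodeType == XML_ELEMENT_NODE && $child->hasAttribute('href') ) {
                $href = $child->getAttribute('href');
                // Keep the part after "mailto:" and skip any other kind of link
                if( stripos( $href, 'mailto:' ) === 0 ) {
                    $keep['email'][] = substr( $href, strlen('mailto:') );
                }
            }
        }
    }
}

echo '<pre>',print_r($keep,true),'</pre>';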
/* output
------
Array
(
[style] => Array
(
[0] => background: url(/wp-content/themes/minimum-child/img/address.png) 0px 2px no-repeat; padding-left: 20px;
[1] => background: url(/wp-content/themes/minimum-child/img/email.png) 0px 2px no-repeat; padding-left: 20px;
[2] => background: url(/wp-content/themes/minimum-child/img/tel.png) 0px 2px no-repeat; padding-left: 20px;
)
[data] => Array
(
[0] => CONTENT 1
[1] => CONTENT 2
[2] => CONTENT 3
)
[email] => Array
(
[0] => post@post.com
)
)
*/
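To get from these arrays to the individual variables asked for in the question, one option is to key each span's text by the file name of its background image. The following is only a minimal sketch under some assumptions: the helper name extractFieldsByIcon is hypothetical, every span of interest is assumed to carry a url(...) in its style attribute, and the image file name (address, email, tel) is assumed to be a usable key. The small regular expression runs only on the style attribute, not on the HTML itself.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<div>
<span style="background: url('/wp-content/themes/minimum-child/img/address.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span>
<span style="background: url('/wp-content/themes/minimum-child/img/email.png') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:post@post.com">CONTENT 2</a></span>
<span style="background: url('/wp-content/themes/minimum-child/img/tel.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span>
</div>
HTML;

// Hypothetical helper (not part of any library): returns an array that maps
// the background image name (without extension) to the span's text content.
function extractFieldsByIcon(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $fields = [];
    foreach ($dom->getElementsByTagName('span') as $span) {
        $style = $span->getAttribute('style');
        // Capture whatever sits inside url(...), with or without quotes.
        if (preg_match('/url\(\s*[\'"]?([^\'")]+)[\'"]?\s*\)/', $style, $m)) {
            // "/wp-content/.../address.png" becomes "address"
            $key = pathinfo($m[1], PATHINFO_FILENAME);
            $fields[$key] = trim($span->textContent);
        }
    }
    return $fields;
}

$fields  = extractFieldsByIcon($html);
$address = $fields['address'] ?? null; // CONTENT 1
$email   = $fields['email'] ?? null;   // CONTENT 2
$phone   = $fields['tel'] ?? null;     // CONTENT 3
$web     = $fields['web'] ?? null;     // null here: the sample has no web.png span
```

Note that the phone number ends up under the key tel, because that is the image name in the markup. If the address from the mailto: link is wanted rather than the link text, the href handling from the loop above can be combined with this.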