PHP: get the <h[1-6]></h[1-6]> values from an HTML text
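In my code I have the following regular expression:
preg_match_all('/<title>([^>]*)<\/title>/si', $contents, $match );
to retrieve the <h>..</h> tags from a web page. Sometimes, though, a heading contains other HTML tags such as <strong>, <b> and so on, so the pattern needed some changes, and I tried this:
preg_match_all('/<h[1-6]>(.*)<\/h[1-6]>/si', $contents, $match );
But something is wrong: it does not retrieve the content inside the HTML <h> tags.
Can you help me fix the regular expression?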
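With (.*) you match everything. If you only want word characters, digits and whitespace, you can use a character class for those and match one or more of them. Note that this class excludes punctuation and nested tags such as <strong>: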
preg_match_all('/<h[1-6]>([\w\d\s]+)<\/h[1-6]>/si', $contents, $match);
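A minimal sketch of another way to adjust the pattern from the question, offered as one possible fix rather than the definitive one. The attempt with (.*) has two problems: (.*) is greedy, so with the s flag it runs from the first opening heading to the last closing one, and <h[1-6]> does not match headings that carry attributes. The sketch below uses a lazy (.*?), allows attributes with [^>]*, and uses the backreference \1 so the closing tag has the same level as the opening one. The function name extractHeadings and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper, not part of the original question or answers.
function extractHeadings(string $html): array
{
    // <h([1-6])  opening tag, heading level captured in group 1
    // \b[^>]*>   optional attributes up to the end of the opening tag
    // (.*?)      lazy: stop at the first matching closing tag
    // <\/h\1>    closing tag with the same level (backreference to group 1)
    preg_match_all('/<h([1-6])\b[^>]*>(.*?)<\/h\1>/si', $html, $sets, PREG_SET_ORDER);

    $headings = array();
    foreach ($sets as $set) {
        $headings[] = array(
            'level' => (int) $set[1],
            'html'  => $set[2],                   // inner HTML, nested tags kept
            'text'  => trim(strip_tags($set[2])), // plain text only
        );
    }
    return $headings;
}

// Made-up sample input for illustration.
$sample = '<h1>Title</h1><p>x</p><h2 class="sub">A <strong>bold</strong> word</h2>';
print_r(extractHeadings($sample));
```

This still assumes reasonably well-formed markup; a heading nested inside another heading of the same level, or a > inside an attribute value, would confuse it.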
No regex guru here, but in your shoes I would do it like this:
<?php
// SIMULATED SAMPLE HTML CONTENT - WITH ATTRIBUTES:
$contents = '<section id="id-1">And even when darkness covers your path and no one is there to lend a hand;
<h3 class="class-1">Always remember that <em>There is always light at the end of the Tunnel <span class="class-2">if you can but hang on to your Faith!</span></em></h3>
<div>Now; let no one deceive you: <h2 class="class-2">You will be tried in ever ways - sometimes beyond your limits...</h2></div>
<article>But hang on because You are the Voice... You are the Light and you shall rule your Destiny because it is all about<h6 class="class4">YOU - THE REAL YOU!!!</h6></article>
</section>';
// SPLIT THE CONTENT AT THE END OF EACH <h[1-6]> TAGS
$parts = preg_split("%<\/h[1-6]>%si", $contents);
$matches = array();
// LOOP THROUGH $parts AND BUNDLE APPROPRIATE ELEMENTS TO THE $matches ARRAY.
foreach($parts as $part){
if(preg_match("%(.*|.?)(<h)([1-6])%si", $part)){
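// DROP EVERYTHING BEFORE THE OPENING TAG, KEEP THE TAG AND ITS CONTENT, AND RE-APPEND THE CLOSING TAG THAT preg_split REMOVED.
$matches[] = preg_replace("%(.*|.?)(<)(h[1-6])(.*)%si", '$2$3$4</$3>', $part);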
}
}
var_dump($matches);
//DUMPS::::
array (size=3)
0 => string '<h3 class="class-1">Always remember that <em>There is always light at the end of the Tunnel <span class="class-2">if you can but hang on to your Faith!</span></em></h3>' (length=168)
1 => string '<h2 class="class-2">You will be tried in ever ways - sometimes beyond your limits...</h2>' (length=89)
2 => string '<h6 class="class4">YOU - THE REAL YOU!!!</h6>' (length=45)
Wrapped in a function, this boils down to:
<?php
function pseudoMatchHTags($htmlContentWithHTags){
$parts = preg_split("%<\/h[1-6]>%si", $htmlContentWithHTags);
$matches = array();
foreach($parts as $part){
if(preg_match("%(.*|.?)(<h)([1-6])%si", $part)){
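// Keep the opening tag and its content, then re-append the closing tag that preg_split removed.
$matches[] = preg_replace("%(.*|.?)(<)(h[1-6])(.*)%si", '$2$3$4</$3>', $part);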
}
}
return $matches;
}
var_dump(pseudoMatchHTags($contents));
You can test it here: https://eval.in/571312 ... Maybe it helps a little, I hope ;-)
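// Explicit / delimiters: in '<h\d>' PHP treats < and > as the delimiters, so the pattern would only be h\d.
preg_match_all('/<h\d>/i', $contents, $matches);
$num = array();
// $matches[0] holds the full matches such as "<h3>"; the digit sits at offset 2.
foreach($matches[0] as $match){
$num[] = substr($match, 2, 1);
}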
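As an alternative to the regex approaches above, here is a rough sketch that swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of a regular expression. It is not what any of the answers above propose; it is shown only because a parser copes with attributes and nested tags without special handling. The function name headingsViaDom is hypothetical, and the sketch assumes a non-empty input string.

```php
<?php
// Hypothetical helper; uses the DOM extension rather than a regex.
function headingsViaDom(string $html): array
{
    // Collect parser warnings internally instead of printing them
    // (imperfect markup and HTML5 tags like <section> trigger them).
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = array();
    // The union of the six heading names is returned in document order.
    foreach ($xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6') as $node) {
        $result[] = array(
            'tag'  => $node->nodeName,          // e.g. "h3"
            'text' => trim($node->textContent), // text with nested tags stripped
        );
    }
    return $result;
}

// Made-up sample input for illustration.
print_r(headingsViaDom('<div><h3 class="c">Always <em>light</em></h3><p>x</p><h2>Second</h2></div>'));
```

If the inner HTML is needed rather than the text, $doc->saveHTML($node) returns the serialized element including its own tags.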