How can I get the data after the last slash of each href value, along with its alt value, from a large string?
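I have a large string that mixes two types of data sets. I want to get the data after the last slash of every href value (for example, 168702 and 167504) together with the corresponding alt value (that is, Episode 29 and episode 20). I tried the following, but it does not return the right data.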
preg_match_all('/<a class=\"asite-thumbnail\" href="(.*?)"/s', $code2, $foo);
print_r($foo[1]);
First data set type:
<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a>
Second data set type:
<a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504""><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">
Here is a way to do this with DOMDocument...
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a>';
$doc = new DOMDocument();
$doc->loadHTML($input);
$links = $doc->getElementsByTagName('a'); // pull all links
foreach ($links as $link) { // loop through each link
    // strip the URL down to everything after the last /
    echo 'End of Link=' . preg_replace('~^.*/~', '', $link->getAttribute('href')) . "\n";
    $images = $link->getElementsByTagName('img'); // get all images inside the link
    foreach ($images as $image) { // loop through all images in this link
        echo 'Alt attribute = ' . $image->getAttribute('alt') . "\n"; // output the alt attribute's content
    }
}
Output:
End of Link=168702
Alt attribute = Episode 29
Demo: https://regex101.com/r/eW0zI1/1
...or with both data sets...
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a><a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504""><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';
$doc = new DOMDocument();
$doc->loadHTML($input);
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
    echo 'End of Link=' . preg_replace('~^.*/~', '', $link->getAttribute('href')) . "\n";
    $images = $link->getElementsByTagName('img');
    foreach ($images as $image) {
        echo 'Alt attribute = ' . $image->getAttribute('alt') . "\n";
    }
}
Output:
End of Link=168702
Alt attribute = Episode 29
End of Link=167504
Alt attribute = episode 20
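The second data set has a stray double quote after the href value and no closing </a>, so loadHTML() may emit parser warnings even though the values come out as shown. A minimal sketch of collecting those warnings instead of printing them, using the standard libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() functions; the shortened $input (the div elements are left out) and the $ends variable are assumptions made for illustration:

```php
<?php
// Shortened sample input (an assumption): the second anchor keeps the stray quote.
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"></a>'
       . '<a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504""><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';

// Buffer libxml parse errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($input);

// Keep the collected errors for inspection, then empty the buffer
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Same extraction as above: everything after the last slash of each href.
$ends = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $ends[] = preg_replace('~^.*/~', '', $link->getAttribute('href'));
}
print_r($ends);
```

Each entry in $errors is a LibXMLError object, so its message and line properties can be logged if the malformed markup needs to be tracked down.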
Update:
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a><a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504"><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';
$doc = new DOMDocument();
$doc->loadHTML($input);
$links = $doc->getElementsByTagName('a');
// initialize the result so print_r() still works when no links are found
$linkimage = array('endlink' => array(), 'alt' => array());
foreach ($links as $link) {
    $linkimage['endlink'][] = preg_replace('~^.*/~', '', $link->getAttribute('href'));
    $images = $link->getElementsByTagName('img');
    foreach ($images as $image) {
        $linkimage['alt'][] = $image->getAttribute('alt');
    }
}
print_r($linkimage);
Output:
Array
(
    [endlink] => Array
        (
            [0] => 168702
            [1] => 167504
        )

    [alt] => Array
        (
            [0] => Episode 29
            [1] => episode 20
        )

)
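The two parallel arrays stay aligned only while every link contains exactly one image. A hedged sketch of an alternative that keeps each link ending next to its own alt value in one record and selects only anchors with the asite-thumbnail class through DOMXPath; the shortened $input and the names $episodes, id and alt are illustrative assumptions, not part of the original code:

```php
<?php
// Shortened sample input (an assumption), modelled on the update above.
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"></a>'
       . '<a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504"><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($input);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match anchors whose class list contains the token "asite-thumbnail".
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " asite-thumbnail ")]';

$episodes = [];
foreach ($xpath->query($query) as $link) {
    // First image inside this particular link, or null when there is none.
    $img = $xpath->query('.//img', $link)->item(0);

    $episodes[] = [
        'id'  => preg_replace('~^.*/~', '', $link->getAttribute('href')),
        'alt' => $img instanceof DOMElement ? $img->getAttribute('alt') : null,
    ];
}
print_r($episodes);
```

With this layout a link without an image yields a record whose alt is null instead of shifting later alt values onto the wrong link.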