How do I get the data after the last slash of an href value, and the matching alt value, from a large string?

Question

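I have a large string that mixes two types of data sets. I want to get the data after the last slash of every href value (for example, 168702 and 167504) together with the corresponding alt= value (that is, Episode 29 and episode 20). I tried the following, but I cannot get the correct data.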

  preg_match_all('/<a  class=\"asite-thumbnail\" href="(.*?)"/s', $code2, $foo);
print_r($foo[1]);

First type of data set:

  <a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a>

Second type of data set:

<a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504""><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">

Answer 1

Here is how to do this using DOMDocument...

$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a>';
$doc = new DOMDocument();
$doc->loadHTML($input);
$links = $doc->getElementsByTagName('a'); // pull all links
foreach ($links as $link) { //loop through each link
    echo 'End of Link=' . preg_replace('~^.*/~', '', $link->getAttribute('href')) . "\n"; //strip down the url to all content after the last /
    $images = $link->getElementsByTagName('img');//get all images in the link
    foreach($images as $image) { //loop through all images
        echo 'Alt attribute = ' . $image->getAttribute('alt') . "\n"; // output the alt attributes content
    }
}

Output:

End of Link=168702
Alt attribute = Episode 29

Demo of the regex: https://regex101.com/r/eW0zI1/1
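
The `preg_replace('~^.*/~', '', ...)` call works because `.*` is greedy, so it consumes everything up to and including the last slash. If you would rather avoid a regex for this step, here is a minimal sketch using `strrpos` and `substr`. The helper name `lastSegment` is made up for illustration and is not part of the code above.

```php
<?php
// Hypothetical helper (illustration only): return whatever follows the
// last "/" in a path, or the whole string if it contains no "/".
function lastSegment(string $href): string
{
    $pos = strrpos($href, '/');
    return $pos === false ? $href : substr($href, $pos + 1);
}

echo lastSegment('/season/path/12345/1/168702'), "\n"; // 168702
```

It can be dropped in wherever the `preg_replace` call is used, for example `lastSegment($link->getAttribute('href'))`.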

...or with both data sets...

$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a><a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504""><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';
$doc = new DOMDocument();
$doc->loadHTML($input);
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
    echo 'End of Link=' . preg_replace('~^.*/~', '', $link->getAttribute('href')) . "\n";
    $images = $link->getElementsByTagName('img');
    foreach($images as $image) {
        echo 'Alt attribute = ' . $image->getAttribute('alt') . "\n";
    }
}

End of Link=168702
Alt attribute = Episode 29
End of Link=167504
Alt attribute = episode 20
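
Note that the second data set has a stray `"` after the href value and no closing `</a>`, so `loadHTML` may emit libxml warnings even though the markup still gets parsed. A minimal sketch, assuming you want to keep those warnings out of your output, is to collect them internally with `libxml_use_internal_errors`. The variable names `$previous` and `$errors` are illustrative.

```php
<?php
$input = '<a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504""><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($input);

// Keep the collected errors for inspection, then clear the buffer
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($doc->getElementsByTagName('a') as $link) {
    echo 'End of Link=' . preg_replace('~^.*/~', '', $link->getAttribute('href')) . "\n";
}
```

`$errors` holds `LibXMLError` objects, so you can still log them if you want to know how broken the input was.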

Update (collecting the values into an array instead of echoing them):

$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a><a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504"><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';
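$linkimage = array('endlink' => array(), 'alt' => array()); // initialize the result arrays up front
$doc = new DOMDocument();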
$doc->loadHTML($input);
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
    $linkimage['endlink'][] = preg_replace('~^.*/~', '', $link->getAttribute('href'));
    $images = $link->getElementsByTagName('img');
    foreach($images as $image) {
        $linkimage['alt'][] = $image->getAttribute('alt');
    }
}
print_r($linkimage);

Output:

Array
(
    [endlink] => Array
        (
            [0] => 168702
            [1] => 167504
        )

    [alt] => Array
        (
            [0] => Episode 29
            [1] => episode 20
        )

)
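
The two parallel arrays above stay aligned only if every link contains exactly one image. As a variation, here is a minimal sketch that stores one record per link and uses `DOMXPath` to pick only anchors with the `asite-thumbnail` class. The third, unrelated anchor in `$input` and the `$episodes` variable name are assumptions added for illustration, and the inner markup is shortened.

```php
<?php
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"></a>'
       . '<a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504"><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20"></a>'
       . '<a href="/other/1"><img src="x.jpg" alt="unrelated"></a>'; // assumed extra link, should be skipped

$doc = new DOMDocument();
$doc->loadHTML($input);
$xpath = new DOMXPath($doc);

$episodes = [];
// Match only anchors whose class list contains "asite-thumbnail".
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " asite-thumbnail ")]';
foreach ($xpath->query($query) as $link) {
    $image = $link->getElementsByTagName('img')->item(0); // first image, or null if none
    $episodes[] = [
        'endlink' => preg_replace('~^.*/~', '', $link->getAttribute('href')),
        'alt'     => $image !== null ? $image->getAttribute('alt') : null,
    ];
}
print_r($episodes);
```

Each entry of `$episodes` then carries its own `endlink` and `alt`, so a link without an image cannot shift the pairing.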

php

regex

parsing

preg-match-all