preg_match_all: find links and remove duplicate results?

Question

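I have a problem with my match results. Here is my script. I can't work out how to add the link from the scraped content while avoiding duplicate results. I only need the results that start with http://www.autogidas.lt/ ....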

 <?
 $id= $_GET['id'];
 $user= $_GET['user'];
 $login=$_COOKIE['login'];

 $query = mysql_query("SELECT pavadinimas,nuoroda,kuras,data,data_new from autogidas where vartotojas='$user' and id='$id'");
 $rezultatas=mysql_fetch_row($query);

 $url = "$rezultatas[1]";

 $info = file_get_contents($url); 

 function scrape_between($data, $start, $end){
 $data = stristr($data, $start); 
 $data = substr($data, strlen($start));
 $stop = stripos($data, $end);
 $data = substr($data, 0, $stop);
 return str_replace('  ', ' ', $data);
 }
 $contents = scrape_between($info, "<table border=\"0\" cellspacing=\"0\">", "</table>");

   preg_match_all('/<span class="ttitle2".*?>(.*?)<\/span>/',$contents,$pavadinimas); 

   preg_match_all('/<span class="ttitle3".*?>(.*?)<\/span>/',$contents,$miestas); 

   preg_match_all('/<span class="ttitle1".*?>(.*?)<\/span>/',$contents,$metai_kaina); 

   foreach($metai_kaina[0] as $key=>$metai_kaina_val){ 
   if($key%2==0)
   $metai[] = strip_tags($metai_kaina_val);
   else  
   $kaina[] = strip_tags($metai_kaina_val);  
   }

   preg_match_all('/<img .*?(?=src)src=\"([^\"]+)\"/si', $contents, $img_link);
   preg_match_all('/<a href="http:\/\/www.autogidas.lt(.*?)"/s', $contents, $matches);

   for($i=0; $i<count($pavadinimas[0]); $i++){
    echo '<tr>
      <td><a href='HERE I NEED LINKS'><img src="'.$img_link[1][$i].'"></a></td>
      <td>'.$pavadinimas[0][$i].'</td>
      <td>'.$miestas[0][$i].'</td>
      <td>'.$metai[$i].'</td>
      <td><center>'.$kaina[$i].'</center></td>
    </tr>';
    }

   echo "</table>";
   ?>

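I tried a few suggestions, but I don't know how to update the script. This is the last thing I need, and I can't figure out how to do it... I'm not a professional, I'm just teaching myself PHP for fun. Thanks for any help! Sorry for my poor English....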

Answer 1

You can use this code:

// Regex that only matches anchors whose href starts with http://www.address.com/
// Single quotes keep \1 as a regex backreference; inside a double-quoted PHP
// string "\1" would be read as an octal escape instead.
$regexp = '<a\s[^>]*href=("??)(http:\/\/www\.address\.com\/[^" >]*?)\1[^>]*>(.*)<\/a>';
// $svetaines_turinys is assumed to hold the HTML of the scraped page
if (preg_match_all("/$regexp/siU", $svetaines_turinys, $matches, PREG_SET_ORDER)) {
    // collect the captured URLs (group 2) in an array
    $arr = [];
    foreach($matches as $match) {
        $arr[] = $match[2];
    }
    // remove duplicates from it
    $arr = array_unique($arr);
    // send to client
    foreach($arr as $match) {
        echo "$match <BR/>";
    }
}
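
One detail matters if you later index the de-duplicated list by position, as the for loop in the question does: array_unique keeps the original keys, so the result can have gaps. A minimal sketch, where the $links values are made-up sample data rather than real scraped URLs:

```php
<?php
// Hypothetical sample data standing in for the URLs collected from $matches.
$links = [
    'http://www.address.com/a',
    'http://www.address.com/a',
    'http://www.address.com/b',
];

// array_unique() keeps the first occurrence of each value together with its
// original key, so the remaining keys here are 0 and 2.
$unique = array_unique($links);

// array_values() renumbers the keys from 0, so $unique[$i] works in a for loop.
$unique = array_values($unique);

foreach ($unique as $i => $link) {
    echo $i, ': ', $link, "\n";
}
```

Without the array_values call, $unique[1] would be undefined in this example.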

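Edit, after the original question was changed:

You want unique hyperlinks, because the page you are scraping uses the same hyperlink twice. The two occurrences are not identical, though: only one of them is followed by an img tag. So you can change the regular expression that fills $matches as follows: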

preg_match_all('/<a href="(http:\/\/www\.autogidas\.lt[^"]*)"\s*>\s*<img/s',
    $contents, $matches);

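Note that in the regex above I also moved the opening parenthesis so that the group captures the whole URL, which is what you need in the code below.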
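
To see the effect of anchoring on the img tag, here is a small sketch run against a made-up HTML fragment. The markup and the skelbimas-N.html paths are hypothetical and only mimic a page where each URL appears twice, once around the thumbnail and once around the title:

```php
<?php
// Hypothetical HTML fragment: every listing links to the same URL twice.
$contents = <<<HTML
<td><a href="http://www.autogidas.lt/skelbimas-1.html">
  <img src="/img/1.jpg"></a></td>
<td><a href="http://www.autogidas.lt/skelbimas-1.html">Car one</a></td>
<td><a href="http://www.autogidas.lt/skelbimas-2.html"><img src="/img/2.jpg"></a></td>
<td><a href="http://www.autogidas.lt/skelbimas-2.html">Car two</a></td>
HTML;

// Only anchors immediately followed by <img (optional whitespace allowed)
// match, so each URL is captured once. The dots are escaped to match literally.
preg_match_all(
    '/<a href="(http:\/\/www\.autogidas\.lt[^"]*)"\s*>\s*<img/s',
    $contents,
    $matches
);

// $matches[1] holds the full URLs captured by the parenthesised group.
print_r($matches[1]);
```

The text links around "Car one" and "Car two" are skipped because nothing in the pattern can match between the closing > and the link text.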

Then, in your loop, you can output the hyperlink inside your quoted string with this fragment:

    <a href="'.$matches[1][$i].'">
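
Put together, the row-building loop could look roughly like the sketch below. The three arrays are filled with hypothetical sample values instead of real preg_match_all results, and both the isset check and the htmlspecialchars calls are additions that are not in the original script:

```php
<?php
// Hypothetical sample values standing in for the preg_match_all() results.
$matches     = [1 => ['http://www.autogidas.lt/skelbimas-1.html']];
$img_link    = [1 => ['/img/1.jpg']];
$pavadinimas = [0 => ['<span class="ttitle2">Car one</span>']];

$rows = '';
for ($i = 0; $i < count($pavadinimas[0]); $i++) {
    // Skip rows without a captured link or image to avoid undefined-offset notices.
    if (!isset($matches[1][$i], $img_link[1][$i])) {
        continue;
    }
    // Escape the attribute values before placing them into the HTML.
    $rows .= '<tr>'
        . '<td><a href="' . htmlspecialchars($matches[1][$i]) . '">'
        . '<img src="' . htmlspecialchars($img_link[1][$i]) . '"></a></td>'
        . '<td>' . $pavadinimas[0][$i] . '</td>'
        . "</tr>\n";
}

echo $rows;
```

This relies on the links, images and titles appearing in the same order and in equal numbers on the page, which is worth checking against the real markup.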

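Note: your code should start with <?php rather than just <?, because the short tag only works when short_open_tag is enabled in php.ini.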
