How to get all links with preg_match_all?

Question

I am trying to use preg_match_all to get all image links from a page I am scraping. The links start with http://i.ebayimg.com/ and end with .jpg. I can't get it right... :( I tried this, but it is not what I need:

preg_match_all('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $contentas, $img_link);

I have the same problem with normal links... I don't know how to write the preg_match_all call for a link like this:

<a class="link--muted" href="http://suchen.mobile.de/fahrzeuge/details.html?id=218930381&daysAfterCreation=7&isSearchRequest=true&withImage=true&scopeId=C&categories=Limousine&damageUnrepaired=NO_DAMAGE_UNREPAIRED&zipcode=&fuels=DIESEL&ambitCountry=DE&maxPrice=11000&minFirstRegistrationDate=2006-01-01&makeModelVariant1.makeId=3500&makeModelVariant1.modelId=20&pageNumber=1" data-touch="hover" data-touch-wrapper=".cBox-body--resultitem">

Thanks a lot!!!

Update: I am trying to get the car links, the image links, and all the other information from this page: http://suchen.mobile.de/fahrzeuge/search.html?isSearchRequest=true&scopeId=C&makeModelVariant1.makeId=1900&makeModelVariant1.modelId=10&makeModelVariant1.modelDescription=&makeModelVariantExclusions%5B0%5D.makeId=&categories=Limousine&minSeats=&maxSeats=&doorCount=&minFirstRegistrationDate=2006-01-01&maxFirstRegistrationDate=&minMileage=&maxMileage=&minPrice=&maxPrice=11000&minPowerAsArray=&maxPowerAsArray=&maxPowerAsArray=PS&minPowerAsArray=PS&fuels=DIESEL&minCubicCapacity=&maxCubicCapacity=&ambitCountry=DE&zipcode=&q=&climatisation=&airbag=&daysAfterCreation=7&withImage=true&adLimitation=&export=&vatable=&maxConsumptionCombined=&emissionClass=&emissionsSticker=&damageUnrepaired=NO_DAMAGE_UNREPAIRED&numberOfPreviousOwners=&minHu=&usedCarSeals= Extracting the information works fine and my script runs well, but I have a problem scraping the images and the links. Here is my script:

<?php

        $id= $_GET['id'];
        $user= $_GET['user'];
        $login=$_COOKIE['login'];

    $query = mysql_query("SELECT pavadinimas,nuoroda,kuras,data,data_new from mobile where vartotojas='$user' and id='$id'");
    $rezultatas=mysql_fetch_row($query);

    $url = "$rezultatas[1]";

    $info = file_get_contents($url); 

function scrape_between($data, $start, $end){
$data = stristr($data, $start); 
$data = substr($data, strlen($start));
$stop = stripos($data, $end);
$data = substr($data, 0, $stop);
return $data;
  }
     // cut out the content block
    $turinys = scrape_between($info, '<div class="g-col-9">', '<footer class="footer">');
     // filtering: remove the paid "top" ads
    $contentas = preg_replace('/<div class="cBox-body cBox-body--topResultitem".*?>(.*?)<\/div>/', '' ,$turinys);
    // filtering done

      preg_match_all('/<span class="h3".*?>(.*?)<\/span>/',$contentas,$pavadinimas); 

      preg_match_all('/<span class="u-block u-pad-top-9 rbt-onlineSince".*?>(.*?)<\/span>/',$contentas,$data); 

      preg_match_all('/<span class="u-block u-pad-top-9".*?>(.*?)<\/span>/',$contentas,$miestas);

      preg_match_all('/<span class="h3 u-block".*?>(.*?)<\/span>/', $contentas, $kaina);

      preg_match_all('/<a[A-z0-9-_:="\.\/ ]+href="(http:\/\/suchen.mobile.de\/fahrzeuge\/[^"]*)"[A-z0-9-_:="\.\/ ]\s*>\s*<div/s', $contentas, $matches);

   print_r($pavadinimas);
   print_r($data);
   print_r($miestas);
   print_r($kaina);
   print_r($result);
   print_r($matches);

   ?>

Answer 1

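1. Capture the src attribute of every img tag whose URL starts with http://i.ebayimg.com/ and ends with .jpg:

Regex: /src="((?:http|https):\/\/i\.ebayimg\.com\/[^"]+?\.jpg)"/i

Here is an example:

$re = '/src="((?:http|https):\/\/i\.ebayimg\.com\/[^"]+?\.jpg)"/i';
$str = "codeOfHTMLPage"; // the HTML of the page
preg_match_all($re, $str, $matches); // the captured URLs are in $matches[1]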


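If you want to make sure the URL is captured inside an img tag, use this regex instead (keep in mind that it will be slower if the page is very long):

$re = '/<img\b[^>]*?src="((?:http|https):\/\/i\.ebayimg\.com\/[^"]+?\.jpg)"/i';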
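To see what ends up in the capture group, here is a minimal sketch run against made-up markup. The function name extract_ebay_images and the sample HTML are assumptions for illustration only, and the pattern uses ~ as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical helper: returns every i.ebayimg.com .jpg URL found in an img src.
function extract_ebay_images(string $html): array
{
    // [^>] keeps the match inside one tag, [^"] inside one attribute value.
    $pattern = '~<img\b[^>]*?\bsrc="(https?://i\.ebayimg\.com/[^"]+?\.jpg)"~i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // group 1 = the URL without the surrounding quotes
}

// Made-up sample markup standing in for the scraped page.
$sample = '<img src="http://i.ebayimg.com/00/s/abc/27.jpg" alt="car">'
        . '<img src="http://example.com/logo.jpg">'
        . '<img class="thumb" src="https://i.ebayimg.com/00/s/xyz/2.JPG">'
        . '<img src="http://i.ebayimg.com/icon.png">';

print_r(extract_ebay_images($sample));
```

Only the first and third URLs should be returned: the second has a different host and the fourth is not a .jpg. Note that $matches[0] would hold the whole matched text including the tag prefix, which is why the sketch returns $matches[1].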

2. Capture the href attribute of every a tag whose URL starts with http://suchen.mobile.de/fahrzeuge/:

Regex: /href="((?:http|https):\/\/suchen\.mobile\.de\/fahrzeuge\/[^"]+)"/i

Here is an example:

$re = '/href="((?:http|https):\/\/suchen\.mobile\.de\/fahrzeuge\/[^"]+)"/i';
$str = "codeOfHTMLPage"; // the HTML of the page
preg_match_all($re, $str, $matches); // the captured URLs are in $matches[1]


If you want to make sure the URL is captured inside an a tag, use this regex instead (keep in mind that it will be slower if the page is very long):

$re = '/<a\b[^>]*?href="((?:http|https):\/\/suchen\.mobile\.de\/fahrzeuge\/[^"]+)"/i';
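A short sketch of the link case, using a shortened version of the anchor from the question as sample input. The function name extract_listing_links is hypothetical, and the html_entity_decode step is an assumption that the page source encodes "&" as "&amp;" inside attributes, which is common but not confirmed for this site.

```php
<?php
// Hypothetical helper: collects detail-page links from anchor tags.
function extract_listing_links(string $html): array
{
    $pattern = '~<a\b[^>]*?\bhref="(https?://suchen\.mobile\.de/fahrzeuge/[^"]+)"~i';
    preg_match_all($pattern, $html, $matches);
    // Decode entities such as &amp; so the URL can be requested as-is.
    $links = array_map('html_entity_decode', $matches[1]);
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

// Made-up sample markup modelled on the anchor in the question.
$sample = '<a class="link--muted" href="http://suchen.mobile.de/fahrzeuge/details.html?id=218930381&amp;pageNumber=1" data-touch="hover">Car</a>'
        . '<a href="http://example.com/other">Other</a>';

print_r(extract_listing_links($sample));
```

The same listing is often linked more than once per result item (image and title), which is why the sketch removes duplicates before returning.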

Answer 2

DOMDocument is more convenient for this:

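libxml_use_internal_errors(true); // suppress warnings from malformed HTML
$dom = new DOMDocument;
$dom->loadHTMLFile($yourURL);

$imgNodes = $dom->getElementsByTagName('img');

$result = [];

foreach ($imgNodes as $imgNode) {
    $src = $imgNode->getAttribute('src');
    $urlElts = parse_url($src);
    if (!isset($urlElts['host'], $urlElts['path'])) {
        continue; // relative or malformed URL, skip it
    }
    // pathinfo() avoids passing the result of explode() to array_pop() by reference
    $ext = strtolower(pathinfo($urlElts['path'], PATHINFO_EXTENSION));
    if ($ext === 'jpg' && $urlElts['host'] === 'i.ebayimg.com') {
        $result[] = $src;
    }
}

print_r($result);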

To get "normal" links, use the same approach (DOMDocument + parse_url).
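As a rough sketch of that approach for a tags: the function name extract_links_by_host and the sample markup are made up for illustration, and loadHTML is used on a string here instead of loadHTMLFile so the example runs without network access.

```php
<?php
// Hypothetical helper: returns the href of every <a> that points at $host.
function extract_links_by_host(string $html, string $host): array
{
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // entities are already decoded here
        // parse_url() yields null for the host of a relative URL.
        if (parse_url($href, PHP_URL_HOST) === $host) {
            $links[] = $href;
        }
    }
    return $links;
}

// Made-up sample markup standing in for the scraped page.
$sample = '<html><body>'
        . '<a href="http://suchen.mobile.de/fahrzeuge/details.html?id=1&amp;pageNumber=1">Car</a>'
        . '<a href="/relative">Relative</a>'
        . '<a href="http://example.com/">Other</a>'
        . '</body></html>';

print_r(extract_links_by_host($sample, 'suchen.mobile.de'));
```

Compared with the regex version, the parser takes care of attribute order, quoting style, and entity decoding, so only the host check is left to write.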

php

regex

preg-match-all

web-scraping