driver.findElements(By.xpath) shows inconsistent search results on https://www.amazon.com/ using Selenium Java
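I am trying to get the URL of every laptop on sale on the first 3 pages of this Amazon page.
Every time I run the script, driver.findElements(By.xpath) returns an inconsistent number of URLs. The first page is quite consistent and returns 4 URLs, but pages 2 and 3 can return anywhere between 1 and 4 URLs, even though page 2 has 8 URLs I am looking for and page 3 has 4.
I suspect the problem is in the grabData method, since it grabs data based on the inconsistent list of URLs it is given. I am new to this, so I hope everything makes sense. Any help would be appreciated. Let me know if you need more clarification.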
public static String dealURLsXpath = "//span[@data-a-strike=\"true\" or contains(@class,\"text-strike\")][.//text()]/parent::a[@class]";
public static List<String> URLs = new ArrayList<String>();

public static void main(String[] args)
{
    //Initialize Browser
    System.setProperty("webdriver.chrome.driver", "C:\\Users\\email\\eclipse-workspace\\ChromeDriver 81\\chromedriver.exe");
    WebDriver driver = new ChromeDriver();
    driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);

    //Search through laptops and starts at page 1
    Search.searchLaptop(driver);

    //Grabs data for each deal and updates Products List directly
    listingsURL = driver.getCurrentUrl();
    //updates the global URLs List with the URLs found by driver.findElements(By.xpath)
    updateURLsList(driver);
    //Iterates through each URL and grabs laptop information to add to products list
    grabData(driver, URLs, "Laptop");
    // Clears URLs list so that it can be populated by the URLs in the next page
    URLs.clear();

    // returns driver to Amazon page to click on "page 2" button to go to next page and repeat process
    driver.get(listingsURL);
    driver.findElement(By.xpath("//a [contains(@href,'pg_2')]")).click();
    listingsURL = driver.getCurrentUrl();
    updateURLsList(driver);
    grabData(driver, URLs, "Laptop");
    URLs.clear();

    driver.get(listingsURL);
    driver.findElement(By.xpath("//a [contains(@href,'pg_3')]")).click();
    listingsURL = driver.getCurrentUrl();
    updateURLsList(driver);
    grabData(driver, URLs, "Laptop");
    URLs.clear();
    driver.get(listingsURL);
}

public static void updateURLsList(WebDriver driver)
{
    //list of deals on amazon page
    /////////////////////////////////////////////INCONSISTENT/////////////////////////////////////////////
    List<WebElement> deals = driver.findElements(By.xpath(dealURLsXpath));
    //////////////////////////////////////////////////////////////////////////////////////////////////////
    System.out.println("Deals Size: " + deals.size());
    for(WebElement element : deals)
    {
        URLs.add(element.getAttribute("href"));
    }
    System.out.println("URL List size: " + URLs.size());
    deals.clear();
}

public static void grabData(WebDriver driver, List<String> URLs, String category)
{
    for(String url : URLs)
    {
        driver.get(url);
        String name = driver.findElement(By.xpath("//span [@id = \"productTitle\"]")).getText();
        System.out.println("Name: " + name);
        String price = driver.findElement(By.xpath("//span [@id = \"priceblock_ourprice\"]")).getText();
        System.out.println("price: " + price);
        String Xprice = driver.findElement(By.xpath("//span [@class = \"priceBlockStrikePriceString a-text-strike\"]")).getText();
        System.out.println("Xprice: " + Xprice);
        String picURL = driver.findElement(By.xpath("//img [@data-old-hires]")).getAttribute("src");
        System.out.println("picURL: " + picURL);
        BufferedImage img;
        System.out.println("URL: " + url);
        try
        {
            img = ImageIO.read(new URL(picURL));
            // Prices are stored as integer cents: "$1,299.99" becomes 129999
            products.add(new Product(
                name,
                Integer.parseInt(price.replaceAll("[^\\d.]", "").replace(".", "").replace(",", "")),
                Integer.parseInt(Xprice.replaceAll("[^\\d.]", "").replace(".", "").replace(",", "")),
                img,
                category,
                url));
        }
        catch(IOException e)
        {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
You should try using an explicit wait, the Selenium way:
WebDriverWait wait = new WebDriverWait(driver, 20);
List<WebElement> deals = wait.until(ExpectedConditions.presenceOfAllElementsLocatedBy(By.xpath(dealURLsXpath)));
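An explicit wait like the one above re-runs the lookup repeatedly until a condition holds or a timeout expires, which is why it can help when results render late. As a rough illustration of that polling idea only, here is a minimal plain-Java sketch. The class `PollingWaitSketch`, the helper `pollUntil`, and the simulated "page" are all hypothetical names made up for this example; this is not Selenium's implementation, and it uses no Selenium classes.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class PollingWaitSketch {

    /**
     * Hypothetical helper (not part of Selenium): re-runs the lookup every
     * pollInterval until the ready predicate accepts its result, or throws
     * once the timeout has elapsed.
     */
    public static <T> T pollUntil(Supplier<T> lookup, Predicate<T> ready,
                                  Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T result = lookup.get();
            if (ready.test(result)) {
                return result;
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated page: one more "deal link" becomes available on each lookup,
        // standing in for results that render a little at a time.
        List<String> rendered = new ArrayList<>();
        Supplier<List<String>> lookup = () -> {
            rendered.add("https://example.com/deal/" + (rendered.size() + 1));
            return new ArrayList<>(rendered);
        };

        // A single lookup would see 1 link; polling keeps looking until 4 are present.
        List<String> deals = pollUntil(lookup, found -> found.size() >= 4,
                Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("Deals Size: " + deals.size());
    }
}
```

A single findElements call corresponds to one lookup here, so it only sees whatever happens to be rendered at that instant; the wait keeps looking until the condition is satisfied.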
To get the href attributes of every laptop on sale on the first 3 pages of this Amazon page, you need to induce WebDriverWait for visibilityOfAllElementsLocatedBy(), and you can use the following XPath locator:
Code block:
driver.get("https://www.amazon.com/s?i=computers&rh=n%3A565108%2Cp_72%3A1248879011&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590091272&ref=sr_pg_1");
List<WebElement> deals = new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfAllElementsLocatedBy(By.xpath("//span[@class='a-price a-text-price']//parent::a[1]")));
for(WebElement deal : deals)
    System.out.println(deal.getAttribute("href"));
Console output:
https://www.amazon.com/Apple-MacBook-13-inch-256GB-Storage/dp/B08636NKF8/ref=sr_1_2?dchild=1&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590134317&refinements=p_72%3A1248879011&s=pc&sr=1-2
https://www.amazon.com/Apple-MacBook-16-Inch-512GB-Storage/dp/B081FZV45H/ref=sr_1_5?dchild=1&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590134317&refinements=p_72%3A1248879011&s=pc&sr=1-5
https://www.amazon.com/Apple-MacBook-13-inch-128GB-Storage/dp/B07V49KGVQ/ref=sr_1_9?dchild=1&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590134317&refinements=p_72%3A1248879011&s=pc&sr=1-9
https://www.amazon.com/New-Microsoft-Surface-Pro-Touch-Screen/dp/B07YNHXX8D/ref=sr_1_23?dchild=1&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590134317&refinements=p_72%3A1248879011&s=pc&sr=1-23
Similarly, page 2 gives 4 URLs and page 3 gives 4 URLs.