Failed to scrape the link to the next page using XPath in Puppeteer
I am trying to scrape the link to the next page from a webpage. I know how to scrape it using a CSS selector. However, things go wrong when I attempt to do the same using XPath. This is what I get instead of the next page link.
const puppeteer = require("puppeteer");
let url = "https://whosebug.com/questions/tagged/web-scraping";
(async () => {
    const browser = await puppeteer.launch({headless:false});
    const [page] = await browser.pages();
    await page.goto(url,{waitUntil: 'networkidle2'});
    let nextPageLink = await page.$x("//a[@rel='next']", item => item.getAttribute("href"));
    // let nextPageLink = await page.$eval("a[rel='next']", elm => elm.href);
    console.log("next page:",nextPageLink);
    await browser.close();
})();
How can I scrape the link to the next page using xpath?
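page.$x(expression) returns an array of element handles, so you need destructuring or index access to get the first element from the array. Unlike page.$eval, it does not take a callback, so the second argument in your call is ignored.
To get a DOM element property from that element handle, either evaluate a function with the handle as its argument or use the element handle API: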
const [nextPageLink] = await page.$x("//a[@rel='next']");
const nextPageURL = await nextPageLink.evaluate(link => link.href);
Or:
const [nextPageLink] = await page.$x("//a[@rel='next']");
const nextPageURL = await (await nextPageLink.getProperty('href')).jsonValue();
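Putting the pieces together, here is a minimal sketch of the whole script. It assumes a Puppeteer version that still exposes page.$x (the method was deprecated in later releases). The helper name firstHrefByXPath is hypothetical, and the guard for a missing match is an addition: without it, destructuring an empty array gives undefined and the evaluate call throws.

```javascript
// Hypothetical helper (not part of Puppeteer): resolves to the href of the
// first element matching an XPath expression, or null when nothing matches.
async function firstHrefByXPath(page, expression) {
  // page.$x resolves to an array of element handles (empty if no match).
  const [handle] = await page.$x(expression);
  if (!handle) return null;
  // Read the resolved href property inside the page context.
  return handle.evaluate(node => node.href);
}

async function main() {
  // Required here so the helper above can be loaded without puppeteer installed.
  const puppeteer = require("puppeteer");
  const url = "https://whosebug.com/questions/tagged/web-scraping";
  const browser = await puppeteer.launch({ headless: false });
  try {
    const [page] = await browser.pages();
    await page.goto(url, { waitUntil: "networkidle2" });
    const nextPageURL = await firstHrefByXPath(page, "//a[@rel='next']");
    console.log("next page:", nextPageURL);
  } finally {
    // Close the browser even if navigation or the lookup fails.
    await browser.close();
  }
}

// Call main() to run this against the live page (needs puppeteer installed).
```

To run it, add main().catch(console.error); at the end of the file. On the last page of results, where no rel='next' link exists, the helper resolves to null instead of throwing.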