Get all links with XPath in Puppeteer (pausing or not working)?
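I need to use XPaths to select all links on a page, which my Puppeteer app then clicks to perform some actions. I am finding that the method (code below) sometimes gets stuck and my crawler pauses. Is there a better or different way to get all links from an XPath? Or is there something incorrect in my code that could stall my app's progress?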
    try {
      links = await this.getLinksFromXPathSelector(state);
    } catch (e) {
      console.log("error getting links");
      return {...state, error: e};
    }
which calls:
    async getLinksFromXPathSelector(state) {
      const newPage = state.page;
      // console.log('links selector');
      const links = await newPage.evaluate((mySelector) => {
        let results = [];
        let query = document.evaluate(mySelector,
          document,
          null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
        for (let i=0, length=query.snapshotLength; i<length; ++i) {
          results.push(query.snapshotItem(i).href);
        }
        return results;
      }, state.linksSelector);
      return links;
    }
The XPath is in state.linksSelector.
You can use page.$x() to evaluate an XPath expression and obtain an array of ElementHandles. It may be appropriate to call page.waitForXPath() first, to make sure the elements matched by the XPath expression have been added to the DOM.
You can then pass the elements of the ElementHandle array to the page context via page.evaluate() and return an array containing their href property values.
    const xpath_expression = '//a[@href]';
    await page.waitForXPath(xpath_expression);
    const links = await page.$x(xpath_expression);
    const link_urls = await page.evaluate((...links) => {
      return links.map(e => e.href);
    }, ...links);
    console.log(link_urls);
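If the stall comes from waiting on elements that never appear, bounding the wait may help the crawler fail fast instead of pausing. The following is only a sketch of that idea, not a confirmed fix for the question: getLinkUrls and timeoutMs are hypothetical names, and it assumes a Puppeteer version that still provides page.waitForXPath() and page.$x() (both were deprecated in later releases) and whose timeout errors are named TimeoutError.

```javascript
// Sketch: collect href values for an XPath with a bounded wait.
// `getLinkUrls` and `timeoutMs` are hypothetical names, not Puppeteer API.
async function getLinkUrls(page, xpathExpression, timeoutMs = 5000) {
  try {
    // Give up after timeoutMs instead of waiting for the default timeout.
    await page.waitForXPath(xpathExpression, { timeout: timeoutMs });
  } catch (err) {
    // Assumption: a timeout surfaces as an error named "TimeoutError".
    // Treat "nothing matched in time" as an empty result; rethrow the rest.
    if (err && err.name === 'TimeoutError') {
      return [];
    }
    throw err;
  }

  const handles = await page.$x(xpathExpression);

  // Read href from each element inside the page context.
  const urls = await Promise.all(
    handles.map((handle) => handle.evaluate((el) => el.href))
  );

  // Release the handles so they do not accumulate across many pages.
  await Promise.all(handles.map((handle) => handle.dispose()));

  return urls;
}
```

In the question's code this could be called as getLinkUrls(state.page, state.linksSelector), with the existing try/catch kept around it so that other errors still end up in state.error.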