Using Puppeteer to click main links and then their sub-links?
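In short:
I have a site with links.
Clicking each link goes to a new page, and on that page I need to visit the links (by clicking, not by navigating).
Visualization: (image not included)
I've got 99% of it working: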
import puppeteer, { ElementHandle } from 'puppeteer';

(async () =>
{
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    let url = "https://www.mutualart.com/Artists";
    console.log(`Fetching page data for : ${url}...`);
    await page.goto(url);
    await page.waitForSelector(".item.col-xs-3");
    let arrMainLinks: ElementHandle[] = await page.$$('.item.col-xs-3 > a'); // get the main links
    console.log(arrMainLinks.length); // 16
    for (let mainLink of arrMainLinks) // click each main link
    {
        let hrefValue = await (await mainLink.getProperty('href')).jsonValue();
        console.log("Clicking on " + hrefValue);
        await Promise.all([
            page.waitForNavigation(),
            mainLink.click({delay: 100})
        ]);
        // get the sub links
        let arrSubLinks: ElementHandle[] = await page.$$('.slide >a');
        // click each sub link
        for (let sublink of arrSubLinks)
        {
            console.log('██AAA');
            await Promise.all([
                page.waitForNavigation(),
                sublink.click({delay: 100})
            ]);
            console.log('██BBB');
            // await page.goBack()
            break; // for now ...
        }
        break;
    }
    await browser.close();
})();
So what's the problem?
It reaches ██AAA
but it never reaches ██BBB.
I get this error:
C:\temp\puppeterr1\app>node server2.js
Fetching page data for : https://www.mutualart.com/Artists...
16
Clicking on https://www.mutualart.com/Artist/Mr--Brainwash/9B3FED6BB81E6B8E
██AAA
(node:17200) UnhandledPromiseRejectionWarning: TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded
at Promise.then (C:\temp\puppeterr1\node_modules\puppeteer\lib\FrameManager.js:1230:21)
at <anonymous>
-- ASYNC --
at Frame.<anonymous> (C:\temp\puppeterr1\node_modules\puppeteer\lib\helper.js:144:27)
at Page.waitForNavigation (C:\temp\puppeterr1\node_modules\puppeteer\lib\Page.js:599:49)
at Page.<anonymous> (C:\temp\puppeterr1\node_modules\puppeteer\lib\helper.js:145:23)
at Object.<anonymous> (C:\temp\puppeterr1\app\server2.js:127:30)
at step (C:\temp\puppeterr1\app\server2.js:32:23)
at Object.next (C:\temp\puppeterr1\app\server2.js:13:53)
at fulfilled (C:\temp\puppeterr1\app\server2.js:4:58)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
(node:17200) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:17200) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Question:
What am I missing here?
Why doesn't it reach ██BBB?
Update:
https://github.com/GoogleChrome/puppeteer/issues/3535
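Why ██BBB is never printed can be reasoned about without the site itself: `Promise.all` rejects as soon as `page.waitForNavigation()` times out (30 s by default, matching the log), so the statement after it never runs. The click itself did not fail, but no navigation followed within the timeout. One plausible reason, not confirmed here: `ElementHandle.click()` scrolls the element into view and clicks with the mouse at its center, so if a `.slide > a` link is hidden or covered (for example inside a carousel), the mouse click may not hit the link, whereas the DOM `HTMLElement.click()` used in the workaround dispatches the click on the element directly. The following is a self-contained TypeScript model, not Puppeteer; `FakePage`, `clickAndWait` and `clickThenWait` are hypothetical names. It shows that a click which never navigates makes the `Promise.all` pattern time out, and that awaiting the click before starting the wait (the `evaluate(...click())` followed by a separate `waitForNavigation()` shape) can miss a navigation entirely.

```typescript
// Hypothetical stand-in for a browser page: a "navigation" is just an event.
class FakePage {
  private listeners: Array<() => void> = [];

  // Resolves on the next navigation; rejects after timeoutMs, similar in
  // spirit to the "Navigation Timeout Exceeded" error in the log.
  waitForNavigation(timeoutMs: number): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const onNavigate = () => {
        clearTimeout(timer);
        resolve();
      };
      timer = setTimeout(() => {
        this.listeners = this.listeners.filter((l) => l !== onNavigate);
        reject(new Error(`Navigation timeout of ${timeoutMs} ms exceeded`));
      }, timeoutMs);
      this.listeners.push(onNavigate);
    });
  }

  // Simulates a navigation completing: only callers already waiting are notified.
  navigate(): void {
    const waiting = this.listeners;
    this.listeners = [];
    waiting.forEach((notify) => notify());
  }
}

// Pattern used in the question: register the wait first, then click.
// If the click never navigates, Promise.all rejects and the next line never runs.
async function clickAndWait(page: FakePage, click: () => Promise<void>, timeoutMs: number): Promise<string> {
  await Promise.all([page.waitForNavigation(timeoutMs), click()]);
  return "reached BBB";
}

// Racy order: the navigation can finish before anyone is waiting for it.
async function clickThenWait(page: FakePage, click: () => Promise<void>, timeoutMs: number): Promise<string> {
  await click();
  await page.waitForNavigation(timeoutMs);
  return "reached BBB";
}
```

In the real script, the `Promise.all` shape from the question is the right one; the model only suggests that the remaining question is why that particular click does not trigger a navigation.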
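Original answer:
Update: I've managed to solve it, but not in the conventional way I wanted.
There seems to be a problem with ElementHandle, which is why I switched to plain DOM objects.
I'm still interested in a more intuitive solution than dealing with ElementHandle.
Anyway, here is my solution: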
import puppeteer, { ElementHandle } from 'puppeteer';

// page.waitFor(ms) is deprecated in newer Puppeteer versions; a plain timer does the same job
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

(async () =>
{
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    let url = "https://www.mutualart.com/Artists";
    console.log(`Fetching page data for : ${url}...`);
    await page.goto(url);
    await page.waitForSelector(".item.col-xs-3");
    // DOM nodes do not survive serialization back to Node.js, so return only the count
    let mainLinkCount = await page.evaluate(() =>
        document.querySelectorAll('.item.col-xs-3 > a').length);
    console.log(mainLinkCount);
    for (let i = 0; i < mainLinkCount; i++) // visit the main links by index
    {
        // Re-query the links after every navigation and click through the DOM.
        // Start waiting for the navigation before clicking to avoid a race.
        await Promise.all([
            page.waitForNavigation(),
            page.evaluate((a: number) =>
            {
                (document.querySelectorAll('.item.col-xs-3 > a')[a] as HTMLElement).click();
            }, i)
        ]);
        let subLinkCount = await page.evaluate(() =>
            document.querySelectorAll('.slide>a').length);
        console.log(subLinkCount);
        for (let j = 0; j < subLinkCount; j++)
        {
            console.log('███AAA');
            await Promise.all([
                page.waitForNavigation(),
                page.evaluate((a: number) =>
                {
                    (document.querySelectorAll('.slide>a')[a] as HTMLElement).click();
                }, j)
            ]);
            let ddd: ElementHandle[] = await page.$$('.artist-name');
            console.log(ddd.length);
            console.log('███BBB');
            await sleep(2000);
            await page.goBack();
            console.log('███CCC');
        }
        await sleep(2000);
        await page.goBack();
    }
    await browser.close();
})();
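Independently of the timeout, the index-based loop in this workaround avoids a second `ElementHandle` pitfall that appears once the loop is extended with `page.goBack()`: handles returned by `page.$$()` belong to the document they were queried from, and after a navigation they usually fail with an error (for example about a destroyed execution context or a detached node). The following TypeScript sketch models that with hypothetical `FakeSite`/`FakeHandle` types, which are not Puppeteer APIs: holding on to the original handles breaks on the second link, while keeping only the count and re-querying by index after every navigation visits every link.

```typescript
// Hypothetical model (not Puppeteer): a handle records which document
// "generation" it was queried from, loosely like an ElementHandle being
// tied to one page load.
interface FakeHandle {
  generation: number;
  href: string;
}

class FakeSite {
  private generation = 0;
  private readonly links: string[];
  readonly visited: string[] = [];

  constructor(links: string[]) {
    this.links = links;
  }

  // Like page.$$(): handles are only valid for the current document.
  queryAll(): FakeHandle[] {
    return this.links.map((href) => ({ generation: this.generation, href }));
  }

  // Like clicking a link: a handle from an older document is rejected.
  click(handle: FakeHandle): void {
    if (handle.generation !== this.generation) {
      throw new Error(`stale handle for ${handle.href}`);
    }
    this.visited.push(handle.href);
    this.generation++; // navigating to a new document invalidates old handles
  }

  // Like page.goBack(): also a navigation, so also a new document.
  goBack(): void {
    this.generation++;
  }
}

// Keeps the handles from the first query across navigations.
function visitWithSavedHandles(site: FakeSite): void {
  for (const handle of site.queryAll()) {
    site.click(handle);
    site.goBack();
  }
}

// Keeps only the count and re-queries by index after every navigation.
function visitByIndex(site: FakeSite): void {
  const count = site.queryAll().length;
  for (let i = 0; i < count; i++) {
    site.click(site.queryAll()[i]);
    site.goBack();
  }
}
```

In the real script, `site.queryAll()[i]` corresponds to re-running `document.querySelectorAll(...)` inside `page.evaluate` and picking element `i`.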