Puppeteer: Open a page, get the data, go back to the previous page, enter a new page to get data

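Getting data from a single page is easy, but how do I go back after scraping the first page, enter a new page, get the data from that page, and so on? I am trying to do this on the website http://books.toscrape.com/.

I chose to print the number of books in stock because that value is only reachable by following a book's link. For example, if you run the code below you will get: { stock: 'In stock (22 available)' }

Now I want to go back to the original page, follow the second link, and extract the same information as before. And so on.

How can this be done with vanilla JavaScript?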

const puppeteer = require('puppeteer');

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('http://books.toscrape.com/');
    // click the first book and wait for the navigation it triggers
    await Promise.all([
        page.waitForNavigation(),
        page.click('#default > div > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.image_container > a > img'),
    ]);

    const result = await page.evaluate(() => {
        let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;

        return {
            stock
        }
    });

    await browser.close();
    return result;
};

scrape().then((value) => {
    console.log(value); // Success!
});

Explanation

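What you need to do is call page.goBack() to go back one page when your task is finished, and then click the next element. For this you should use page.$$ to get a list of the clickable elements and loop over them one by one. Element handles stop being valid once the page navigates, so query them again after going back. You can then run the same extraction on each page you visit.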

Code

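I adapted your code below so that it prints the desired result for each page to the console. Note that I changed the selector from your question by removing :nth-child(1), so that it selects all clickable elements.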

const puppeteer = require('puppeteer');

const elementsToClickSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a > img';

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('http://books.toscrape.com/');

    // get all elements to be clicked
    let elementsToClick = await page.$$(elementsToClickSelector);
    console.log(`Elements to click: ${elementsToClick.length}`);

    for (let i = 0; i < elementsToClick.length; i++) {
        // click the element and wait for the resulting navigation to finish
        await Promise.all([
            page.waitForNavigation(),
            elementsToClick[i].click(),
        ]);

        // generate result for the current page
        const result = await page.evaluate(() => {
            let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;
            return { stock };
        });
        console.log(result); // do something with the result here...

        // go back one page and repopulate the elements
        await page.goBack();
        elementsToClick = await page.$$(elementsToClickSelector);
    }

    await browser.close();
};

scrape();
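
As an alternative, here is a minimal sketch that avoids going back at all: it reads every book link from the listing page first and then visits each URL directly, so no element handles need to be re-queried. The helper names collectStock and parseStockCount are hypothetical, and the link selector is an assumption (the selector above with the trailing "> img" removed so that it matches the surrounding anchor). The regular expression assumes the stock text always looks like the 'In stock (22 available)' example from the question.

```javascript
// Assumed selector: the anchor wrapping each book image on the listing page.
const linkSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a';
const stockSelector = '#content_inner > article > table > tbody > tr:nth-child(6) > td';

// Hypothetical helper: extract the number from text like 'In stock (22 available)'.
// Returns null when the text does not match that shape.
function parseStockCount(text) {
    const match = /\((\d+) available\)/.exec(text);
    return match ? Number(match[1]) : null;
}

// Hypothetical helper: `page` is expected to be a Puppeteer Page.
async function collectStock(page, startUrl) {
    await page.goto(startUrl);

    // a.href is resolved to an absolute URL by the browser
    const urls = await page.$$eval(linkSelector, (links) => links.map((a) => a.href));

    const results = [];
    for (const url of urls) {
        // visit each detail page directly instead of clicking and going back
        await page.goto(url);
        const stock = await page.$eval(stockSelector, (td) => td.innerText);
        results.push({ url, stock, count: parseStockCount(stock) });
    }
    return results;
}
```

Inside scrape() this could be called as `const results = await collectStock(page, 'http://books.toscrape.com/');` right after `browser.newPage()`. The trade-off is one extra query up front in exchange for a loop that does not depend on the listing page still being loaded.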