How do I fix this web scraper made with Puppeteer, which does nothing after scraping half the data but gives no error?

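For my college project, I built a Wikipedia scraper with Node.js and Puppeteer. It works for every link except one. On that page, after scraping almost half of the table's data (I am using console.log to see what has been scraped so far), it does nothing. It shows no error. It does not stop executing; it just does nothing afterwards. The Puppeteer browser does not close either.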

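In the original scraper I looped over the links to generate the data. Since that was not working, I made a separate scraper for just this link, but the same thing happens. Can anyone help me?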

const puppeteer = require('puppeteer');
const fs = require('fs');


(async () => {

try {
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });

    const link = "https://en.wikipedia.org/wiki/List_of_terrorist_incidents_in_June_2016";

    console.log("==============================");
    console.log("Travelling to link:", link);
    console.log("==============================");

    await page.goto(link, {waitUntil: 'networkidle0'});

    let rowArray = await page.$$("table[class='wikitable sortable jquery-tablesorter'] > tbody > tr");

    var dataA = [];

    for(let row of rowArray){
        let date = await row.$eval('td:nth-child(1)', element => element.textContent);
        date = date.substring(0, date.length - 1);
        let type = await row.$eval('td:nth-child(2)', element => element.textContent);
        type = type.substring(0, type.length - 1);
        let dead = await row.$eval('td:nth-child(3)', element => element.textContent);
        dead = dead.substring(0, dead.length - 1);
        let injured = await row.$eval('td:nth-child(4)', element => element.textContent);
        injured = injured.substring(0, injured.length - 1);
        let location = await row.$eval('td:nth-child(5)', element => element.textContent);
        location = location.substring(0, location.length - 1);
        let details = await row.$eval('td:nth-child(6)', element => element.textContent);
        details = details.substring(0, details.length - 1);
        let perpetrator = await row.$eval('td:nth-child(7)', element => element.textContent);
        perpetrator = perpetrator.substring(0, perpetrator.length - 1);
        let partOf = await row.$eval('td:nth-child(8)', element => element.textContent);
        partOf = partOf.substring(0, partOf.length - 1);
        


        console.log("==============================");
        console.log({date, type, dead, injured, location, details, perpetrator, partOf});
        console.log("==============================");

        dataA.push({date, type, dead, injured, location, details, perpetrator, partOf});
    }

    console.log("==============================");
    console.log("Started writing JSON file");
    fs.writeFileSync(`./june.json`, JSON.stringify(dataA), 'utf-8');
    console.log("Finished writing JSON file");
    console.log("==============================");


    await browser.close();

} catch (error) {
    console.error();
}

})();

Just look at where it stops.

It seems the script cannot handle the next row, which has no "closing cell".

My guess is that if you edit that page and close that cell, it will work (or you can update your script to handle that case).
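One way to update the script for that case is sketched below. It assumes Puppeteer's `ElementHandle.$`, which resolves to `null` when the selector matches nothing, and `evaluate` on the returned handle. The helper name `cellText` and its `fallback` parameter are hypothetical, not part of the original script. It also uses `trim()` instead of the original `substring` call to strip the trailing newline.

```javascript
// Hypothetical helper (not in the original script): returns the text of the
// n-th <td> in a row, or `fallback` when the row has no such cell.
async function cellText(row, n, fallback = '') {
  // ElementHandle.$ resolves to null when nothing matches the selector.
  const cell = await row.$(`td:nth-child(${n})`);
  if (cell === null) {
    return fallback;
  }
  const text = await cell.evaluate(element => element.textContent);
  // trim() replaces the original substring(0, length - 1) newline stripping.
  return text.trim();
}
```

Inside the loop, each field would then read like `let partOf = await cellText(row, 8);`, so a row with fewer cells yields an empty string instead of interrupting the scrape.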

Looking at the Wikipedia source, the "part of" cell is missing in that row, so your code just gets stuck at this `await`:

    let partOf = await row.$eval('td:nth-child(8)', element => element.textContent);

That is why you don't see any error.
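A possible explanation for the silence, going by Puppeteer's documented behaviour: `$eval` rejects when its selector matches nothing, and the `catch` block in the question calls `console.error()` with no argument, which prints only an empty line. Because `browser.close()` sits inside the `try`, it is skipped as well, which would leave the browser open. The sketch below shows one way to report the error and always close the browser. The wrapper name `runScrape` and its parameters are hypothetical.

```javascript
// Hypothetical wrapper: runs `scrape(browser)`, prints any failure, and
// closes the browser whether or not the scrape succeeded.
async function runScrape(browser, scrape) {
  try {
    return await scrape(browser);
  } catch (error) {
    // Pass the error object; console.error() with no argument prints nothing useful.
    console.error(error);
    return undefined;
  } finally {
    // Runs on success and on failure, so the browser never stays open.
    await browser.close();
  }
}
```

It could be called as `await runScrape(await puppeteer.launch(), async (browser) => { /* scraping code */ });`, with the body of the original `try` block moved into the callback.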