Can't let my script keep on clicking until there are none left

I've written a script in Node using Puppeteer to scrape the names of different institutions from a website, traversing multiple pages.

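The script below parses the institution names from the landing page, then makes a few clicks to parse the names from the following pages, and at some point during execution it runs into this error: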

the error:  TypeError: Cannot read property 'click' of undefined
    at main (c:\Users\WCS\Desktop\Node vault\comments.js:18:25)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:118:7)

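I used a hardcoded for loop because I really don't know how to make the script keep clicking the next-page button until there are none left. I'd like to follow any logic that makes my script look for the next-page button first. If it finds one, it should click that button and repeat the process.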

I've tried:

const puppeteer = require('puppeteer');
const link = "https://www.incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx";

(async function main() {
  try {
    const browser = await puppeteer.launch({headless:false});
    const [page]    = await browser.pages();
    await page.goto(link);
    await page.waitForSelector("h1.faqsno-heading");

    for(let i = 1; i < 20; i++){
      const sections = await page.$$("h1.faqsno-heading");
      for (const section of sections) {
          const itemName = await section.$eval("div[id^='arrowex']", el => el.innerText);
          console.log(itemName);
      }
      const nextPage = await page.$$(".ms-paging > a");
      await nextPage[i].click();
      await page.waitForNavigation({waituntil:'networkidle0'});
    }

    await browser.close();
  } catch (e) {
    console.log('the error: ', e);
  }
})();

Btw, to keep this post from being marked as a duplicate, I must acknowledge that I've come across a similar question, but I don't think I can implement its logic in my script myself.

Have you tried a simple if condition?

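const nextPage = await page.$$(".ms-paging > a");

// Click only when a link actually exists at this index.
if (nextPage && nextPage[i]) {
  // Start waiting before clicking so the navigation is not missed.
  // Note the option name is waitUntil (case-sensitive).
  await Promise.all([
    page.waitForNavigation({waitUntil: 'networkidle0'}),
    nextPage[i].click(),
  ]);
}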

That way it only clicks when the button exists.
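Note that the guard on its own only avoids the TypeError; the hardcoded loop would still run through its remaining iterations. A minimal sketch of one way to stop as soon as nothing is left to click, assuming a hypothetical helper named clickIfPresent (not part of Puppeteer) that receives the page, the array returned by page.$$ and the index:

```javascript
// Hypothetical helper: clicks links[index] if it exists and reports whether it did.
// `page` is assumed to be a Puppeteer Page, `links` the array returned by page.$$.
async function clickIfPresent(page, links, index) {
  const link = links && links[index];
  if (!link) {
    return false; // nothing to click at this index
  }
  // Start waiting before clicking so the navigation is not missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    link.click(),
  ]);
  return true;
}
```

Inside the existing for loop, this could be used as `if (!(await clickIfPresent(page, nextPage, i))) break;` so the loop ends on the first missing link instead of throwing.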

  • Solution: the simple way

Replace this code

      const nextPage = await page.$$(".ms-paging > a");
      await nextPage[i].click();
      await page.waitForNavigation({waituntil:'networkidle0'}); 

with this:

      await Promise.all([
        page.waitForNavigation({waitUntil: 'networkidle0'}),
        page.click("[title='Next Page']"),
      ]);
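To get rid of the hardcoded page count altogether, the same selector can drive a loop that runs until no next-page link is found. This is only a sketch under assumptions: scrapeAllPages and scrapePage are hypothetical names, and it assumes the "[title='Next Page']" link is absent from the DOM on the last page (if the site merely disables it, the stop condition would need adjusting). page.$ resolves to null when nothing matches, which is what ends the loop.

```javascript
// Sketch: scrape the current page, then click "Next Page" until it is gone.
// `page` is assumed to be a Puppeteer Page; `scrapePage` is a caller-supplied
// async function that returns an array of items for the page currently shown.
async function scrapeAllPages(page, scrapePage, nextSelector = "[title='Next Page']") {
  const results = [];
  while (true) {
    results.push(...(await scrapePage(page)));

    // page.$ resolves to null when no element matches the selector.
    const next = await page.$(nextSelector);
    if (!next) {
      break; // no next-page link left
    }

    // Start waiting before clicking so the navigation is not missed.
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle0' }),
      next.click(),
    ]);
  }
  return results;
}
```

In the script from the question, scrapePage could wrap the existing page.$$("h1.faqsno-heading") loop and return the collected names instead of logging them.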
  • Solution: your way (silly math!). Re-adjust the index as you keep clicking, because the index of the paging link keeps changing but always stays within 0-5.
const puppeteer = require('puppeteer');
const link = "https://www.incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx";

(async function main() {
  try {
    const browser = await puppeteer.launch({headless: false});
    const [page] = await browser.pages();
    await page.goto(link);
    await page.waitForSelector("h1.faqsno-heading");

    let j = 0;          // counts loop iterations
    let NoOfPage = 9;   // adjust here to set the number of pages
    // i is the index of the paging link to click on this iteration.
    for (let i = 0; j < NoOfPage + 1; i++, j++) {
      // From j = 5 onward, keep clicking the link at index 4.
      if (j > 4) {
        i = 4;
      }

      // Scraping is skipped on the first iteration (i === 0).
      if (i > 0) {
        await page.waitForSelector("h1.faqsno-heading", {visible: true});
        const sections = await page.$$("h1.faqsno-heading");

        for (const section of sections) {
          const itemName = await section.$eval("div[id^='arrowex']", el => el.innerText);
          console.log(itemName);
        }
      }

      const nextPage = await page.$$(".ms-paging > a");
      // Pass un-awaited promises to Promise.all and start waiting before the click,
      // so the navigation is not missed. The option name is waitUntil (case-sensitive).
      await Promise.all([
        page.waitForNavigation({waitUntil: 'networkidle0'}),
        nextPage[i].click(),
      ]);
    }

    await browser.close();
  } catch (e) {
    console.log('the error: ', e);
  }
})();
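The index adjustment in the loop above can be looked at in isolation. A small sketch, assuming a hypothetical helper linkIndexFor (not part of the answer) that mirrors what the loop does: the link index follows the iteration counter up to 4 and then stays at 4.

```javascript
// Hypothetical helper: which paging link the loop clicks on iteration j.
// Mirrors the loop above: i follows j up to 4, then stays at 4.
function linkIndexFor(j, cap = 4) {
  return Math.min(j, cap);
}

// Indices used for iterations 0..9 (NoOfPage = 9 gives ten iterations).
const indices = Array.from({ length: 10 }, (_, j) => linkIndexFor(j));
console.log(indices.join(' ')); // prints "0 1 2 3 4 4 4 4 4 4"
```

Whether 4 is the right cap depends on how many links the site renders inside .ms-paging, so treat it as a value to verify against the live page.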
  • Some output from around page 19
C:\NodeJS\PuppeteerTest\Pup>node stack56652523.js
....
....
HAPPY PUBLIC SCHOOL SAMITI
AABAH3894H
SAGRADA FAMILIA SOCIETY, SOUTH GOA
AAWAS5165K
K V DEVADIGA CHARITABLE TRUST, DAKSHINA KANNADA
AADTK1517B
SHRINE OF INFANT JESUS, CHICKMAGLUR
AAVTS1925P
SRI NANDI VEDACURU CHARITABLE, TRUST
AATTS1842D
SHREE SUBRAHMANYA VANGMAYEE PARISHAD, GOA
AAPTS2410M
SHREE SUBRAHMANYA VANGMAYEE PARISHAD, GOA
AAPTS2410M
WORD FOR THE WORLD FELLOWSHIP
AAAAW6295Q
JANA SEVA TRUST
AACTJ0594Q
VAGDEVI VILAS EDUCATIONAL AND CHARITABLE TRUST
AABTV8264G