How to make the Apify Crawler scroll the full page when the web page has infinite scrolling?

Question

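I'm facing a problem: I can't get all the product data from a website because its product catalog page uses lazy loading. That means the crawler needs to keep scrolling until the whole page is loaded.

Right now I only get the products from the first page.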

Answer 1

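First of all, keep in mind that there are countless ways to implement infinite scroll. Sometimes you have to click buttons along the way or handle some kind of transition. I will cover only the simplest use case here: scrolling down at a fixed interval and finishing when no new products are loaded.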

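If you are building your own actor with the Apify SDK, you can use the infiniteScroll helper utility function. If it doesn't cover your use case, ideally please give us feedback on GitHub.
If you are using one of the generic Scrapers (Web Scraper or Puppeteer Scraper), infinite scroll is not currently built in (but maybe it will be if you are reading this in the future). On the other hand, implementing it yourself is not that complicated. Let me show you a simple solution for the Web Scraper pageFunction.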

async function pageFunction(context) {
    // A few utilities taken from the context object
    const { request, log, jQuery } = context;
    const $ = jQuery;

    // Here we define the infinite scroll function, it has to be defined inside pageFunction
    const infiniteScroll = async (maxTime) => {
        const startedAt = Date.now();
        let itemCount = $('.my-class').length; // Update the selector
        while (true) {
            log.info(`INFINITE SCROLL --- ${itemCount} items loaded --- ${request.url}`);
            // Time limit so the loop cannot run forever
            if (Date.now() - startedAt > maxTime) {
                return;
            }
            scrollBy(0, 9999);
            await context.waitFor(5000); // This can be any number that works for your website
            const currentItemCount = $('.my-class').length; // Update the selector

            // We check if the number of items changed after the scroll, if not we finish
            if (itemCount === currentItemCount) {
                return;
            }
            itemCount = currentItemCount;
        }
    }

    // Generally, you want to do the scrolling only on the category type page
    if (request.userData.label === 'CATEGORY') {
        await infiniteScroll(60000); // Let's try 60 seconds max

        // ... Add your logic for categories
    } else {
        // Any logic for other types of pages
    }
}
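
The same stop condition (finish when the item count stops growing, or when the time limit is hit) can also be written as a standalone helper that is easy to try outside a browser. This is only a minimal sketch, not part of Apify: `scrollUntilStable` and its `scrollDown`, `countItems`, `sleep` and `now` parameters are hypothetical names that the caller supplies.

```javascript
// Hypothetical helper: scroll until the item count stops growing or time runs out.
// All callbacks are injected by the caller, so the loop itself needs no browser.
async function scrollUntilStable({
    scrollDown,          // () => void, e.g. () => window.scrollBy(0, 9999)
    countItems,          // () => number, e.g. () => $('.my-class').length
    sleep,               // (ms) => Promise, e.g. (ms) => context.waitFor(ms)
    waitMs = 5000,       // pause after each scroll; tune for your website
    maxTimeMs = 60000,   // overall time limit
    now = Date.now,      // clock, injectable for testing
}) {
    const startedAt = now();
    let itemCount = await countItems();

    while (now() - startedAt <= maxTimeMs) {
        await scrollDown();
        await sleep(waitMs);
        const currentItemCount = await countItems();

        // Nothing new appeared after the scroll, so we assume the list is complete
        if (currentItemCount === itemCount) {
            return { itemCount, reason: 'no-new-items' };
        }
        itemCount = currentItemCount;
    }
    return { itemCount, reason: 'timeout' };
}

// Possible usage inside a Web Scraper pageFunction (define the helper inside it too):
// const result = await scrollUntilStable({
//     scrollDown: () => window.scrollBy(0, 9999),
//     countItems: () => $('.my-class').length,
//     sleep: (ms) => context.waitFor(ms),
// });
// log.info(`Stopped scrolling: ${result.reason}, ${result.itemCount} items`);
```

Returning the reason along with the count makes it easier to tell a fully loaded page from one where the time limit cut the scrolling short.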

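Of course, this is a really trivial example. Sometimes it gets much more complicated. I once even had to use Puppeteer to move the mouse directly and drag a scroll bar that could be reached programmatically.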
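
For the mouse-dragging case, a rough sketch with the Puppeteer mouse API might look like the following. It assumes you have a Puppeteer `page` object (in Puppeteer Scraper it should be available as `context.page`); the `dragScrollbar` name and the coordinates are hypothetical and would have to be worked out for the actual site.

```javascript
// Hypothetical helper: press the mouse at `from`, drag to `to`, then release.
// `page` is expected to be a Puppeteer Page; coordinates are viewport pixels.
async function dragScrollbar(page, from, to, steps = 20) {
    await page.mouse.move(from.x, from.y);        // hover over the scroll bar handle
    await page.mouse.down();                      // press the left button
    await page.mouse.move(to.x, to.y, { steps }); // drag in several small moves
    await page.mouse.up();                        // release
}

// Possible usage (made-up coordinates):
// await dragScrollbar(page, { x: 1270, y: 100 }, { x: 1270, y: 700 });
```

Moving in several steps instead of one jump is a guess at what a page listening for intermediate mouse events would need; a single move may be enough on other sites.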
