How to get around Error: net::ERR_CONNECTION in Puppeteer


I'm trying to use proxies from this site: https://hidemy.name/en/proxy-list/?type=4#list

Here is my Puppeteer scraping code (deployed to Heroku). It throws the error from the title on the .goto() line:

const preparePageForTests = async (page) => {

  const userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36';

  await page.setUserAgent(userAgent);

  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', {
      get: () => false,
    });
  });

  // Pass the Chrome Test.
  await page.evaluateOnNewDocument(() => {
    // We can mock this in as much depth as we need for the test.
    window.navigator.chrome = {
      app: {
        isInstalled: false,
      },
      webstore: {
        onInstallStageChanged: {},
        onDownloadProgress: {},
      },
      runtime: {
        PlatformOs: {
          MAC: 'mac',
          WIN: 'win',
          ANDROID: 'android',
          CROS: 'cros',
          LINUX: 'linux',
          OPENBSD: 'openbsd',
        },
        PlatformArch: {
          ARM: 'arm',
          X86_32: 'x86-32',
          X86_64: 'x86-64',
        },
        PlatformNaclArch: {
          ARM: 'arm',
          X86_32: 'x86-32',
          X86_64: 'x86-64',
        },
        RequestUpdateCheckStatus: {
          THROTTLED: 'throttled',
          NO_UPDATE: 'no_update',
          UPDATE_AVAILABLE: 'update_available',
        },
        OnInstalledReason: {
          INSTALL: 'install',
          UPDATE: 'update',
          CHROME_UPDATE: 'chrome_update',
          SHARED_MODULE_UPDATE: 'shared_module_update',
        },
        OnRestartRequiredReason: {
          APP_UPDATE: 'app_update',
          OS_UPDATE: 'os_update',
          PERIODIC: 'periodic',
        },
      }
    };
  });

  await page.evaluateOnNewDocument(() => {
    const originalQuery = window.navigator.permissions.query;
    return window.navigator.permissions.query = (parameters) => (
      parameters.name === 'notifications' ?
        Promise.resolve({ state: Notification.permission }) :
        originalQuery(parameters)
    );
  });

  await page.evaluateOnNewDocument(() => {
    // Overwrite the `plugins` property to use a custom getter.
    Object.defineProperty(navigator, 'plugins', {
      // This just needs to have `length > 0` for the current test,
      // but we could mock the plugins too if necessary.
      get: () => [1, 2, 3, 4, 5],
    });
  });

  await page.evaluateOnNewDocument(() => {
    // Overwrite the `languages` property to use a custom getter.
    Object.defineProperty(navigator, 'languages', {
      get: () => ['en-US', 'en'],
    });
  });
}

const browser = await puppeteerExtra.launch({
  headless: true,
  args: ['--no-sandbox', '--disable-setuid-sandbox', '--proxy-server=socks4://109.94.182.128:4145'],
});

const page = await browser.newPage();

await preparePageForTests(page);

await page.goto('https://www.google.com/search?q=concerts+near+new+york&client=safari&rls=en&uact=5&ibp=htl;events&rciv=evn&sa=X&fpstate=tldetail#htivrt=events&htidocid=L2F1dGhvcml0eS9ob3Jpem9uL2NsdXN0ZXJlZF9ldmVudC8yMDIxLTA2LTA0fDIxMjMzMzg4NTU2Nzc1NDk%3D&fpstate=tldetail') 

Sometimes I get "ERR_CONNECTION_CLOSED" or "ERR_CONNECTION_FAILED" instead of ERR_CONNECTION_RESET.

Any help getting rid of this error (presumably by adding more methods to the preparePageForTests function so that it passes Google's tests) would be great, thanks!

You need to await page.goto("..."):

await page.goto("https://google.com", {waitUntil: "networkidle2"});

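You are using low-quality public proxies, which will naturally produce network errors and/or get blocked by Google. The simplest solution here is to pay for better proxies.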
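To tell a proxy failure apart from other navigation problems, one option is to look at the message of the error that page.goto throws. This is only a minimal sketch: it assumes network failures show up in the message as a net::ERR_ code (as in the title of the question), and both the helper name isProxyNetworkError and the list of codes it matches are my own illustration, not part of Puppeteer.

```javascript
// Hypothetical helper, not a Puppeteer API.
// Assumption: Chromium network failures appear in err.message as "net::ERR_...".
const PROXY_ERROR_PATTERN =
  /net::ERR_(CONNECTION|PROXY|SOCKS|TUNNEL|TIMED_OUT|EMPTY_RESPONSE)/;

// Returns true when the error looks like a connection-level failure,
// which with a public proxy usually means the proxy itself is the problem.
function isProxyNetworkError(err) {
  const message = err && typeof err.message === 'string' ? err.message : '';
  return PROXY_ERROR_PATTERN.test(message);
}
```

A check like this could be used in a catch block to decide whether retrying with a different proxy is worthwhile, or whether the error is something else (a navigation timeout, a bad selector) that a new proxy will not fix.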

Alternatively, if page.goto fails, you can catch the error and retry the request:

const collectData = async (page) => {
  try {
    await page.goto('https://www.google.com/search?q=concerts+near+new+york');
    return page.evaluate(() => document.title);
  } catch (err) {
    console.error(err.message);
    return false;
  }
}

let data = false;
let attempts = 0;

// Retry request until it gets data or tries 5 times
while (data === false && attempts < 5) {
  data = await collectData(page);
  attempts += 1;
  if (data === false) {
    // Wait a few seconds, also a good idea to swap proxy here*
    await new Promise((resolve) => setTimeout(resolve, 3000));
  }
}
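Swapping the proxy between attempts could look roughly like the sketch below. Since the question passes --proxy-server as a launch argument, this version starts a fresh browser for every attempt. The names pickProxy and scrapeWithProxyRotation are hypothetical, and launch and scrape are injected functions (assumptions of this sketch, with scrape expected to throw on failure) so the loop can be tried without a real browser.

```javascript
// Sketch only. With Puppeteer, `launch` could be something like
//   (proxy) => puppeteer.launch({ args: ['--no-sandbox', `--proxy-server=${proxy}`] })
// and `scrape` a function that calls page.goto and throws on failure.

// Cycle through the proxy list, wrapping around when it runs out.
function pickProxy(proxies, attempt) {
  if (proxies.length === 0) {
    throw new Error('No proxies supplied');
  }
  return proxies[attempt % proxies.length];
}

// Try up to maxAttempts times, using the next proxy on each attempt.
// Resolves to the scrape result, or false if every attempt failed.
async function scrapeWithProxyRotation(proxies, launch, scrape, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const proxy = pickProxy(proxies, attempt);
    let browser;
    try {
      browser = await launch(proxy);
      const page = await browser.newPage();
      return await scrape(page);
    } catch (err) {
      console.error(`Attempt ${attempt + 1} via ${proxy} failed: ${err.message}`);
    } finally {
      // Always close the browser so failed attempts do not leak processes.
      if (browser) await browser.close();
    }
  }
  return false;
}
```

The sketch leaves out the delay between attempts for brevity; the setTimeout wait from the loop above can be added inside the catch block if needed.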


* Modules for changing the proxy programmatically: