Puppeteer fails in docker container: "browser has disconnected!"
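I wrote a script that uses Puppeteer v1.12.2 to scrape some information from a web page. It works on my local machine (Ubuntu 18.04) with Node v10.15.1, and it also works on my machine inside a docker container based on node:10-slim.
I pushed the image to Google Cloud Container Registry and then pulled it onto a Google Compute Engine machine (Ubuntu 18.04), where it fails when Puppeteer loads the page: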
Error: Navigation failed because browser has disconnected!
at CDPSession.LifecycleWatcher._eventListeners.helper.addEventListener (/www/node_modules/puppeteer/lib/LifecycleWatcher.js:47:107)
at CDPSession.emit (events.js:182:13)
at CDPSession._onClosed (/www/node_modules/puppeteer/lib/Connection.js:215:10)
at Connection._onClose (/www/node_modules/puppeteer/lib/Connection.js:138:15)
at WebSocketTransport._ws.addEventListener.event (/www/node_modules/puppeteer/lib/WebSocketTransport.js:45:22)
at WebSocket.onClose (/www/node_modules/ws/lib/event-target.js:124:16)
at WebSocket.emit (events.js:182:13)
at WebSocket.emitClose (/www/node_modules/ws/lib/websocket.js:180:10)
at Socket.socketOnClose (/www/node_modules/ws/lib/websocket.js:805:15)
at Socket.emit (events.js:182:13)
-- ASYNC --
at Frame.<anonymous> (/www/node_modules/puppeteer/lib/helper.js:108:27)
at Page.goto (/www/node_modules/puppeteer/lib/Page.js:662:49)
at Page.<anonymous> (/www/node_modules/puppeteer/lib/helper.js:109:23)
at scrapeLicence (/www/scrapeLicenceById.js:30:33)
at process._tickCallback (internal/process/next_tick.js:68:7)
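I've seen other discussions where this error, "Navigation failed because browser has disconnected!", was fixed by adding await, but I already await every method I call, so the whole script runs procedurally, with no callbacks. It works as expected on my local machine but not on the GCE instance. Why would it behave differently on different machines? What causes the browser to "disconnect"?
Update: here is a minimal reproduction of the failing script:
scrape.js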
const puppeteer = require('puppeteer');
const verbose = true;

async function run() {
  try {
    const browser = await puppeteer.launch({
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-accelerated-2d-canvas',
        '--disable-gpu',
        '--window-size=1920x1080',
      ],
    });
    const pageUrl = 'https://google.com';
    const page = await browser.newPage();
    page.once('load', () => {
      if (verbose) console.log(`Page loaded.`);
    });
    await page.setRequestInterception(true);
    await page.setViewport({ width: 1280, height: 800 });
    const response = await page.goto(pageUrl, {
      timeout: 25000,
      waitUntil: 'networkidle2',
    });
    // Use the public status() accessor rather than the private _status field.
    if (response.status() >= 400) {
      console.error('Error from server:', response);
      throw new Error('Error response from server');
    }
    console.log('page ok?');
    await browser.close();
  } catch (e) {
    console.error(e);
    process.exit(1);
  }
}

run();
Here is the Dockerfile I use to build the image:
FROM node:10-slim
# -------- install chrome ----------
# See https://crbug.com/795759
RUN apt-get update && apt-get install -yq libgconf-2-4
RUN apt-get update && apt-get install -y wget --no-install-recommends \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main">> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst ttf-freefont \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get purge --auto-remove -y curl \
&& rm -rf /src/*.deb
ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init
# ------------------
# Set work directory to /www
WORKDIR /www
# Install app dependencies
COPY package.json package.json
RUN yarn install
# Copy script files
COPY . .
# Runs "/usr/local/bin/dumb-init -- node scrape.js"
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "scrape.js"]
This is the same issue that has been reported elsewhere.
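Basically, puppeteer 1.12.2 installs a development build of Chromium 73, which has some minor bugs that prevent it from loading certain websites or scripts and/or stop it from running on specific platforms.
The solution is to use the previous puppeteer version, 1.11.0, or to use a different, stable Chrome version.
That is why executablePath: 'google-chrome' makes a difference: it launches the stable version instead of the bundled one.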
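As a rough sketch of the second option, the launch call can be pointed at the google-chrome-stable binary that the Dockerfile in the question already installs. The helper names buildLaunchOptions and checkPage are hypothetical, and the sketch assumes google-chrome is on the PATH inside the container. Logging browser.version() is only there to confirm which build was actually started.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// When useSystemChrome is true, Puppeteer is told to start the system-wide
// stable Chrome instead of the Chromium build it downloaded at install time.
function buildLaunchOptions(useSystemChrome) {
  const options = {
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
  };
  if (useSystemChrome) {
    // Assumption: 'google-chrome' resolves via PATH (google-chrome-stable package).
    options.executablePath = 'google-chrome';
  }
  return options;
}

// Hypothetical helper: loads a page with the stable browser and returns the
// HTTP status, or null if the navigation produced no response.
async function checkPage(url) {
  // Required lazily so buildLaunchOptions can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(buildLaunchOptions(true));
  try {
    // Print the browser build so the logs show which binary is running.
    console.log('Browser version:', await browser.version());
    const page = await browser.newPage();
    const response = await page.goto(url, {
      timeout: 25000,
      waitUntil: 'networkidle2',
    });
    return response ? response.status() : null;
  } finally {
    // Always release the browser process, even if navigation throws.
    await browser.close();
  }
}
```

Calling checkPage('https://google.com') inside the container should then use the stable browser. Passing false to buildLaunchOptions falls back to the bundled Chromium, which makes it easy to compare the two on the failing machine.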