Timing problem when fetching data with puppeteer
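Question
Hello developers,
I have been using puppeteer to scrape a particular page, specifically its video section. The problem is that getting the video src takes more than 10 s.
Is there any way to reduce the waiting time?
Code
As you can see, I have tried intercepting requests so that fonts, stylesheets, and images are not loaded, to make it faster.
But it still takes more than 10 s.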
import puppeteer from 'puppeteer';

// `url` (the site's base URL) is assumed to be defined elsewhere.
const getAnimeVideo = async (id: string, chapter: number) => {
  const BASE_URL = `${url}${id}/${chapter}/`;
  const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
  const page = await browser.newPage();
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36');
  // Skip stylesheets, fonts, and images to reduce load time
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    if (req.resourceType() === 'stylesheet' || req.resourceType() === 'font' || req.resourceType() === 'image') {
      req.abort();
    } else {
      req.continue();
    }
  });
  await page.goto(BASE_URL);
  // Fixed 10-second pause before looking for the player iframe
  await page.waitFor(10000);
  const elementHandle = await page.waitForSelector('iframe.player_conte');
  const frame = await elementHandle.contentFrame();
  // Collect the src attribute of every <source> inside the video element
  const video = await frame.$eval('#jkvideo_html5_api', el =>
    Array.from(el.getElementsByTagName('source')).map(e => e.getAttribute("src")));
  await page.close();
  await browser.close();
  return video;
}
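A note on where the time goes: `page.waitFor(10000)` in the code above pauses for a fixed 10 seconds no matter how quickly the page loads, so the function can never finish in under 10 s. Below is a minimal sketch, not a tested fix for this particular site, that drops the fixed pause and instead waits only for the iframe and the `<source>` elements themselves. The names `getVideoSources`, `PageLike`, `FrameLike`, `SourceEl`, and the `timeoutMs` parameter are hypothetical; the small interfaces only describe the Puppeteer calls used here (`goto`, `waitForSelector`, `contentFrame`, `$$eval`) so the function can be tried against a stub, and a real Puppeteer `page` is assumed to fit them.

```typescript
// Minimal structural types for the Puppeteer calls used below (assumption:
// a real Puppeteer Page/Frame is compatible with these shapes).
type SourceEl = { getAttribute(name: string): string | null };

interface FrameLike {
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
  $$eval<T>(selector: string, fn: (els: SourceEl[]) => T): Promise<T>;
}

interface PageLike {
  goto(url: string, options?: { waitUntil?: 'domcontentloaded'; timeout?: number }): Promise<unknown>;
  waitForSelector(
    selector: string,
    options?: { timeout?: number },
  ): Promise<{ contentFrame(): Promise<FrameLike | null> } | null>;
}

// Hypothetical helper: returns the src of every <source> in the player,
// waiting on selectors instead of sleeping for a fixed time.
async function getVideoSources(
  page: PageLike,
  pageUrl: string,
  timeoutMs = 10000, // upper bound per step, not a fixed delay
): Promise<string[]> {
  // Do not wait for the full load event; the DOM is enough to find the iframe
  await page.goto(pageUrl, { waitUntil: 'domcontentloaded', timeout: timeoutMs });

  const iframeHandle = await page.waitForSelector('iframe.player_conte', { timeout: timeoutMs });
  const frame = iframeHandle ? await iframeHandle.contentFrame() : null;
  if (!frame) throw new Error('player iframe not available');

  // Resolves as soon as a <source> exists inside the video element
  await frame.waitForSelector('#jkvideo_html5_api source', { timeout: timeoutMs });

  const srcs = await frame.$$eval('#jkvideo_html5_api source', (els) =>
    els.map((e) => e.getAttribute('src')),
  );
  return srcs.filter((s): s is string => s !== null);
}
```

With this shape, `timeoutMs` only caps how long each step may take; when the elements appear early, the function returns early. Launching a browser still costs time on every call, so reusing one browser instance across calls may help further.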
Solution using cheerio
import axios from 'axios';
import * as cheerio from 'cheerio';

async function getVideoURL(url: string) {
  // This requests the underlying iframe page
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);
  const video = $('video');
  if (video.length) {
    // Sometimes the video is directly embedded
    const src = $(video).find('source').attr('src');
    return src;
  } else {
    // If the video is not embedded, there is obfuscated code that will create a video element
    // Here we run the code to get the underlying video url
    const scripts = $('script');
    // The obfuscated code uses a variable called l which is the window / global object
    const l = global;
    // The obfuscated code uses a variable called ll which is String
    const ll = String;
    const $script2 = $(scripts[1]).html();
    // Kind of dangerous, but the code is very obfuscated so it's hard to tell how it decrypts the URL
    eval($script2);
    // The code above sets a variable called ss that is the mp4 URL
    return (l as any).ss;
  }
}
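The `eval` call above runs the site's script with direct access to the Node process's real `global`. A possible alternative, sketched here under the assumption that the script needs only `l` and `ll` as described in the comments above, is to run it in a separate `node:vm` context whose global object plays the role of `l`. The name `runObfuscatedScript` and the 1000 ms timeout are hypothetical choices. Node's documentation states that `vm` is not a security mechanism, so this mainly avoids accidental pollution of the process globals; it is not a safe way to run untrusted code.

```typescript
import * as vm from 'node:vm';

// Hypothetical helper: runs the obfuscated script in its own context and
// returns the `ss` value it sets, or undefined if no string was set.
function runObfuscatedScript(script: string): string | undefined {
  // `ll` stands in for String, as in the eval-based version
  const sandbox: Record<string, unknown> = { ll: String };
  // `l` points at the context's own global object instead of Node's `global`
  sandbox.l = sandbox;

  const context = vm.createContext(sandbox);
  // The timeout guards against a script that never returns
  vm.runInContext(script, context, { timeout: 1000 });

  const ss = sandbox.ss;
  return typeof ss === 'string' ? ss : undefined;
}
```

Inside `getVideoURL`, the `eval($script2)` line and the final `return` could then become `return runObfuscatedScript($script2 ?? '');`. If the site's script relies on other globals that Node provides, they would need to be added to `sandbox` explicitly.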