How to do something when all the async functions called in a for loop have completed?
I am using puppeteer to scrape pages in parallel by opening multiple URLs in multiple tabs. I am using a for loop like this:
const startScraping = async (url) => {
for (let i of MyArray) {
const page = await browser.newPage();
page.goto(url).then(() => {
scrapePage(page); // This is the function where I am scraping through this page. and
// This is also a async function
});
}
return new Promise((resolve, reject) => {
resolve("Done");
reject("Error");
});
}
startScraping(url).then((data) => {
console.log(data);
})
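The problem is that the promise is returned immediately after the loop, but I want it to resolve only after all the pages have been scraped.
Can anyone help me?
PS: scrapePage() is also an async function.
Thanks in advance.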
Just to explain the scenario:
async function func() {
setTimeout(() => {
return "Done";
}, 3000);
}
async function scrapeSingle(url) {
return [url, await func()];
}
let myArray = [1, 2, 3, 4, 5];
const parallelScrapes = myArray.map((url) => scrapeSingle(url));
Promise.all(parallelScrapes).then((data) => {
console.log(data);
});
Here I want to print [[1, "Done"], [2, "Done"], [3, "Done"], [4, "Done"], [5, "Done"]] after 3 seconds,
but it immediately prints [[ 1, undefined ], [ 2, undefined ], [ 3, undefined ], [ 4, undefined ], [ 5, undefined ]].
You are mixing and matching async, then, and even new Promise().
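To see why the original function returns early, here is a minimal sketch with no puppeteer involved. The `sleep` helper and both function names are made up for illustration. The point is only that a `.then()` chain that is neither awaited nor returned is not waited for by the enclosing `async` function.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Starts the work but does not wait for it,
// like page.goto(url).then(...) in the question.
async function notAwaited(log) {
  sleep(20).then(() => log.push("work finished"));
  log.push("function returned");
}

// Waits for the work before continuing.
async function awaited(log) {
  await sleep(20);
  log.push("work finished");
  log.push("function returned");
}
```

With `notAwaited`, the promise returned by the function resolves before the work has finished. With `awaited`, the work finishes first.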
The serial solution is:
const startScraping = async (url) => {
const data = [];
for (let i of MyArray) {
const page = await browser.newPage();
await page.goto(url);
const result = await scrapePage(page);
data.push([i, result]);
}
return data;
};
startScraping(url).then((data) => {
console.log(data);
});
To process all the URLs in myArray in parallel, you need to use Promise.all():
async function scrapeSingle(browser, url) {
const page = await browser.newPage();
await page.goto(url);
return [url, await scrapePage(page)];
}
const parallelScrapes = myArray.map((url) =>
scrapeSingle(browser, url),
);
Promise.all(parallelScrapes).then((data) => {
console.log(data);
});
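In the `setTimeout` example from the question, `func()` has nothing to await, and the `return "Done"` inside the callback returns from the callback, not from `func()`. So `func()` resolves to `undefined` right away. A minimal sketch of one fix is to wrap `setTimeout` in a promise; the `ms` parameter and the `scrapeAll` name are additions for illustration:

```javascript
// Resolve with "Done" after `ms` milliseconds (3000 by default, as in the question).
function func(ms = 3000) {
  return new Promise((resolve) => {
    setTimeout(() => resolve("Done"), ms);
  });
}

async function scrapeSingle(url, ms) {
  return [url, await func(ms)];
}

const myArray = [1, 2, 3, 4, 5];

// Start all five calls at once and wait for every one of them.
function scrapeAll(ms) {
  return Promise.all(myArray.map((url) => scrapeSingle(url, ms)));
}
```

Calling `scrapeAll().then((data) => console.log(data))` prints the five `[n, "Done"]` pairs after roughly three seconds in total, because all the timers run concurrently.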
This works:
const startScraping = async (url) => {
let tasks = [];
for (let i of MyArray) {
const page = await browser.newPage();
await page.goto(url);
tasks.push(scrapePage(page))
}
await Promise.all(tasks);
return "Done"; // an async function already wraps its return value in a promise
}
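One caveat with the Promise.all() approaches above: it rejects as soon as any single scrape fails, and the caller loses the other results. Below is a hedged sketch of a variant that uses Promise.allSettled() and also closes each tab. Passing `scrapePage` in as a parameter and the `{ error }` result shape are choices made here for illustration, not part of the original answers.

```javascript
// Scrape one URL in its own tab and always close the tab afterwards.
async function scrapeSingle(browser, url, scrapePage) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    return [url, await scrapePage(page)];
  } finally {
    await page.close();
  }
}

// Scrape all URLs in parallel; a failed URL does not discard the other results.
async function scrapeAll(browser, urls, scrapePage) {
  const settled = await Promise.allSettled(
    urls.map((url) => scrapeSingle(browser, url, scrapePage)),
  );
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? outcome.value
      : [urls[i], { error: String(outcome.reason) }],
  );
}
```

Each entry in the returned array is either `[url, result]` or `[url, { error }]`, in the same order as `urls`.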