Puppeteer & cycling a process through multiple users
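I'm trying to scrape information from a web page that sits behind a login wall, for two users. So far I've managed to get the code to do what I want for the first user: go to the page, log in, collect the links to the properties in the saved list, use that list to gather more details, and log them to the console.
The challenge now is getting the code to loop round for the second user without duplicating the code. How would you suggest I do this?
Secondly, I need to make the array for each user, declared below as uniquePropertyLinks, accessible outside of the function userProcess.
How can I produce a new array for each user?
How can I access the array outside the function?
Here is the code: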
const puppeteer = require('puppeteer');
//Code to locate text and enable it to be clicked
const escapeXpathString = str => {
  const splitedQuotes = str.replace(/'/g, `', "'", '`);
  return `concat('${splitedQuotes}', '')`;
};
const clickByText = async (page, text) => {
  const escapedText = escapeXpathString(text);
  const linkHandlers = await page.$x(`//a[contains(text(), ${escapedText})]`);
  if (linkHandlers.length > 0) {
    await linkHandlers[0].click();
  } else {
    throw new Error(`Link not found: ${text}`);
  }
};
//User credentials
const userAEmail = 'abc@hotmail.com';
const userAPassword = '123';
const userBEmail = 'def@hotmail.com';
const userBPassword = '456';
//Logout
const LogOut = async (page) => {
  await page.goto('https://www.website.com');
  //Start waiting for the navigation before clicking, so a fast navigation is not missed
  await Promise.all([
    page.waitForNavigation({waitUntil: 'load'}),
    clickByText(page, 'Log out'),
  ]);
  console.log('Signed out');
};
///////////////////////////
//SCRAPE PROCESS
async function userProcess() {
  let browser;
  try {
    browser = await puppeteer.launch({ headless : false });
    const page = await browser.newPage();
    await page.setUserAgent('BLAHBLAHBLAH');
    //Go to Website saved list
    await page.goto('https://www.website.com/shortlist.html', {waitUntil: 'networkidle2'});
    console.log('Page loaded');
    //User A log in
    await page.type('input[name=email]', userAEmail, {delay: 10});
    await page.type('input[name=password]', userAPassword, {delay: 10});
    await Promise.all([
      page.waitForNavigation({waitUntil: 'load'}),
      page.click('.mrm-button', {delay: 10}),
    ]);
    console.log('Signed in');
    //Wait for website saved list to load
    await page.waitForSelector('.title');
    const propertyList = await page.$$('.title');
    console.log(propertyList.length);
    //Collecting links from saved list and de-duping into an array
    const propertyLinks = await page.evaluate(() => Array.from(document.querySelectorAll('.sc-jbKcbu'), e => e.href));
    let uniquePropertyLinks = [...new Set(propertyLinks)];
    console.log(uniquePropertyLinks);
    //Sign out (awaited, so the browser is not closed mid-logout)
    await LogOut(page);
  } catch (err) {
    console.log('Our error - ', err.message);
  } finally {
    if (browser) await browser.close();
  }
}
userProcess();
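As an aside on the escapeXpathString helper used by clickByText: XPath 1.0 string literals have no escape sequence, so text containing an apostrophe cannot simply be wrapped in '...'. The helper splits on each apostrophe and rebuilds the string with concat(), putting the apostrophe itself in double quotes. The standalone copy below (no browser needed) shows the expressions it produces:

```javascript
// Copy of the question's helper: turns a JS string into an XPath 1.0
// expression that evaluates to the same text, even if it contains apostrophes.
const escapeXpathString = str => {
  // Close the current '...' literal, insert "'" and open a new literal.
  const splitedQuotes = str.replace(/'/g, `', "'", '`);
  // The trailing '' keeps concat() valid when there is no apostrophe,
  // since XPath 1.0 concat() needs at least two arguments.
  return `concat('${splitedQuotes}', '')`;
};

console.log(escapeXpathString('Log out'));
// concat('Log out', '')
console.log(escapeXpathString("Don't log out"));
// concat('Don', "'", 't log out', '')
```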
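Let's look at a few things you may need to get this done. I think it's best to take the time to build the skills yourself, but I can point out the key pieces.
You currently have: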
const userAEmail = 'abc@hotmail.com';
const userAPassword = '123';
const userBEmail = 'def@hotmail.com';
const userBPassword = '456';
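But you're talking about looping, and with separate variables like these, looping over the two users is awkward. I suggest putting the credentials into an object: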
const users = {
a: {
email: 'abc@hotmail.com',
password: '123',
},
b: {
email: 'def@hotmail.com',
password: '456',
},
};
Then you can easily iterate over it with for...in:
for (const user in users) {
console.log(users[user]);
}
or with Object.values() and .forEach():
Object.values(users).forEach(user => {
console.log(user);
});
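for...in gives you only the key, and Object.values() gives you only the credentials. When you need both (for example, to know which per-user array to fill), Object.entries() with for...of and destructuring gives you the key and the credentials together. for...of also has an advantage for Puppeteer: an await inside it pauses the loop, whereas an await inside a .forEach() callback does not, so with .forEach() both logins would run at once on the same page. A minimal sketch using the users object above:

```javascript
const users = {
  a: { email: 'abc@hotmail.com', password: '123' },
  b: { email: 'def@hotmail.com', password: '456' },
};

const seen = [];
// Object.entries() yields [key, value] pairs in insertion order.
for (const [name, user] of Object.entries(users)) {
  // In the scraper, user.email and user.password would go to page.type(...).
  seen.push(`${name}: ${user.email}`);
}

console.log(seen); // [ 'a: abc@hotmail.com', 'b: def@hotmail.com' ]
```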
You wrote: "I need to make the array for each user, declared as uniquePropertyLinks below, accessible outside of the function userProcess."
In that case, declare the array outside the function:
let uniquePropertyLinks = [];
async function userProcess() {
// you can access uniquePropertyLinks here
}
// and you can access uniquePropertyLinks here as well
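One caveat: a variable declared outside the function is visible to code outside it, but because userProcess is async, the array is only filled once the promise it returns resolves. Reading it on the line straight after the call still sees an empty array. A minimal sketch, with a hypothetical fakeScrape() standing in for the Puppeteer steps:

```javascript
// Declared at module scope, so code outside collectLinks can read it.
const uniquePropertyLinks = [];

// Hypothetical stand-in for the Puppeteer scrape: resolves with links
// (including a duplicate) after a short delay.
function fakeScrape() {
  return new Promise(resolve => {
    setTimeout(() => resolve(['/p/1', '/p/2', '/p/1']), 10);
  });
}

async function collectLinks() {
  const links = await fakeScrape();
  // De-duplicate with a Set and append to the outer array in place.
  uniquePropertyLinks.push(...new Set(links));
}

const done = collectLinks();
// collectLinks is paused at its first await, so nothing has been added yet.
const lengthRightAfterCall = uniquePropertyLinks.length;
console.log(lengthRightAfterCall); // 0

done.then(() => {
  console.log(uniquePropertyLinks); // [ '/p/1', '/p/2' ]
});
```

Having the async function return the links instead (return [...new Set(links)]) and reading them with await or .then() works just as well and avoids shared mutable state.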
You also asked: "How can I produce a new array for each user? How can I access the array outside the function?"
Again, a different data structure helps. Let's use an object whose keys represent the users and whose values are the arrays:
let uniquePropertyLinks = {};
uniquePropertyLinks.a = [];
uniquePropertyLinks.b = [];
This gives you:
{ a: [], b: [] }
So you can save any value for user a into the uniquePropertyLinks.a array, and any value for user b into the uniquePropertyLinks.b array:
uniquePropertyLinks.a.push('new_value_for_a_user');
The same works for user b with uniquePropertyLinks.b.push(...).
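Putting the two suggestions together, here is a minimal sketch that builds one empty array per key of users with Object.fromEntries() (so adding a third user needs no extra lines), then loops with for...of and await and fills the array for each key. scrapeSavedList() is a hypothetical stub standing in for the log-in, collect and log-out steps, so the sketch runs without a browser:

```javascript
const users = {
  a: { email: 'abc@hotmail.com', password: '123' },
  b: { email: 'def@hotmail.com', password: '456' },
};

// One fresh empty array per user key: { a: [], b: [] }.
const uniquePropertyLinks = Object.fromEntries(
  Object.keys(users).map(name => [name, []])
);

// Hypothetical stand-in for "log in, read the saved list, log out".
// In the real script this would drive the Puppeteer page for one user.
async function scrapeSavedList(user) {
  const fakeLists = {
    'abc@hotmail.com': ['/p/1', '/p/2', '/p/1'],
    'def@hotmail.com': ['/p/3'],
  };
  return fakeLists[user.email] ?? [];
}

async function run() {
  // for...of + await handles one user at a time, in insertion order.
  for (const [name, user] of Object.entries(users)) {
    const links = await scrapeSavedList(user);
    // The key selects the array to fill, so no if/else per user is needed.
    uniquePropertyLinks[name].push(...new Set(links));
  }
}

const done = run().then(() => console.log(uniquePropertyLinks));
// { a: [ '/p/1', '/p/2' ], b: [ '/p/3' ] }
```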
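You should now have all the pieces you need to go back to your code and make the necessary changes.
For anyone looking for the result of pavelsaman's suggestions, here is the updated code: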
const puppeteer = require('puppeteer');
//Object storing user credentials
const userAEmail = 'abc';
const userAPassword = '123';
const userBEmail = 'def';
const userBPassword = '456';
const users = {
  userA: {
    email: userAEmail,
    password: userAPassword,
  },
  userB: {
    email: userBEmail,
    password: userBPassword,
  },
};
//Object storing users saved lists as arrays
const usersPropertyLinks = {};
usersPropertyLinks.userA = [];
usersPropertyLinks.userB = [];
//Function to retrieve users saved list of properties
async function retrieveUserSavedList(users, usersPropertyLinks) {
  let browser;
  try {
    //Load browser
    browser = await puppeteer.launch({ headless : true });
    const page = await browser.newPage();
    await page.setUserAgent('BLAHHBLAHHBLAHH');
    for (const user in users) {
      //Go to saved list
      await page.goto('https://www.website.co.uk/user/shortlist.html', {waitUntil: 'networkidle2'});
      await page.waitForSelector('.mrm-button');
      //User log in (wait for the navigation and click together to avoid a race)
      await page.type('input[name=email]', users[user].email, {delay: 10});
      await page.type('input[name=password]', users[user].password, {delay: 10});
      await Promise.all([
        page.waitForNavigation({waitUntil: 'load'}),
        page.click('.mrm-button', {delay: 10}),
      ]);
      console.log('Success: ' + users[user].email + ' logged in');
      //Collecting saved property links
      const propertyLinks = await page.evaluate(() => Array.from(document.querySelectorAll('.sc-jbKcbu'), e => e.href));
      //De-dupe and add the links to this user's array (user is the key, e.g. 'userA')
      if (usersPropertyLinks[user]) {
        usersPropertyLinks[user].push(...new Set(propertyLinks));
      } else {
        console.log('problem saving links to user array');
      }
      //Sign out
      await Promise.all([
        page.waitForNavigation({waitUntil: 'load'}),
        page.click('.sc-kAzzGY', {delay: 10}),
      ]);
      console.log('Success: ' + users[user].email + ' logged out');
    }
  } catch (err) {
    console.log('Error retrieving user saved list - ', err.message);
  } finally {
    if (browser) await browser.close();
  }
}
//Run the code, then use the per-user arrays once scraping has finished
retrieveUserSavedList(users, usersPropertyLinks).then(() => console.log(usersPropertyLinks));
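One possible refinement, sketched here without a browser: have the function return the per-user links instead of mutating an object passed in, and wrap each user in its own try/catch so a failed login for one user is logged and the loop moves on to the next. The scrape step is passed in as a parameter (scrapeOne, a hypothetical name) so the loop can be exercised with a stub; in the real script it would be the Puppeteer log-in, collect and log-out sequence.

```javascript
// Loop over every user, collecting de-duplicated links per user.
// scrapeOne(user) is injected: any async function returning an array of links.
async function collectAllUsers(users, scrapeOne) {
  const results = {};
  for (const [name, user] of Object.entries(users)) {
    try {
      const links = await scrapeOne(user);
      results[name] = [...new Set(links)];
    } catch (err) {
      // Record the failure and continue with the next user.
      console.log(`Error for ${user.email}: ${err.message}`);
      results[name] = [];
    }
  }
  return results;
}

// Usage with a stub that fails for one user.
const users = {
  userA: { email: 'abc', password: '123' },
  userB: { email: 'def', password: '456' },
};
const stubScrape = async user => {
  if (user.email === 'def') throw new Error('login failed');
  return ['/p/1', '/p/1', '/p/2'];
};

const done = collectAllUsers(users, stubScrape).then(results => {
  console.log(results); // { userA: [ '/p/1', '/p/2' ], userB: [] }
  return results;
});
```

Storing an empty array for a failed user keeps every key present for later code; leaving the key out instead would make failures easier to spot. Either is a reasonable choice.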