Selenium - clicking page number changes page but does not reload/populate data
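I'm trying to collect user reviews (see the disclaimer below). The reviews are split across numbered pages.
I can locate the elements for the individual page numbers, or simply click the next button (>). The page does change, but the new data is not populated.
Here is a short excerpt of the code: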
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
DRIVER_PATH = '***/chromedriver.exe'
# Selenium 4: executable_path is deprecated, pass a Service object instead
driver = webdriver.Chrome(service=Service(DRIVER_PATH))
URL = "https://www.kbb.com/mercedes-benz/cla/2018/consumer-reviews/"
driver.get(URL)
time.sleep(5)
button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//button[@class="css-kazo96-NavButton-defaultState-activeState-focusState ehp7fkv0"]')))
button.click()
WebDriverWait(driver, 50)  # note: without .until(...) this line does not wait at all
# driver.close()
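A bare WebDriverWait(driver, 50) only builds a wait object; nothing is polled until .until(condition) is called, so the script never waits for the re-rendered reviews. If you want to stay with Selenium, one option is to remember some text from the current page of reviews, click, and then wait until that text changes. The sketch below is a custom condition class for that; the name text_changed is hypothetical, and it assumes the review list is re-rendered in place after the click, which has not been verified against the site.

```python
class text_changed:
    """Hypothetical WebDriverWait condition: true once the located element's
    text is non-empty and differs from old_text.

    locator is a (by, value) tuple such as (By.XPATH, "..."); By.XPATH is just
    the string "xpath", so this class does not need to import Selenium.
    """

    def __init__(self, locator, old_text):
        self.locator = locator
        self.old_text = old_text

    def __call__(self, driver):
        try:
            text = driver.find_element(*self.locator).text
        except Exception:
            # Broad on purpose for the sketch: e.g. NoSuchElementException or
            # StaleElementReferenceException while the list re-renders.
            # Returning False tells WebDriverWait to keep polling.
            return False
        return bool(text) and text != self.old_text
```

Usage, with REVIEW_LOCATOR standing in for a locator of the first review on the page (hypothetical, since the site's markup is not shown here): read old = driver.find_element(*REVIEW_LOCATOR).text, call button.click(), then WebDriverWait(driver, 20).until(text_changed(REVIEW_LOCATOR, old)) before scraping the new page.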
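What can I do to get the fields to reload correctly? I appreciate any information I can get :-)
Disclaimer: this is a first test for a research project; there will be no unauthorized scraping or any misuse of the data!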
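The page/data is rendered dynamically. You can get the data from the API instead and iterate through the pages with the "page" parameter.
Alternatively, raise the number of results per page ("perPage") and get everything in a single request (as long as there are no more than 100 reviews).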
import requests
import pandas as pd
url = 'https://www.kbb.com/ymm/api/'
payload = {
"operationName":"consumerReviewsQuery",
"variables":{
"year":"2018",
"make":"mercedes-benz",
"model":"cla",
"page":1,
"perPage":100,
"bodystyle":"Sedan",
"sort":"1",
"filter":"",
"trendingTopic":""
},
"query":"query consumerReviewsQuery($year: String, $make: String!, $model: String!, $page: Int!, $perPage: Int!, $isInitialLoad: Boolean, $priceType: String, $bodystyle: String, $vehicleId: String, $trim: String, $sort: String, $trendingTopic: String, $filter: String) {\n consumerreviews(\n year: $year\n make: $make\n model: $model\n page: $page\n perPage: $perPage\n isInitialLoad: $isInitialLoad\n priceType: $priceType\n bodystyle: $bodystyle\n vehicleId: $vehicleId\n trim: $trim\n sort: $sort\n trendingTopic: $trendingTopic\n filter: $filter\n ) {\n numPages\n totalReviews\n reviews {\n id\n nickname\n nicknameDisplay\n location\n anonymous\n email\n sessionId\n visitorId\n sessionCount\n friendlyOwnershipStatus\n year\n model\n make\n vehicleId\n title\n reviewText\n ratingOverall\n ratingValue\n ratingReliability\n ratingPerformance\n ratingStyling\n ratingComfort\n ratingQuality\n submissionDate\n positiveLink\n negativeLink\n numPositiveFeedbacks\n numNegativeFeedbacks\n numFeedbacks\n pros\n cons\n areProsOrConsAvailable\n __typename\n }\n searchTerms\n __typename\n }\n}"}
jsonData = requests.post(url, json=payload).json()
reviews = pd.DataFrame(jsonData['data']['consumerreviews']['reviews'])
print(reviews)
Output:
id nickname ... areProsOrConsAvailable __typename
0 187159459 Love it ... True Reviews
1 179266834 Cremur ... True Reviews
2 176067479 ELSIE ... False Reviews
3 172175820 Noemia ... True Reviews
4 163968274 Pmaze ... True Reviews
5 158405420 Gary ... True Reviews
6 143025966 PMAZE ... True Reviews
7 139966209 Frenchy ... True Reviews
8 139766083 Arizona RN ... True Reviews
9 131870778 GW ... True Reviews
10 120024401 Deekay ... True Reviews
11 119822871 Tony ... True Reviews
12 116958004 MBPDX ... True Reviews
13 115487407 Smitty96 ... True Reviews
14 110965961 chhappy7 ... True Reviews
15 109184667 Tampafun ... True Reviews
16 101289834 Neile ... True Reviews
17 84350718 George ... True Reviews
18 75845132 dav ... True Reviews
19 72639833 Doug ... True Reviews
20 69174734 Carnut ... True Reviews
21 67191860 Mark ... True Reviews
22 65876085 bill ... False Reviews
23 64211472 Lazlow ... True Reviews
24 64008710 psyco ... True Reviews
25 57576670 vars0153 ... False Reviews
26 57574924 Fernando ... False Reviews
27 50932030 anauditor ... True Reviews
28 50346331 Missct1964 ... False Reviews
29 48468674 tekfoc ... True Reviews
30 48003934 BrwnJewel ... False Reviews
31 47955889 Free88 ... True Reviews
32 47726965 Josh ... True Reviews
33 47503009 Derek ... True Reviews
34 44513353 Don Z ... True Reviews
35 43143964 Raquel ... True Reviews
36 43142690 Pajama168 ... True Reviews
37 40484198 JJ ... True Reviews
38 39226477 fox4gib ... True Reviews
39 38915453 Happy in Chicago ... True Reviews
40 38485354 CLA owner ... True Reviews
41 35530044 1st time MB owner ... True Reviews
42 34931432 CC ... True Reviews
43 34151324 First time MB buyer ... True Reviews
44 33259903 tom ... True Reviews
45 32943654 Yash ... True Reviews
46 32472645 TheMarcoIslander ... True Reviews
[47 rows x 33 columns]
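If there are more than 100 reviews, the single request above would not return all of them, so you would iterate over "page" as mentioned. A minimal sketch of that loop, assuming the response shape requested in the query (consumerreviews.numPages and consumerreviews.reviews). The helper name fetch_all_reviews and its post parameter are hypothetical; post is any callable that sends a payload and returns the decoded JSON, which keeps the paging logic separate from the HTTP call.

```python
import copy


def fetch_all_reviews(payload, post):
    """Hypothetical helper: collect reviews from every page of consumerReviewsQuery.

    payload: the GraphQL payload dict from the answer; page is read from
             payload["variables"]["page"] and overwritten per request.
    post:    callable(payload_dict) -> decoded JSON response dict.
    """
    reviews = []
    page = 1
    num_pages = 1  # replaced by numPages from the first response
    while page <= num_pages:
        body = copy.deepcopy(payload)  # don't mutate the caller's payload
        body["variables"]["page"] = page
        data = post(body)["data"]["consumerreviews"]
        num_pages = data["numPages"]
        reviews.extend(data["reviews"])
        page += 1
    return reviews
```

With the url and payload from the answer, pass lambda p: requests.post(url, json=p).json() as post and wrap the result in pd.DataFrame(...). Note that the endpoint is not a documented public API, so its parameters and limits may change.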