selenium - get all tweets after scrolling the website - python
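My question is actually two questions. First, I scroll down until the page stops loading new content and then try to save all the replies. Unfortunately, I only get a small portion of them, the ones near the bottom. Is there a way to get all of the replies? I tried adding sleep time, but it did not help.
My second problem is that on some pages a button appears at the bottom of the page that loads more replies when clicked. I have not found a way to click it.
I would be grateful for any tips.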
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://twitter.com/RegSprecher/status/1251100551183507456'
driver = webdriver.Chrome(r"path_chromedriver.exe")
driver.implicitly_wait(10)
driver.get(url)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(1)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # If the heights are the same, leave the loop
        break
    last_height = new_height

# Wait
time.sleep(30)

# Links to tweets: every anchor whose href contains 'status'
tweet_links = driver.find_elements(By.CSS_SELECTOR, "a[href*='status']")
# The snippet as posted also looped over an undefined name `tweet` to print
# the tweet text; that loop raises NameError and is left out here.
for link in tweet_links:
    print(link.get_attribute('href'))
driver.quit()
https://twitter.com/MarkFin79124805/status/1251129277787131904
https://twitter.com/Ehrenfrau3/status/1251272923668787200
https://twitter.com/K30107265/status/1251788504318828549
https://twitter.com/Sakasonis/status/1251102005910818817
https://twitter.com/MattCone3/status/1251117184534949888
https://twitter.com/Volksdichter/status/1251186371160682502
https://twitter.com/Volksdichter/status/1251186371160682502/photo/1
https://twitter.com/RiaIssa/status/1251817059517947910
https://twitter.com/janejane24/status/1251102104736989184
https://twitter.com/RiaIssa/status/1251102636071403522
https://twitter.com/TiBo01774121/status/1251108273241104384
https://twitter.com/RiaIssa/status/1251195169937993733
HTML of the button:
<div class="css-1dbjc4n r-my5ep6 r-qklmqi r-1adg3ll">
<div aria-haspopup="false" role="button" data-focusable="true" tabindex="0" class="css-18t94o4 css-1dbjc4n r-1777fci r-1jayybb r-o7ynqc r-6416eg r-13qz1uu">
<div class="css-1dbjc4n r-16y2uox r-1wbh5a2 r-1777fci">
<div dir="auto" class="css-901oao r-1n1174f r-1qd0xha r-a023e6 r-16dba41 r-ad9z0x r-bcqeeo r-q4m81j r-qvutc0"><span class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0">Weitere Antworten anzeigen</span></div></div></div></div>
Try the following solution:
url = 'https://twitter.com/RegSprecher/status/1251100551183507456'
driver.get(url)
driver.maximize_window()
wait = WebDriverWait(driver, 20)
scrolls = 7
while True:
    scrolls -= 1
    # Scroll to the bottom and give the page time to load more replies
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(3)
    if scrolls < 0:
        break
# Click the "Weitere Antworten anzeigen" (show more replies) button
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[contains(text(),'Weitere Antworten anzeigen')]"))).click()
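If the button can show up again after more replies have loaded, one variant is to keep scrolling and clicking until it is no longer found. The following is only a sketch under that assumption: `click_all_show_more`, `label`, `max_clicks` and `pause` are made-up names, and the locator strategy is passed as the plain string "xpath", which is the value of selenium's `By.XPATH`. It does not handle a click that is intercepted by an overlay.

```python
import time


def click_all_show_more(driver, label="Weitere Antworten anzeigen",
                        max_clicks=20, pause=3.0):
    """Scroll to the bottom and click the 'show more' button while it exists.

    Returns the number of clicks performed. `label` must not contain a
    single quote, because it is pasted into the XPath expression as-is.
    """
    xpath = f"//span[contains(text(), '{label}')]"
    clicks = 0
    while clicks < max_clicks:
        # Scroll down so the button (if any) is rendered at the bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)
        # find_elements returns an empty list instead of raising when
        # nothing matches; "xpath" is the string behind By.XPATH
        buttons = driver.find_elements("xpath", xpath)
        if not buttons:
            break
        buttons[0].click()
        clicks += 1
    return clicks
```

With a real driver this would be called as `click_all_show_more(driver)` after `driver.get(url)`. Note that an implicit wait set on the driver also applies to `find_elements`, so the final "not found" check takes that long to return.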
Note: add the following imports to your solution:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
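As for getting only the last few replies: a likely cause is that the page keeps only the tweets near the viewport in the DOM and drops earlier ones while you scroll, so reading the links once at the very end misses most of them. Assuming that is what happens, a sketch of the workaround is to read the links after every small scroll step and accumulate them. The names `collect_status_links`, `HREFS_JS`, `pause`, `max_rounds` and `patience` are invented for this illustration, and the hrefs are read through one `execute_script` call so that no element handle goes stale between finding and reading it.

```python
import time

# JavaScript that returns the href of every link containing 'status'
HREFS_JS = (
    "return Array.from(document.querySelectorAll(\"a[href*='status']\"))"
    ".map(a => a.href);"
)


def collect_status_links(driver, pause=2.0, max_rounds=50, patience=2):
    """Scroll one viewport at a time and accumulate all status links seen.

    Stops after `patience` consecutive rounds without a new link, or after
    `max_rounds` rounds. Returns the links in first-seen order.
    """
    seen = {}  # dict keys de-duplicate and keep insertion order
    idle = 0
    for _ in range(max_rounds):
        before = len(seen)
        for href in driver.execute_script(HREFS_JS) or []:
            seen[href] = None
        # Count rounds that brought nothing new
        idle = idle + 1 if len(seen) == before else 0
        if idle >= patience:
            break
        # Move down by one viewport so no tweet is skipped
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        time.sleep(pause)
    return list(seen)
```

With a real browser this would replace the scroll-to-bottom loop, for example `links = collect_status_links(driver)` right after `driver.get(url)`. The result still contains variants such as `.../photo/1`, so you may want to filter those afterwards.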