selenium - get all tweets after scrolling the website - python

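I actually have two problems. First, I scroll down until the page stops loading more content and then try to save all the replies. Unfortunately, I only get a small portion of the replies near the bottom. Is there a way to get all of them? I tried adding sleep time, but it did not help.

Second, on some pages a button appears at the bottom of the page that has to be clicked to load more replies, and I have not found a way to click it.

I would be grateful for any tips.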


import time

from selenium import webdriver

url = 'https://twitter.com/RegSprecher/status/1251100551183507456'

driver = webdriver.Chrome(r"path_chromedriver.exe")
driver.implicitly_wait(10)
driver.get(url)


# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(1)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # If the height did not change, nothing new was loaded, so stop scrolling
        break
    last_height = new_height

#Wait
time.sleep(30)

# Collect every link whose href contains "status" (the tweet permalinks)
tweet_links = driver.find_elements_by_css_selector("a[href*='status']")

# Print the permalink of each tweet that was found
for link in tweet_links:
    print(link.get_attribute('href'))


driver.quit()

https://twitter.com/MarkFin79124805/status/1251129277787131904
https://twitter.com/Ehrenfrau3/status/1251272923668787200
https://twitter.com/K30107265/status/1251788504318828549
https://twitter.com/Sakasonis/status/1251102005910818817
https://twitter.com/MattCone3/status/1251117184534949888
https://twitter.com/Volksdichter/status/1251186371160682502
https://twitter.com/Volksdichter/status/1251186371160682502/photo/1
https://twitter.com/RiaIssa/status/1251817059517947910
https://twitter.com/janejane24/status/1251102104736989184
https://twitter.com/RiaIssa/status/1251102636071403522
https://twitter.com/TiBo01774121/status/1251108273241104384
https://twitter.com/RiaIssa/status/1251195169937993733

HTML of the button:

<div class="css-1dbjc4n r-my5ep6 r-qklmqi r-1adg3ll">
<div aria-haspopup="false" role="button" data-focusable="true" tabindex="0" class="css-18t94o4 css-1dbjc4n r-1777fci r-1jayybb r-o7ynqc r-6416eg r-13qz1uu">
<div class="css-1dbjc4n r-16y2uox r-1wbh5a2 r-1777fci">
<div dir="auto" class="css-901oao r-1n1174f r-1qd0xha r-a023e6 r-16dba41 r-ad9z0x r-bcqeeo r-q4m81j r-qvutc0"><span class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0">Weitere Antworten anzeigen</span></div></div></div></div>

Try the following solution:

url = 'https://twitter.com/RegSprecher/status/1251100551183507456'
driver.get(url)
driver.maximize_window()
wait = WebDriverWait(driver, 20)

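# Scroll to the bottom a fixed number of times so that the button gets loaded
for _ in range(8):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(3)

# Wait until the "Weitere Antworten anzeigen" (show more replies) button is clickable, then click it
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[contains(text(),'Weitere Antworten anzeigen')]"))).click()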

Note: add the following imports to your solution:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
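
On the first problem (only the last few replies being returned): one possible explanation, which is an assumption and not something confirmed in this thread, is that the page keeps only the replies near the current viewport in the DOM, so links that have scrolled out of view are already gone when `find_elements` runs at the very end. A minimal sketch under that assumption collects the links after every small scroll step and clicks the "Weitere Antworten anzeigen" button whenever it shows up. The function name `collect_status_links` and all of its parameters and defaults are hypothetical; only `find_elements(by, value)`, `execute_script`, `get_attribute` and `click` are Selenium calls.

```python
import time

CSS = "css selector"   # same string value as Selenium's By.CSS_SELECTOR
XPATH = "xpath"        # same string value as Selenium's By.XPATH


def collect_status_links(driver, button_text="Weitere Antworten anzeigen",
                         pause=2.0, max_idle_rounds=3, max_rounds=200):
    """Scroll step by step and remember every status link seen on the way.

    `driver` is a Selenium WebDriver (or any object with the same
    `find_elements` and `execute_script` methods). The parameter names and
    defaults are illustrative assumptions, not values from the thread.
    """
    seen = {}          # dict removes duplicates and keeps first-seen order
    idle_rounds = 0

    for _ in range(max_rounds):
        count_before = len(seen)

        # Harvest the links that are in the DOM right now.
        for link in driver.find_elements(CSS, "a[href*='status']"):
            try:
                href = link.get_attribute("href")
            except Exception:
                continue   # the element vanished from the DOM in the meantime
            if href:
                seen.setdefault(href, None)

        # Click the "show more replies" button if it is currently present.
        clicked = False
        xpath = f"//span[contains(text(), '{button_text}')]"
        for button in driver.find_elements(XPATH, xpath):
            try:
                button.click()
                clicked = True
            except Exception:
                pass       # not clickable yet; try again in the next round

        # Scroll down by one viewport instead of jumping to the bottom.
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        time.sleep(pause)

        # Stop after several rounds without new links or button clicks.
        if len(seen) == count_before and not clicked:
            idle_rounds += 1
        else:
            idle_rounds = 0
        if idle_rounds >= max_idle_rounds:
            break

    return list(seen)
```

With the `driver` from the solution above, this could be called as `links = collect_status_links(driver)` right after `driver.get(url)`. Whether every reply is captured still depends on how the page loads its content, so `pause` and `max_idle_rounds` may need tuning.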