Scraped texts saved in a variable return only the last text
I'm using Selenium to scrape all the articles about a given keyword from the first page of results for a news website. The code is as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumed; any WebDriver works
homepage = "https://duckduckgo.com/?q=site%3Awww.ilgiornale.it+immigrati&t=h_&ia=web"
driver.get(homepage)
links_giornale = driver.find_elements(By.XPATH, "//a[@class='result__a js-result-title-link']")
hrefs_giornale = []
for link in links_giornale[1:]:
    pages = link.get_attribute("href")
    hrefs_giornale.append(pages)
# to accept cookies the first time I access the website
driver.get(hrefs_giornale[0])
driver.find_element(By.XPATH, "//button[@mode='primary']").click()
for href in hrefs_giornale:
    driver.get(href)
    element = driver.find_elements(By.XPATH, "//div[@class='content__body typography']")
    for art in element:
        print(art.text)
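On the last line, if I print art.text, it correctly prints all the articles scraped from the first page of results. But if I save the text in a variable, or append all the texts to a list, I get only the last article. I tried assigning art.text to a variable, using a list comprehension, and using append(), but the result is always the same. Can you help me understand what the problem is? I need to work with all of these texts. Thanks!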
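The question doesn't show the code that stores the texts, so the cause can only be guessed. Results like this usually come from one of a few patterns: assigning to a single variable (overwritten on every iteration), creating the list inside the `for href` loop (reset for every page), or running a list comprehension after the loop over `element`, which by then only holds the last page's elements. The sketch below reproduces these patterns without Selenium; `fake_pages` is a hypothetical dict standing in for the pages and the texts `find_elements` would return on each one.

```python
# Hypothetical stand-in for the scraped pages: each URL maps to the texts
# that driver.find_elements(...) would yield on that page.
fake_pages = {
    "page-1": ["article A", "article B"],
    "page-2": ["article C"],
    "page-3": ["article D", "article E"],
}

def buggy_variable(pages):
    text = None
    for elements in pages.values():
        for art in elements:
            text = art  # each assignment overwrites the previous text
    return text

def buggy_list_inside_loop(pages):
    for elements in pages.values():
        texts = []  # a fresh list is created for every page...
        for art in elements:
            texts.append(art)
    return texts  # ...so only the last page's list survives

def buggy_comprehension_after_loop(pages):
    for elements in pages.values():
        element = elements  # rebound on every page, like element = driver.find_elements(...)
    return [art for art in element]  # only sees the last page's elements

def fixed(pages):
    texts = []  # created once, before any loop
    for elements in pages.values():
        for art in elements:
            texts.append(art)
    return texts

print(buggy_variable(fake_pages))                  # article E
print(buggy_list_inside_loop(fake_pages))          # ['article D', 'article E']
print(buggy_comprehension_after_loop(fake_pages))  # ['article D', 'article E']
print(fixed(fake_pages))  # ['article A', 'article B', 'article C', 'article D', 'article E']
```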
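Create the list once, before the loop over the pages, and append each article's text inside the inner loop, so nothing is overwritten when the next page loads. Try something like this: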
all_texts = []
for href in hrefs_giornale:
    driver.get(href)
    element = driver.find_elements(By.XPATH, "//div[@class='content__body typography']")
    for art in element:
        all_texts.append(art.text)
for t in all_texts:
    print(t)
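If you prefer a list comprehension (the question mentions trying one), it has to iterate over all pages, not over the `element` left over from the last one. A sketch, assuming the page loading is wrapped in a function passed in as `get_texts`. `collect_texts`, `collect_texts_by_page`, and `page_texts` are hypothetical names, not part of Selenium.

```python
from typing import Callable, Dict, List

def collect_texts(hrefs: List[str], get_texts: Callable[[str], List[str]]) -> List[str]:
    # Flat list of every article text: the outer "for" walks the pages,
    # the inner "for" walks the texts found on each page.
    return [text for href in hrefs for text in get_texts(href)]

def collect_texts_by_page(
    hrefs: List[str], get_texts: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    # Same data, but grouped by the URL each text came from.
    return {href: get_texts(href) for href in hrefs}

# With Selenium, get_texts could be a hypothetical helper such as:
#
# def page_texts(href):
#     driver.get(href)
#     elements = driver.find_elements(By.XPATH, "//div[@class='content__body typography']")
#     return [el.text for el in elements]  # read .text before leaving the page
#
# all_texts = collect_texts(hrefs_giornale, page_texts)
```

Reading `.text` before the next `driver.get` matters: storing the WebElement objects themselves and reading them after navigating away typically raises `StaleElementReferenceException`, because the elements belonged to the previous page.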
https://www.kite.com/python/answers/how-to-add-a-string-to-a-list-as-an-element-in-python
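The linked page covers adding a string to a list as a single element. A related pitfall: `append(art.text)` adds one element per article, while `extend` adds every item of an iterable, and a string is iterable character by character. A short illustration with made-up strings:

```python
texts = []
texts.append("ciao")  # adds the whole string as one element
print(texts)          # ['ciao']

more_texts = ["primo articolo", "secondo articolo"]
texts.extend(more_texts)  # adds each string from the list
print(texts)              # ['ciao', 'primo articolo', 'secondo articolo']

texts.extend("ok")  # a str is iterable, so this adds 'o' and 'k' separately
print(texts)        # ['ciao', 'primo articolo', 'secondo articolo', 'o', 'k']
```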