How to loop a div and get the text in the paragraph tag only using BeautifulSoup and python?
I am crawling a web page using BeautifulSoup and Python, and I want to extract only the text from the paragraph tags on the site.
This is the page I want to crawl
I want all the text from all the paragraph tags.
Thanks in advance.
Use selenium only as a last resort, to save resources.
from selenium import webdriver

url = 'https://www.who.int/csr/disease/coronavirus_infections/faq_dec12/en/'
driver = webdriver.Chrome()
try:
    driver.get(url)
    # Grab all the visible text inside the div with id "primary"
    div_text = driver.find_element_by_id('primary').text
    with open('website_content.txt', 'w') as f:
        f.write(div_text)
except Exception as e:
    print(e)
finally:
    if driver is not None:
        driver.close()
You can achieve the same with requests and Beautiful Soup, as follows:
import requests as rq
from bs4 import BeautifulSoup

url = 'https://www.who.int/csr/disease/coronavirus_infections/faq_dec12/en/'
response = rq.get(url)
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    # .text collapses all the text inside the div, including non-paragraph tags
    div_text = soup.find('div', {'id': 'primary'}).text
    with open('website_content.txt', 'w') as f:
        f.write(div_text)
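Since the question asks for the text of the paragraph tags only (not everything in the div), a minimal sketch is to loop over the `<p>` tags inside the div with `find_all('p')`. This assumes the same `primary` div id used in the answer above still matches the page:

```python
import requests as rq
from bs4 import BeautifulSoup

url = 'https://www.who.int/csr/disease/coronavirus_infections/faq_dec12/en/'
response = rq.get(url)
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    div = soup.find('div', {'id': 'primary'})
    # Loop over each <p> inside the div and keep only its text,
    # skipping any other tags (headings, lists, spans, ...)
    paragraphs = [p.get_text(strip=True) for p in div.find_all('p')]
    with open('website_content.txt', 'w') as f:
        f.write('\n'.join(paragraphs))
```

`get_text(strip=True)` trims surrounding whitespace from each paragraph; drop the argument if you want the raw text as-is.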