Getting href value with BeautifulSoup
I'm getting an error when trying to extract the href value. It says: "ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" But when I change the href-extraction code to find(), it says: "ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?"
Here's my code:
import requests
from bs4 import BeautifulSoup as bs

titles = []
dates = []
links = []

page = 1
while page <= 60:
    url = requests.get(f"http://detik.com/search/searchall?query=covid&siteid=2&sortby=time&page={page}")
    soup = bs(url.text, 'lxml')
    container = soup.find_all('div', class_='container content')
    for l_media in container:
        media_cont = l_media.find_all('div', class_='list media_rows list-berita')
        for article in media_cont:
            article_cont = article.find_all('article')
            for title in article_cont:
                news_title = title.find('h2', class_='title')
                titles.append(news_title.text.strip())
            for date in article_cont:
                news_date = date.find('span', class_='date')
                dates.append(news_date.text.strip())
            for a_tag in article_cont.find('a'):  # this line raises the error: article_cont is a ResultSet
                link = a_tag['href']
                links.append(link)
    page += 1
There's no need for all those loops; take a look at an alternative approach.
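First, the error itself: find_all() returns a ResultSet, which is list-like, so it has neither find() nor find_all() of its own. In your last loop, article_cont.find('a') fails because article_cont is a ResultSet; you have to iterate it (or index into it) and call find() on each individual Tag. A minimal sketch of the distinction, using a made-up HTML snippet:

from bs4 import BeautifulSoup

html = '<article><a href="/a">one</a></article><article><a href="/b">two</a></article>'
soup = BeautifulSoup(html, 'lxml')

articles = soup.find_all('article')  # ResultSet: a list of Tag objects
# articles.find('a')                 # AttributeError: ResultSet object has no attribute 'find'
for art in articles:                 # iterate first, then call find() on each Tag
    print(art.find('a')['href'])     # prints /a then /b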
Example:
from bs4 import BeautifulSoup
import requests

data = []
page = 1
while page <= 10:
    # fetch each results page inside the loop, otherwise only page 1 is ever parsed
    url = requests.get(f"http://detik.com/search/searchall?query=covid&siteid=2&sortby=time&page={page}")
    soup = BeautifulSoup(url.text, 'lxml')
    for article in soup.select('div.list-berita article'):
        news_title = article.find('h2', class_='title').text
        news_date = article.find('span', class_='date').contents[1]  # the date string is the second child of the <span> on this page
        link = article.find('a')['href']
        data.append({
            'title': news_title,
            'date': news_date,
            'link': link
        })
    page += 1

data
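The key change is soup.select('div.list-berita article'): a single CSS selector reaches every article tag directly, replacing the three nested find_all() loops, so find() is always called on an individual Tag rather than a ResultSet. If you then want the records as a table, a minimal sketch (assuming pandas is installed; not part of the original code):

import pandas as pd

df = pd.DataFrame(data)  # columns: title, date, link
print(df.head())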