Getting href values with BeautifulSoup

I'm getting an error when extracting the href values. It says: "ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" But when I change the href-extraction code to use find(), it says: "ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" Here is my code:

import requests
from bs4 import BeautifulSoup as bs

titles = []
dates = []
links = []
page = 1

while (page <= 60):
    url = requests.get(f"http://detik.com/search/searchall?query=covid&siteid=2&sortby=time&page={page}")
    soup = bs(url.text, 'lxml')
    container = soup.find_all('div', class_='container content')
    for l_media in container:
        media_cont = l_media.find_all('div', class_='list media_rows list-berita')
        for article in media_cont:
            article_cont = article.find_all('article')
            for title in article_cont:
                news_title = title.find('h2', class_='title')
                titles.append(news_title.text.strip())
            for date in article_cont:
                news_date = date.find('span', class_='date')
                dates.append(news_date.text.strip())
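            # The AttributeError is raised on the next line: article_cont is a
            # ResultSet (a list of Tag objects), so it has neither find() nor find_all()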
            for a_tag in article_cont.find('a'):
                link = a_tag['href']
                links.append(link)            
    page += 1

There is no need for all those loops. The error itself comes from article_cont.find('a'): find_all() returns a ResultSet, which is a list of Tag objects, and a ResultSet has no find() or find_all() method, so the call fails whichever of the two you use. Iterate over the ResultSet and call find() on each individual Tag, or skip the intermediate lists entirely, as in the alternative below.
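
A minimal sketch of the failure and the fix (the markup here is made up purely for illustration):

from bs4 import BeautifulSoup

html = "<article><a href='/a'>one</a></article><article><a href='/b'>two</a></article>"
soup = BeautifulSoup(html, 'lxml')

articles = soup.find_all('article')  # ResultSet: a list of Tag objects
# articles.find('a')                 # AttributeError: ResultSet object has no attribute 'find'

# Call find() on each individual Tag instead:
hrefs = [tag.find('a')['href'] for tag in articles]
print(hrefs)  # ['/a', '/b']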

Example

from bs4 import BeautifulSoup
import requests

data = []
page = 1

while page <= 10:
    # Fetch each results page inside the loop; otherwise every iteration
    # would re-scrape the same page-1 response and collect duplicates.
    response = requests.get(f"http://detik.com/search/searchall?query=covid&siteid=2&sortby=time&page={page}")
    soup = BeautifulSoup(response.text, 'lxml')

    for article in soup.select('div.list-berita article'):
        news_title = article.find('h2', class_='title').text
        news_date = article.find('span', class_='date').contents[1]
        link = article.find('a')['href']

        data.append({
            'title': news_title,
            'date': news_date,
            'link': link
        })
    page += 1
    
print(data)
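
If you want the records in tabular form, one optional follow-up (this assumes pandas is installed, and the filename is just an example; neither is part of the original answer):

import pandas as pd

df = pd.DataFrame(data)  # one row per article: title, date, link
df.to_csv('detik_covid_news.csv', index=False)
print(df.head())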