Scraping a list with the same class

Question

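I am trying to scrape a list of keywords from a site, but the keywords are stored in separate elements that all share the same class name.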

<div class="keywords content-div">
<span class="keyword key-content">
<a href="/en/keyword/chicken-restaurant">Chicken Restaurant</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/restaurant">Restaurant</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/fried-chicken">Fried Chicken</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/restaurant-order-in">Restaurant Order In</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/restaurant-eat-out">Restaurant Eat Out</a>
</span>
</div>
</div>

This is how the data is stored in the HTML. I am only interested in the strings that come after each href.

r = requests.get('https://yellowpages.com.eg/en/profile/5-roosters-fried-chicken/629053?position=1&key=Fast-Food&mod=category&categoryId=1527')
soup = BeautifulSoup(r.content, 'lxml')
word = soup.find_all('div', class_='keywords content-div')
for item in word:
    keywords = soup.find('span', class_='keyword key-content').find('a').text
    print(keywords)

This is my code, but it only gets the first line, and I need the whole list.

Answer 1

You need to find all the <div> nodes, then all the child <span> nodes of each <div>, then all the child <a> nodes of each <span>, and retrieve their text.

Code:

from bs4 import BeautifulSoup

html = ...  # response.content

soup = BeautifulSoup(html, 'html.parser')
# Walk div -> span -> a and print the text of every link found.
for div in soup.find_all('div', class_='keywords content-div'):
    for span in div.find_all('span', class_='keyword key-content'):
        for a in span.find_all('a'):
            print(a.text)

Output:

Chicken Restaurant
Restaurant
Fried Chicken
Restaurant Order In
Restaurant Eat Out
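
As for why the loop in the question printed only one keyword: find() returns just the first matching node, and it was called on the whole soup rather than on the current item. The sketch below contrasts find() with find_all() on a shortened copy of the markup from the question. The variable names first_only and all_keywords are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (shortened to three keywords).
html = """
<div class="keywords content-div">
<span class="keyword key-content">
<a href="/en/keyword/chicken-restaurant">Chicken Restaurant</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/restaurant">Restaurant</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/fried-chicken">Fried Chicken</a>
</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match, so this is always the first keyword,
# no matter how many times it is called inside a loop.
first_only = soup.find("span", class_="keyword key-content").find("a").text

# find_all() returns every matching span; take the link text of each one.
all_keywords = [
    span.find("a").text
    for span in soup.find_all("span", class_="keyword key-content")
]

print(first_only)
print(all_keywords)
```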

Alternatively, you can use a CSS selector:

soup = BeautifulSoup(html, 'html.parser')
for a in soup.select('div.keywords.content-div > span.keyword.key-content > a'):
    print(a.text)
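
If you want the results in a list instead of printing them, something like the following may work. It is a minimal sketch that reuses the same selector. The function name extract_keywords is hypothetical, and the inline HTML stands in for response.content from the real page, which may differ from the snippet in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.content, based on the question's markup.
SAMPLE_HTML = """
<div class="keywords content-div">
<span class="keyword key-content">
<a href="/en/keyword/chicken-restaurant">Chicken Restaurant</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/restaurant">Restaurant</a>
</span>
</div>
"""

SELECTOR = "div.keywords.content-div > span.keyword.key-content > a"


def extract_keywords(html):
    """Return a list of (link text, href) pairs for every keyword link."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.select(SELECTOR)]


keywords = extract_keywords(SAMPLE_HTML)
for text, href in keywords:
    print(text, href)
```

Keeping the href next to the text costs nothing and is handy if you later want to follow the keyword links.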


python

beautifulsoup

web-scraping