How to identify a link from href based on string and class name?

I'm trying to get some data from https://betsapi.com/ using Python, specifically from the soccer section. I noticed that the link is dynamic: a few weeks ago it was https://betsapi.com/cin/soccer and now it is https://betsapi.com/cip/soccer.

Looking at the page source, I'd like to understand how to identify the current soccer link from this part of the markup.

<div class="card-tabs text-center">
            <a href="/" class="card-tabs-item active">
        All (70)
      </a>
                <a href="/cip/basketball" class="card-tabs-item"></a>
                <a href="/cip/soccer" class="card-tabs-item"></a>
                <a href="/cip/horse-racing" class="card-tabs-item"> </a>
                <a href="/cip/greyhounds" class="card-tabs-item"></a>
                <a href="/cip/ice-hockey" class="card-tabs-item"></a>
                <a href="/cip/table-tennis" class="card-tabs-item"></a>
                <a href="/cip/volleyball" class="card-tabs-item"></a>                                                    
      <div class="dropdown show">
      <a href="#" class="card-tabs-item" data-toggle="dropdown" aria-expanded="true">More</a>
      <div class="dropdown-menu dropdown-menu-right dropdown-menu-arrow show" x-placement="bottom-end" style="position: absolute; transform: translate3d(-109px, 55px, 0px); top: 0px; left: 0px; will-change: transform;">
                                    <a class="dropdown-item " href="/cip/golf"></a>
                                    <a class="dropdown-item " href="/cip/tennis"></a>
                                    <a class="dropdown-item " href="/cip/baseball"></a>
                                    <a class="dropdown-item " href="/cip/esports"></a>
                                    <a class="dropdown-item " href="/cip/darts"></a>
                                    <a class="dropdown-item " href="/cip/handball"></a>
                                    <a class="dropdown-item " href="/cip/futsal"></a>

Thank you very much.

I simply search for the card-tabs-item links and look for 'soccer' in each href, then build the link from the matching href:

import requests
from bs4 import BeautifulSoup

url = 'https://betsapi.com'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.141 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

cards = soup.find_all('a', {'class':'card-tabs-item'})
soccer = [x for x in cards if 'soccer' in x['href']][0]
link = url + soccer['href']

Output:

print(link)
https://betsapi.com/cip/soccer
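
Note that the `[0]` index above raises an IndexError if no tab matches. A minimal sketch of a more defensive variant follows, run against a trimmed copy of the markup from the question rather than the live site. The helper name `find_sport_link` and the `SAMPLE_HTML` constant are hypothetical, introduced only for illustration:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the live page looks like this).
SAMPLE_HTML = """
<div class="card-tabs text-center">
  <a href="/" class="card-tabs-item active">All (70)</a>
  <a href="/cip/basketball" class="card-tabs-item"></a>
  <a href="/cip/soccer" class="card-tabs-item"></a>
  <a href="#" class="card-tabs-item" data-toggle="dropdown">More</a>
</div>
"""

def find_sport_link(html, sport, base_url='https://betsapi.com'):
    """Return the absolute URL of the first card tab whose href contains `sport`, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    cards = soup.find_all('a', {'class': 'card-tabs-item'})
    # next() with a default avoids the IndexError that [0] raises on an empty list
    match = next((a for a in cards if sport in a.get('href', '')), None)
    if match is None:
        return None
    # urljoin handles the slash between the base URL and the relative href
    return urljoin(base_url, match['href'])

print(find_sport_link(SAMPLE_HTML, 'soccer'))   # https://betsapi.com/cip/soccer
print(find_sport_link(SAMPLE_HTML, 'cricket'))  # None
```

With the live site, `html` would be `response.text` from the request above.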

As an alternative, you can use CSS selectors to select the <a> whose:

  • href ends with soccer:

    link = url + soup.select_one('a[href$="soccer"]')['href']
    

    Or, more specifically:

    link = url + soup.select_one('a.card-tabs-item[href$="soccer"]')['href']
    
  • href contains soccer:

    link = url + soup.select_one('a[href*="soccer"]')['href']
    

    Or, more specifically:

    link = url + soup.select_one('a.card-tabs-item[href*="soccer"]')['href']
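
To make the difference between the two attribute selectors concrete, here is a small sketch on static markup. The `/cip/soccer/live` link is invented purely to show where `$=` and `*=` diverge; it is not taken from the site:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link is invented to show where the selectors diverge.
SAMPLE_HTML = """
<a href="/cip/soccer" class="card-tabs-item"></a>
<a href="/cip/soccer/live" class="card-tabs-item"></a>
<a href="/cip/tennis" class="dropdown-item"></a>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# [href$="soccer"]: the href must end with "soccer"
ends_with = [a['href'] for a in soup.select('a[href$="soccer"]')]
# [href*="soccer"]: "soccer" may appear anywhere in the href
contains = [a['href'] for a in soup.select('a[href*="soccer"]')]
# select_one returns None when nothing matches, so check before subscripting
missing = soup.select_one('a.card-tabs-item[href$="cricket"]')

print(ends_with)  # ['/cip/soccer']
print(contains)   # ['/cip/soccer', '/cip/soccer/live']
print(missing)    # None
```

Because `select_one` returns None on a miss, subscripting its result with `['href']` directly raises a TypeError if the tab ever disappears from the page.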
    
Example
import requests
from bs4 import BeautifulSoup

url = 'https://betsapi.com'
headers = {'user-agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

link = url + soup.select_one('a.card-tabs-item[href$="soccer"]')['href']

print(link)
Output
https://betsapi.com/cip/soccer
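
Since the question is about the changing path segment (`cin` a few weeks ago, `cip` now), the same selector can also be used to recover that segment and build links for other sports. This is only a sketch: it assumes every sport href has the form `/<prefix>/<sport>`, as in the markup from the question, and the names `SAMPLE_HTML`, `BASE_URL` and `sport_url` are hypothetical:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Single tab copied from the question's markup; with the live site, parse response.text instead.
SAMPLE_HTML = '<a href="/cip/soccer" class="card-tabs-item"></a>'
BASE_URL = 'https://betsapi.com'

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
tab = soup.select_one('a.card-tabs-item[href$="/soccer"]')

# '/cip/soccer' -> 'cip'; assumes hrefs have the form '/<prefix>/<sport>'
prefix = tab['href'].strip('/').split('/')[0]

def sport_url(sport):
    """Build the absolute URL for another sport using the current prefix."""
    return urljoin(BASE_URL, f'/{prefix}/{sport}')

print(prefix)               # cip
print(sport_url('tennis'))  # https://betsapi.com/cip/tennis
```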