How to identify a link from href based on string and class name?
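I am trying to get some data from https://betsapi.com/ with Python, specifically from the soccer section.
Looking at the page source, I see that the link is dynamic: a few weeks ago it was https://betsapi.com/cin/soccer and now it is https://betsapi.com/cip/soccer.
How can I identify the current soccer link from this part of the markup?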
<div class="card-tabs text-center">
<a href="/" class="card-tabs-item active">
All (70)
</a>
<a href="/cip/basketball" class="card-tabs-item"></a>
<a href="/cip/soccer" class="card-tabs-item"></a>
<a href="/cip/horse-racing" class="card-tabs-item"> </a>
<a href="/cip/greyhounds" class="card-tabs-item"></a>
<a href="/cip/ice-hockey" class="card-tabs-item"></a>
<a href="/cip/table-tennis" class="card-tabs-item"></a>
<a href="/cip/volleyball" class="card-tabs-item"></a>
<div class="dropdown show">
<a href="#" class="card-tabs-item" data-toggle="dropdown" aria-expanded="true">More</a>
<div class="dropdown-menu dropdown-menu-right dropdown-menu-arrow show" x-placement="bottom-end" style="position: absolute; transform: translate3d(-109px, 55px, 0px); top: 0px; left: 0px; will-change: transform;">
<a class="dropdown-item " href="/cip/golf"></a>
<a class="dropdown-item " href="/cip/tennis"></a>
<a class="dropdown-item " href="/cip/baseball"></a>
<a class="dropdown-item " href="/cip/esports"></a>
<a class="dropdown-item " href="/cip/darts"></a>
<a class="dropdown-item " href="/cip/handball"></a>
<a class="dropdown-item " href="/cip/futsal"></a>
Thanks a lot.
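I would just search the card-tabs-item anchors for the one whose href contains 'soccer', then prepend the base URL to that href to get the link: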
import requests
from bs4 import BeautifulSoup
url = 'https://betsapi.com'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.141 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
cards = soup.find_all('a', {'class':'card-tabs-item'})
soccer = [x for x in cards if 'soccer' in x['href']][0]
link = url + soccer['href']
Output:
print(link)
https://betsapi.com/cip/soccer
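The same filtering idea can be sketched offline, which makes it easier to see what happens when no tab matches. The snippet below is only an illustration: SAMPLE_HTML is a trimmed copy of the markup from the question, and find_sport_link is a hypothetical helper name, not part of requests or BeautifulSoup. It returns None instead of raising the IndexError that [0] produces on an empty list, and it uses urljoin from the standard library to build the absolute URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, used instead of a live request.
SAMPLE_HTML = """
<div class="card-tabs text-center">
  <a href="/" class="card-tabs-item active">All (70)</a>
  <a href="/cip/basketball" class="card-tabs-item"></a>
  <a href="/cip/soccer" class="card-tabs-item"></a>
  <a href="/cip/horse-racing" class="card-tabs-item"></a>
</div>
"""


def find_sport_link(html, base_url, sport):
    """Return the absolute URL of the first tab whose href contains `sport`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips any anchor that has no href attribute at all.
    tabs = soup.find_all("a", class_="card-tabs-item", href=True)
    # next() with a default avoids the IndexError that [0] raises on an empty list.
    match = next((a for a in tabs if sport in a["href"]), None)
    if match is None:
        return None
    return urljoin(base_url, match["href"])


print(find_sport_link(SAMPLE_HTML, "https://betsapi.com", "soccer"))
print(find_sport_link(SAMPLE_HTML, "https://betsapi.com", "cricket"))
```

With the live site, the html argument would be response.text from the request shown above.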
As an alternative, you can use CSS selectors to select the <a> whose href ends with soccer:
link = url + soup.select_one('a[href$="soccer"]')['href']
or, more specifically:
link = url + soup.select_one('a.card-tabs-item[href$="soccer"]')['href']
Or select the <a> whose href contains soccer:
link = url + soup.select_one('a[href*="soccer"]')['href']
or, more specifically:
link = url + soup.select_one('a.card-tabs-item[href*="soccer"]')['href']
Example
import requests
from bs4 import BeautifulSoup
url = 'https://betsapi.com'
headers = {'user-agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
link = url + soup.select_one('a.card-tabs-item[href$="soccer"]')['href']
print(link)
Output
https://betsapi.com/cip/soccer
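One caveat with the ends-with and contains selectors: they would also match an href such as /cip/beach-soccer if the site ever had one. Anchoring the selector on the slash, as in a.card-tabs-item[href$="/soccer"], is one way around that. A dependency-free alternative is to compare the last path segment, sketched below under stated assumptions: pick_by_last_segment is a hypothetical helper, the beach-soccer entry is invented for the demonstration, and in practice the hrefs list would come from something like [a['href'] for a in soup.select('a.card-tabs-item[href]')].

```python
from urllib.parse import urljoin, urlparse


def pick_by_last_segment(hrefs, sport):
    """Return the first href whose final path segment equals `sport`, or None."""
    for href in hrefs:
        # Split the path and drop empty pieces, so "/" and "#" yield no segments.
        segments = [s for s in urlparse(href).path.split("/") if s]
        if segments and segments[-1] == sport:
            return href
    return None


# "/cip/beach-soccer" is a made-up entry, included only to show the difference.
hrefs = ["/", "/cip/basketball", "/cip/beach-soccer", "/cip/soccer", "#"]

href = pick_by_last_segment(hrefs, "soccer")
link = urljoin("https://betsapi.com", href) if href else None
print(link)
```

Note also that select_one returns None when nothing matches, so indexing its result with ['href'] raises a TypeError in that case. Checking for None first keeps the scraper from crashing if the tab is renamed or removed.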