How to get the HTML changes after pressing a button with Beautiful Soup and Requests
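I want the HTML of this website, https://www.forebet.com/en/football-predictions, after the More[+] button has been pressed enough times to load all the games. Each time the More[+] button at the bottom of the page is clicked, the HTML changes and shows more football games. How do I make a request that returns the page with all the football games loaded?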
from bs4 import BeautifulSoup
import requests

leagues = {"EPL", "UCL", "Es1", "De1", "Fr1", "Pt1", "It1", "UEL"}


class ForeBet:
    # Gets all games from the leagues in `leagues`, returning them as a list of strings.
    # Game format: League|Date|Hour|Home Team|Away Team|Prob Home|Prob Tie|Prob Away
    def get_games_and_probs(self):
        response = requests.get('https://www.forebet.com/en/football-predictions')
        soup = BeautifulSoup(response.text, 'html.parser')
        results = []
        games = soup.find_all(class_='rcnt tr_0') + soup.find_all(class_='rcnt tr_1')
        for game in games:
            if game.find(class_='shortTag').text.strip() in leagues:
                # The three probabilities are the elements that follow 'fprc'.
                line = (game.find(class_='shortTag').text + "|" +
                        game.find(class_='date_bah').text.split(" ")[0] + "|" +
                        game.find(class_='date_bah').text.split(" ")[1] + "|" +
                        game.find(class_='homeTeam').text + "|" +
                        game.find(class_='awayTeam').text + "|" +
                        game.find(class_='fprc').find_next().text + "|" +
                        game.find(class_='fprc').find_next().find_next().text + "|" +
                        game.find(class_='fprc').find_next().find_next().find_next().text)
                print(line)
                results.append(line)
        return results
As mentioned, requests and BeautifulSoup are for fetching and parsing data, not for interacting with a site. For that you would need Selenium.
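If you do go the Selenium route, a minimal sketch could look like the following. It assumes Selenium 4 with a local Chrome install. The names `click_until_gone` and `fetch_full_html` are made up for illustration, and the CSS selector for the More[+] button is a placeholder you would have to look up in the page yourself.

```python
import time


def click_until_gone(find_buttons, max_clicks=200, pause=1.0):
    """Click the first visible button from find_buttons() until none is left.

    find_buttons is any callable returning a list of objects that have
    is_displayed() and click(), such as Selenium WebElements.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        visible = [b for b in find_buttons() if b.is_displayed()]
        if not visible:
            break
        visible[0].click()
        clicks += 1
        time.sleep(pause)  # crude wait for the new rows to be appended
    return clicks


def fetch_full_html(url, more_selector):
    """Open url in Chrome, press the 'more' button until it disappears,
    and return the resulting HTML."""
    # Imported here so click_until_gone stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        click_until_gone(
            lambda: driver.find_elements(By.CSS_SELECTOR, more_selector)
        )
        return driver.page_source
    finally:
        driver.quit()


# Usage (the selector below is a placeholder, not taken from the site):
# html = fetch_full_html(
#     "https://www.forebet.com/en/football-predictions", "a.more-button"
# )
# soup = BeautifulSoup(html, "html.parser")
```

The fixed `time.sleep` is the simplest possible wait. A cookie banner or overlay may intercept the click, in which case you would have to dismiss it first.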
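Your other option is to see whether you can fetch the data directly, and whether there is a parameter that lets you request again, just as if you had clicked More[+]. Does this work for you?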
import pandas as pd
import requests

results = pd.DataFrame()
i = 0
while True:
    print(i)
    url = 'https://m.forebet.com/scripts/getrs.php'
    payload = {
        'ln': 'en',
        'tp': '1x2',
        'in': '%s' % (i + 11),
        'ord': '0'}
    jsonData = requests.get(url, params=payload).json()
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    results = pd.concat([results, pd.DataFrame(jsonData[0])], sort=False).reset_index(drop=True)
    # Keep requesting until some id shows up twice, then drop the repeats and stop.
    if max(results['id'].value_counts()) <= 1:
        i += 1
    else:
        results = results.drop_duplicates()
        break
Output:
print(results)
          id pr_under  ...    country         full_name
0    1473708       31  ...    England   Isthmian League
1    1473713       35  ...    England   Isthmian League
2    1473745       28  ...    England   Isthmian League
3    1473710       35  ...    England   Isthmian League
4    1473033       28  ...    England  Premier League 2
..       ...      ...  ...        ...               ...
515  1419208       47  ...  Argentina  Torneo Federal A
516  1419156       57  ...  Argentina  Torneo Federal A
517  1450589       50  ...    Armenia    Premier League
518  1450590       35  ...    Armenia    Premier League
519  1450591       52  ...    Armenia    Premier League

[518 rows x 73 columns]
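The stopping rule in that loop (keep requesting until an `id` you already have comes back) can also be written without pandas. This is only a sketch: `collect_unique` and its `fetch` argument are hypothetical names, and it assumes each batch is a list of dicts with an `id` key, which is what `pd.DataFrame(jsonData[0])` above suggests but which I have not verified. Unlike `drop_duplicates()`, it deduplicates by `id` only.

```python
def collect_unique(fetch, max_pages=100):
    """Call fetch(0), fetch(1), ... and gather rows keyed by their 'id'.

    Stops after the first batch that is empty or that contains an id
    already seen, mirroring the duplicate check in the loop above.
    max_pages is a safety cap so the loop cannot run forever.
    """
    seen = {}
    for i in range(max_pages):
        page = fetch(i)
        repeated = False
        for row in page:
            if row["id"] in seen:
                repeated = True
            else:
                seen[row["id"]] = row
        if repeated or not page:
            break
    return list(seen.values())


# Usage with the endpoint and parameters from the code above:
# import requests
# def fetch(i):
#     payload = {"ln": "en", "tp": "1x2", "in": str(i + 11), "ord": "0"}
#     url = "https://m.forebet.com/scripts/getrs.php"
#     return requests.get(url, params=payload).json()[0]
# games = collect_unique(fetch)
```

Passing the fetch function in as an argument keeps the loop testable offline with canned batches.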