Webscraping a table from VegasInsider
I want to scrape this table from Vegas Insider. I'm a complete beginner when it comes to web scraping; I've tried a few different approaches from Stack Overflow, but haven't been able to nail it down. This is as far as I've gotten:
from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.vegasinsider.com/college-basketball/odds/las-vegas/money/').text
soup = BeautifulSoup(source, "html.parser")

tbl = soup.find('table', class_='frodds-data-tbl')

for matchups in tbl.find_all('td', {'class': ['viCellBg1', 'oddsGameCell', 'cellTextNorm']}):
    if matchups.span is not None:
        gameDate = matchups.span.text
        print(gameDate)
    for b_ in matchups.find_all('b'):
        print(b_.a.text)
Eventually I'll write these results to a CSV and change the column headers to match the sportsbook names on the table. Thanks in advance for any help.
You can use this example to load the data into a DataFrame:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.vegasinsider.com/college-basketball/odds/las-vegas/money/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# clean up the cells: replace <br> tags with newlines
for br in soup.select("br"):
    br.replace_with("\n")

df = pd.read_html(str(soup.select_one(".frodds-data-tbl")))[0]

# set column names:
# df.columns = ['col1', 'col2', ...]

df.to_csv("data.csv", index=False)
print(df)
Prints:
0 1 2 3 4 5 6 7 8 9
0 02/20 1:00 PM 819 Wright State 820 Detroit Mercy -120 +100 -125 +105 -125 +105 -120 +100 -114 -105 -115 -105 -120 +100 -120 +100 -125 +105
1 02/20 1:00 PM 821 Michigan 822 Wisconsin +110 -130 +135 -155 +125 -150 +135 -155 +130 -156 +130 -150 +135 -160 +120 -145 +135 -155
2 02/20 1:00 PM 823 Providence 824 Butler -160 +130 -155 +135 -170 +140 -160 +140 -170 +140 -160 +140 -160 +135 -155 +127 -155 +135
3 02/20 1:00 PM 825 Fairfield 826 Iona +650 -1000 +525 -750 +550 -800 +525 -750 +520 -780 +500 -720 +530 -750 +600 -900 +500 -700
...
and saves data.csv (screenshot from LibreOffice).
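To get the column headers the question asks for, one option is to rename the columns before saving. A minimal sketch, assuming the table has one game column followed by nine sportsbook columns as in the output above; the book names here are placeholders, not scraped from the page:

# hypothetical: replace the placeholder names with the actual
# sportsbooks shown in the table header on VegasInsider
df.columns = [
    "Game",  # date/time plus the two teams
    "Book1", "Book2", "Book3", "Book4", "Book5",
    "Book6", "Book7", "Book8", "Book9",
]
df.to_csv("data.csv", index=False)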
If you don't care about the formatting, you can use pd.read_html directly:
import pandas as pd
url = "https://www.vegasinsider.com/college-basketball/odds/las-vegas/money/"
pd.read_html(url)[7]
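Note that the hard-coded index 7 reflects the page layout at the time of this answer and may shift if the page changes. If it stops pointing at the odds table, a quick way to find the right index is to inspect every table pd.read_html returns:

import pandas as pd

url = "https://www.vegasinsider.com/college-basketball/odds/las-vegas/money/"
tables = pd.read_html(url)

# print each table's index and shape to spot the odds table
for i, t in enumerate(tables):
    print(i, t.shape)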