如何提取整个 table 并将其存储在 CSV 文件中?
How do I extract entire table and store it in CSV file?
我正在尝试 抓取 整个 table 并希望将其存储在 .csv 文件中。
当我试图 抓取 这个数据时,它向我显示错误 NO TABLES FOUND.
这是我的代码。
from pandas.io.html import read_html
page = 'https://games.crossfit.com/leaderboard/open/2020?view=0&division=1&scaled=0&sort=0'
tables = read_html(page, attrs={"class":"desktop athletes"})
print ("Extracted {num} tables".format(num=len(tables)))
任何建议或指导或任何帮助?
本页面使用JavaScript
从服务器获取数据并生成table。
但是在 Chrome/Firefox 中使用 DevTool
您可以看到(在选项卡 Network
中)从浏览器到服务器的所有请求,并且 XHR/AJAX
请求之一获取所有数据JSON
格式,因此您可以使用此 url 将其也作为 JSON
获取,您可以将其转换为 Python 数据,而不必抓取它。
import requests
r = requests.get('https://games.crossfit.com/competitions/api/v1/competitions/open/2020/leaderboards?view=0&division=1&scaled=0&sort=0')
data = r.json()
for row in data['leaderboardRows']:
print(row['entrant']['competitorName'], row['overallScore'], [(x['rank'],x['scoreDisplay']) for x in row['scores']])
结果
Patrick Vellner 64 [('13', '8:38'), ('19', '988 reps'), ('12', '6:29'), ('18', '16:29'), ('2', '10:09')]
Mathew Fraser 74 [('8', '8:28'), ('40', '959 reps'), ('3', '6:08'), ('2', '14:22'), ('21', '10:45')]
Lefteris Theofanidis 94 [('1', '8:05'), ('3', '1021 reps'), ('13', '6:32'), ('4', '15:00'), ('73', '11:11')]
# ... more ...
如下所述,您可以访问 api 来获取数据。要另存为 CSV,您需要使用 json 格式来获取您需要的内容(即展平嵌套数据)。有 2 种方法可以做到这一点,a) 将其完全展平,以便每一行都对应于每个参赛者,或者 b) 每个参赛者的每个序数分数都有单独的行。
唯一的区别是,如果您选择 a) 您将得到一个非常宽的 table(但没有重复的数据),如果您选择 b),您将得到一个很长的 [=24] =],重复数据。
由于文件不是太大,我选择了选项 b),这样您就可以随时按特定列或过滤器进行分组:
import requests
import pandas as pd
r = requests.get('https://games.crossfit.com/competitions/api/v1/competitions/open/2020/leaderboards?view=0&division=1&scaled=0&sort=0')
data = r.json()
results = pd.DataFrame()
df = pd.DataFrame(data['leaderboardRows'])
for idx, row in df.iterrows():
entrantData = pd.Series()
scoresData = pd.DataFrame()
entrantResults = pd.DataFrame()
for idx2, each in row.iteritems():
if type(each) == dict:
temp = pd.DataFrame.from_dict(each, orient='index')
entrantData = entrantData.append(temp)
elif type(each) == list:
temp2 = pd.DataFrame(each)
scoresData = scoresData.append(temp2, sort=True).reset_index(drop=True)
else:
entrantData = entrantData.append(pd.Series(each, name=idx2))
entrantResults = entrantResults.append(scoresData, sort=True).reset_index(drop=True)
entrantResults = entrantResults.merge(pd.concat([entrantData.T] *5, ignore_index=True), left_index=True, right_index=True)
results = results.append(entrantResults, sort=True).reset_index(drop=True)
results.to_csv('file.csv', index=False)
输出:前 15 行 250
print (results.head(15).to_string())
affiliate affiliateId affiliateName age breakdown competitorId competitorName countryChampion countryOfOriginCode countryOfOriginName divisionId drawBlueHR firstName gender heat height highlight judge lane lastName mobileScoreDisplay nextStage ordinal overallRank overallScore postCompStatus profilePicS3key rank scaled score scoreDisplay scoreIdentifier status time video weight
0 CrossFit Nanaimo 1918 CrossFit Nanaimo 30 10 rounds 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Dallyn Giroux Vellner 1 1 64 d471c-P158264_7-184.jpg 13 0 11800382 8:38 9d3979222412df2842a1 ACT 518 0 195 lb
1 CrossFit Soul Miami 1918 CrossFit Nanaimo 30 29 rounds +\n2 thrusters\n 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Lamar Vernon Vellner 2 1 64 d471c-P158264_7-184.jpg 19 0 1009880000 988 reps 9bd66b00e8367cc7fd0c ACT NaN 0 195 lb
2 CrossFit Nanaimo 1918 CrossFit Nanaimo 30 165 reps 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Jason Lochhead Vellner 3 1 64 d471c-P158264_7-184.jpg 12 0 1001650151 6:29 2347b4cb7339f2a13e6c ACT 389 0 195 lb
3 CrossFit Nanaimo 1918 CrossFit Nanaimo 30 240 reps 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Dallyn Giroux Vellner 4 1 64 d471c-P158264_7-184.jpg 18 0 1002400211 16:29 bcfd3882df3fa2e99451 ACT 989 0 195 lb
4 CrossFit New England 1918 CrossFit Nanaimo 30 240 reps 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Matt O'Keefe Vellner 5 1 64 d471c-P158264_7-184.jpg 2 0 1002400591 10:09 4bb25bed5f71141da122 ACT 609 0 195 lb
5 CrossFit Mayhem 3220 CrossFit Mayhem 30 10 rounds 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False darren hunsucker Fraser 1 2 74 9e218-P153604_4-184.jpg 8 0 11800392 8:28 18b5b2e137f00a2d9d7d ACT 508 0 195 lb
6 CrossFit Soul Miami 3220 CrossFit Mayhem 30 28 rounds +\n4 thrusters\n3 toes-to-bars\n 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Daniel Lopez Fraser 2 2 74 9e218-P153604_4-184.jpg 40 0 1009590000 959 reps b96bc1b7b58fa34a28a1 ACT NaN 0 195 lb
7 CrossFit Mayhem 3220 CrossFit Mayhem 30 165 reps 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Jason Fernandez Fraser 3 2 74 9e218-P153604_4-184.jpg 3 0 1001650172 6:08 4f4a994a045652c894c5 ACT 368 0 195 lb
8 CrossFit Mayhem 3220 CrossFit Mayhem 30 240 reps 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Tasia Percevecz Fraser 4 2 74 9e218-P153604_4-184.jpg 2 0 1002400338 14:22 1a4a7d8760e72bb12d68 ACT 862 0 195 lb
9 CrossFit Mayhem 3220 CrossFit Mayhem 30 240 reps 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Kelley Jackson Fraser 5 2 74 9e218-P153604_4-184.jpg 21 0 1002400555 10:45 b4a259e7049f47f65356 ACT 645 0 195 lb
10 NaN 0 30 10 rounds 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 1 3 94 931eb-P514502_2-184.jpg 1 0 11800415 8:05 c8907e02512f42ff3142 ACT 485 1 81 kg
11 NaN 0 30 30 rounds +\n1 thruster\n 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 2 3 94 931eb-P514502_2-184.jpg 3 0 1010210000 1021 reps 63add31b22606957701c ACT NaN 1 81 kg
12 NaN 0 30 165 reps 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 3 3 94 931eb-P514502_2-184.jpg 13 0 1001650148 6:32 46d7cdb691c25ea38dbe ACT 392 1 81 kg
13 NaN 0 30 240 reps 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 4 3 94 931eb-P514502_2-184.jpg 4 0 1002400300 15:00 d49e55a2af5840740071 ACT 900 1 81 kg
14 NaN 0 30 240 reps 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 5 3 94 931eb-P514502_2-184.jpg 73 0 1002400529 11:11 d35c9d687eb6b72c8e36 ACT 671 1 81 kg
我正在尝试 抓取 整个 table 并希望将其存储在 .csv 文件中。 当我试图 抓取 这个数据时,它向我显示错误 NO TABLES FOUND.
这是我的代码。
from pandas.io.html import read_html
page = 'https://games.crossfit.com/leaderboard/open/2020?view=0&division=1&scaled=0&sort=0'
tables = read_html(page, attrs={"class":"desktop athletes"})
print ("Extracted {num} tables".format(num=len(tables)))
任何建议或指导或任何帮助?
本页面使用JavaScript
从服务器获取数据并生成table。
但是在 Chrome/Firefox 中使用 DevTool
您可以看到(在选项卡 Network
中)从浏览器到服务器的所有请求,并且 XHR/AJAX
请求之一获取所有数据JSON
格式,因此您可以使用此 url 将其也作为 JSON
获取,您可以将其转换为 Python 数据,而不必抓取它。
import requests
r = requests.get('https://games.crossfit.com/competitions/api/v1/competitions/open/2020/leaderboards?view=0&division=1&scaled=0&sort=0')
data = r.json()
for row in data['leaderboardRows']:
print(row['entrant']['competitorName'], row['overallScore'], [(x['rank'],x['scoreDisplay']) for x in row['scores']])
结果
Patrick Vellner 64 [('13', '8:38'), ('19', '988 reps'), ('12', '6:29'), ('18', '16:29'), ('2', '10:09')]
Mathew Fraser 74 [('8', '8:28'), ('40', '959 reps'), ('3', '6:08'), ('2', '14:22'), ('21', '10:45')]
Lefteris Theofanidis 94 [('1', '8:05'), ('3', '1021 reps'), ('13', '6:32'), ('4', '15:00'), ('73', '11:11')]
# ... more ...
如下所述,您可以访问 api 来获取数据。要另存为 CSV,您需要使用 json 格式来获取您需要的内容(即展平嵌套数据)。有 2 种方法可以做到这一点,a) 将其完全展平,以便每一行都对应于每个参赛者,或者 b) 每个参赛者的每个序数分数都有单独的行。
唯一的区别是,如果您选择 a) 您将得到一个非常宽的 table(但没有重复的数据),如果您选择 b),您将得到一个很长的 [=24] =],重复数据。
由于文件不是太大,我选择了选项 b),这样您就可以随时按特定列或过滤器进行分组:
import requests
import pandas as pd
r = requests.get('https://games.crossfit.com/competitions/api/v1/competitions/open/2020/leaderboards?view=0&division=1&scaled=0&sort=0')
data = r.json()
results = pd.DataFrame()
df = pd.DataFrame(data['leaderboardRows'])
for idx, row in df.iterrows():
entrantData = pd.Series()
scoresData = pd.DataFrame()
entrantResults = pd.DataFrame()
for idx2, each in row.iteritems():
if type(each) == dict:
temp = pd.DataFrame.from_dict(each, orient='index')
entrantData = entrantData.append(temp)
elif type(each) == list:
temp2 = pd.DataFrame(each)
scoresData = scoresData.append(temp2, sort=True).reset_index(drop=True)
else:
entrantData = entrantData.append(pd.Series(each, name=idx2))
entrantResults = entrantResults.append(scoresData, sort=True).reset_index(drop=True)
entrantResults = entrantResults.merge(pd.concat([entrantData.T] *5, ignore_index=True), left_index=True, right_index=True)
results = results.append(entrantResults, sort=True).reset_index(drop=True)
results.to_csv('file.csv', index=False)
输出:前 15 行 250
print (results.head(15).to_string())
affiliate affiliateId affiliateName age breakdown competitorId competitorName countryChampion countryOfOriginCode countryOfOriginName divisionId drawBlueHR firstName gender heat height highlight judge lane lastName mobileScoreDisplay nextStage ordinal overallRank overallScore postCompStatus profilePicS3key rank scaled score scoreDisplay scoreIdentifier status time video weight
0 CrossFit Nanaimo 1918 CrossFit Nanaimo 30 10 rounds 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Dallyn Giroux Vellner 1 1 64 d471c-P158264_7-184.jpg 13 0 11800382 8:38 9d3979222412df2842a1 ACT 518 0 195 lb
1 CrossFit Soul Miami 1918 CrossFit Nanaimo 30 29 rounds +\n2 thrusters\n 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Lamar Vernon Vellner 2 1 64 d471c-P158264_7-184.jpg 19 0 1009880000 988 reps 9bd66b00e8367cc7fd0c ACT NaN 0 195 lb
2 CrossFit Nanaimo 1918 CrossFit Nanaimo 30 165 reps 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Jason Lochhead Vellner 3 1 64 d471c-P158264_7-184.jpg 12 0 1001650151 6:29 2347b4cb7339f2a13e6c ACT 389 0 195 lb
3 CrossFit Nanaimo 1918 CrossFit Nanaimo 30 240 reps 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Dallyn Giroux Vellner 4 1 64 d471c-P158264_7-184.jpg 18 0 1002400211 16:29 bcfd3882df3fa2e99451 ACT 989 0 195 lb
4 CrossFit New England 1918 CrossFit Nanaimo 30 240 reps 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Matt O'Keefe Vellner 5 1 64 d471c-P158264_7-184.jpg 2 0 1002400591 10:09 4bb25bed5f71141da122 ACT 609 0 195 lb
5 CrossFit Mayhem 3220 CrossFit Mayhem 30 10 rounds 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False darren hunsucker Fraser 1 2 74 9e218-P153604_4-184.jpg 8 0 11800392 8:28 18b5b2e137f00a2d9d7d ACT 508 0 195 lb
6 CrossFit Soul Miami 3220 CrossFit Mayhem 30 28 rounds +\n4 thrusters\n3 toes-to-bars\n 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Daniel Lopez Fraser 2 2 74 9e218-P153604_4-184.jpg 40 0 1009590000 959 reps b96bc1b7b58fa34a28a1 ACT NaN 0 195 lb
7 CrossFit Mayhem 3220 CrossFit Mayhem 30 165 reps 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Jason Fernandez Fraser 3 2 74 9e218-P153604_4-184.jpg 3 0 1001650172 6:08 4f4a994a045652c894c5 ACT 368 0 195 lb
8 CrossFit Mayhem 3220 CrossFit Mayhem 30 240 reps 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Tasia Percevecz Fraser 4 2 74 9e218-P153604_4-184.jpg 2 0 1002400338 14:22 1a4a7d8760e72bb12d68 ACT 862 0 195 lb
9 CrossFit Mayhem 3220 CrossFit Mayhem 30 240 reps 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Kelley Jackson Fraser 5 2 74 9e218-P153604_4-184.jpg 21 0 1002400555 10:45 b4a259e7049f47f65356 ACT 645 0 195 lb
10 NaN 0 30 10 rounds 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 1 3 94 931eb-P514502_2-184.jpg 1 0 11800415 8:05 c8907e02512f42ff3142 ACT 485 1 81 kg
11 NaN 0 30 30 rounds +\n1 thruster\n 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 2 3 94 931eb-P514502_2-184.jpg 3 0 1010210000 1021 reps 63add31b22606957701c ACT NaN 1 81 kg
12 NaN 0 30 165 reps 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 3 3 94 931eb-P514502_2-184.jpg 13 0 1001650148 6:32 46d7cdb691c25ea38dbe ACT 392 1 81 kg
13 NaN 0 30 240 reps 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 4 3 94 931eb-P514502_2-184.jpg 4 0 1002400300 15:00 d49e55a2af5840740071 ACT 900 1 81 kg
14 NaN 0 30 240 reps 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 5 3 94 931eb-P514502_2-184.jpg 73 0 1002400529 11:11 d35c9d687eb6b72c8e36 ACT 671 1 81 kg