Exporting web scraping results to a CSV file
    import csv
    import requests
    from bs4 import BeautifulSoup

    page = requests.get("https://www.cbssports.com/nba/stats/playersort/nba/year-2019-season-preseason-category-scoringpergame")
    soup = BeautifulSoup(page.content, 'html.parser')

    for record in soup.find_all('tr'):
        try:
            print(record.contents[0].text)
            print(record.contents[6].text)
            print(record.contents[7].text)
            print(record.contents[8].text)
            print(record.contents[9].text)
            print(record.contents[10].text)
            print(record.contents[12].text)
            print(record.contents[13].text)
            print(record.contents[14].text)
            print(record.contents[15].text)
        except:
            pass
        print('\n')

    def scrape_data(url):
        response = requests.get("https://www.cbssports.com/nba/stats/playersort/nba/year-2019-season-preseason-category-scoringpergame", timeout=10)
        soup = BeautifulSoup(response.content, 'html.parser')
        table = soup.find_all('table')[1]
        rows = table.select('tbody > tr')
        header = [th.text.rstrip() for th in rows[1].find_all('th')]
        with open('statsoutput.csv', 'w') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(header)
            for row in rows[1:]:
                data = [th.text.rstrip() for th in row.find_all('td')]
                writer.writerow(data)

    if __name__=="__main__":
        url = "https://www.cbssports.com/nba/stats/playersort/nba/year-2019-season-preseason-category-scoringpergame"
        scrape_data(url)
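I have been trying to export the stats from this web page to a CSV file.

When I run my code, the first part works fine and retrieves the data I want. But the function fails to export it to a CSV file, and I keep getting this error:

    table = soup.find_all('table')[1]
    IndexError: list index out of range

I am not sure why.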
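You get this error because the page contains only one <table> HTML element, so soup.find_all('table') returns a list of length 1 and index 1 is out of range. You can fix the error with soup.find_all('table')[0] or, more cleanly, with soup.table.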
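To see the failure in isolation, here is a minimal sketch. The `html` string is a made-up stand-in for the real page (whose markup may differ); the only thing it is meant to share with it is that it holds a single table.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: exactly one <table>.
html = """
<table>
  <tr><th>Player</th><th>PPG</th></tr>
  <tr><td>Player A</td><td>25.0</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")
table_count = len(tables)  # one table, so the only valid index is 0

try:
    tables[1]  # same lookup as in the question
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__

# soup.table is shorthand for soup.find("table"): the first match, or None.
same_table = soup.table is tables[0]

print(table_count, error_name, same_table)
```

Checking `len(soup.find_all('table'))` on the real response before indexing is a quick way to confirm how many tables the parser actually sees.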
I also checked and tested your code, and I recommend this:

    table = soup.table
    rows = table.find_all('tr')

After these changes everything should work. Hope this helps.
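Putting the suggestion together, the export step could look roughly like the sketch below. It is not the asker's exact function: `table_to_rows`, `write_csv`, and `sample_html` are hypothetical names, the sample markup is invented so the example runs offline, and it assumes header cells are `<th>` and data cells are `<td>`, which may not match the real page.

```python
import csv
import os
import tempfile

from bs4 import BeautifulSoup


def table_to_rows(html):
    """Return (header, data_rows) taken from the first <table> in html."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.table  # first <table>, or None if the page has none
    if table is None:
        raise ValueError("no <table> found in the document")

    header, data = [], []
    for tr in table.find_all("tr"):
        ths = [th.get_text(strip=True) for th in tr.find_all("th")]
        tds = [td.get_text(strip=True) for td in tr.find_all("td")]
        if ths and not header:
            header = ths      # first row made of <th> cells is the header
        elif tds:
            data.append(tds)  # rows without <td> cells are skipped
    return header, data


def write_csv(path, header, data):
    """Write the header (if any) and the data rows to path."""
    # newline="" is what the csv module documentation recommends for writers.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        if header:
            writer.writerow(header)
        writer.writerows(data)


# Invented sample markup; with the real site this would be response.text.
sample_html = """
<table>
  <tr><th>Player</th><th>PPG</th></tr>
  <tr><td>Player A</td><td>25.0</td></tr>
  <tr><td>Player B</td><td>22.5</td></tr>
</table>
"""

header, data = table_to_rows(sample_html)
out_path = os.path.join(tempfile.gettempdir(), "statsoutput.csv")
write_csv(out_path, header, data)
```

Separating parsing from writing keeps the network call out of the logic, so the same functions can be tried on a saved copy of the page before pointing them at `requests.get(url, timeout=10).text`.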