How do I loop and append my code to a CSV file? - Python
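I'm new to coding and have been working on this for about a week, but I've hit a dead end, so please be gentle.
What I'm trying to do is take all the data from the URL, in the format shown by the print statement, and put it into a CSV file.
I've managed to print one row successfully, but I can't figure out how to make it loop through all the other rows and append them to a CSV file. Any hints or tips?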
import io
import sys
import urllib.request
from bs4 import BeautifulSoup
# Re-wrap stdout/stderr as UTF-8 so non-ASCII team and stadium names print correctly
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')
url = "https://en.wikipedia.org/wiki/2022_FIFA_World_Cup_qualification_%E2%80%93_CAF_First_Round"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")
dateLists = soup.find_all(attrs={"class" : "bday dtstart published updated"})
timeLists = soup.find_all(attrs={"class" : "mobile-float-reset ftime"})
homeTeamLists = soup.find_all(attrs={"class" : "fhome"})
awayTeamLists = soup.find_all(attrs={"class" : "faway"})
scoreLists = soup.find_all(attrs={"class" : "fscore"})
venueLists = soup.find_all('span', attrs={"itemprop" : "name address"})
date = dateLists[0].text.strip()
time = timeLists[0].text.strip()
homeTeam = homeTeamLists[0].text.strip()
awayTeam = awayTeamLists[0].text.strip()
score = scoreLists[0].text.strip()
venue = venueLists[0].text.strip()
print(date, time, homeTeam, score, awayTeam, venue)
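Since the question asks specifically about appending, here is a minimal stdlib sketch of that pattern. It assumes the scraped values have already been reduced to plain lists of strings (the hard-coded lists below are hypothetical stand-ins for `dateLists[i].text.strip()` and friends). It also assumes a hypothetical file name `matches.csv`. `zip()` walks the lists in lockstep so each loop iteration handles one match, and the file is opened in append mode (`"a"`), writing the header only when the file is new or empty.

```python
import csv
import os

# Hypothetical output file; columns follow the order of the print() call above.
CSV_PATH = "matches.csv"
HEADER = ["date", "time", "home", "score", "away", "venue"]


def append_row(path, row, header=HEADER):
    """Append one row to a CSV file, writing the header first if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" stops the csv module from producing blank lines on Windows;
    # UTF-8 keeps names such as "São Tomé and Príncipe" intact.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(header)
        writer.writerow(row)


# Hypothetical stand-ins for the stripped text of the find_all() results.
dates = ["2019-09-04", "2019-09-08"]
times = ["16:00 UTC+3", "15:00 UTC+2"]
homes = ["Ethiopia", "Lesotho"]
scores = ["0–0", "1–1"]
aways = ["Lesotho", "Ethiopia"]
venues = ["Bahir Dar Stadium, Bahir Dar", "Setsoto Stadium, Maseru"]

# zip() yields one (date, time, home, score, away, venue) tuple per match.
for row in zip(dates, times, homes, scores, aways, venues):
    append_row(CSV_PATH, row)
```

Venue strings contain commas, and `csv.writer` quotes those fields automatically. Because the file is opened in append mode, running the script twice adds the rows twice. If all rows are available up front, opening the file once with `"w"` and calling `writer.writerows(...)` is simpler.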
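You just need to loop over each item in the lists. You can use enumerate to get the index position, then use that index to append each item to a list and build a DataFrame from those lists: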
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd
url = "https://en.wikipedia.org/wiki/2022_FIFA_World_Cup_qualification_%E2%80%93_CAF_First_Round"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")
dateLists = soup.find_all(attrs={"class" : "bday dtstart published updated"})
timeLists = soup.find_all(attrs={"class" : "mobile-float-reset ftime"})
homeTeamLists = soup.find_all(attrs={"class" : "fhome"})
awayTeamLists = soup.find_all(attrs={"class" : "faway"})
scoreLists = soup.find_all(attrs={"class" : "fscore"})
venueLists = soup.find_all('span', attrs={"itemprop" : "name address"})
dateList = []
timeList = []
homeTeamList = []
awayTeamList = []
scoreList = []
venueList = []
for idx, v in enumerate(dateLists):
    dateList.append(dateLists[idx].text.strip())
    timeList.append(timeLists[idx].text.strip())
    homeTeamList.append(homeTeamLists[idx].text.strip())
    awayTeamList.append(awayTeamLists[idx].text.strip())
    scoreList.append(scoreLists[idx].text.strip())
    venueList.append(venueLists[idx].text.strip())
df = pd.DataFrame({'date': dateList,
                   'time': timeList,
                   'home': homeTeamList,
                   'away': awayTeamList,
                   'score': scoreList,
                   'venue': venueList})
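The code above stops at the DataFrame, while the question asks for a CSV file. With pandas that is one more line, `df.to_csv("matches.csv", index=False)`, where `index=False` leaves out the 0, 1, 2, ... row labels. For readers who would rather not depend on pandas, here is a rough stdlib equivalent. It writes the same column-name-to-list mapping with `csv.DictWriter`. The two-row data and the path `matches_table.csv` are hypothetical. The `"utf-8-sig"` encoding is an optional choice aimed at Excel, which commonly uses the byte-order mark to recognise UTF-8.

```python
import csv

# Same column-name -> values mapping the answer passes to pd.DataFrame,
# cut down to two matches for illustration.
columns = {
    "date": ["2019-09-04", "2019-09-08"],
    "time": ["16:00 UTC+3", "15:00 UTC+2"],
    "home": ["Ethiopia", "Lesotho"],
    "away": ["Lesotho", "Ethiopia"],
    "score": ["0–0", "1–1"],
    "venue": ["Bahir Dar Stadium, Bahir Dar", "Setsoto Stadium, Maseru"],
}


def write_columns(path, columns):
    """Write a dict of equal-length column lists to a CSV file, replacing any existing file."""
    names = list(columns)
    # Transpose: build one {"date": ..., "time": ..., ...} dict per match.
    rows = [dict(zip(names, values)) for values in zip(*columns.values())]
    # "utf-8-sig" writes a byte-order mark first; Excel typically relies on it to
    # detect UTF-8, so characters such as the en dash in "0–0" display correctly.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=names)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


n = write_columns("matches_table.csv", columns)  # hypothetical output path
```

Read the file back with `encoding="utf-8-sig"` as well. Plain `"utf-8"` keeps the byte-order mark attached to the first header, so the first key would come out as `"\ufeffdate"`.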
Output:
print(df.to_string())
   date        time         home                   away                   score         venue
0  2019-09-04  16:00 UTC+3  Ethiopia               Lesotho                0–0           Bahir Dar Stadium, Bahir Dar
1  2019-09-08  15:00 UTC+2  Lesotho                Ethiopia               1–1           Setsoto Stadium, Maseru
2  2019-09-05  18:00 UTC+3  Somalia                Zimbabwe               1–0           El Hadj Hassan Gouled Aptidon Stadium, Djibout...
3  2019-09-10  15:00 UTC+2  Zimbabwe               Somalia                3–1           National Sports Stadium, Harare
4  2019-09-04  16:00 UTC+3  Eritrea                Namibia                1–2           Denden Stadium, Asmara
5  2019-09-10  19:00 UTC+2  Namibia                Eritrea                2–0           Sam Nujoma Stadium, Windhoek
6  2019-09-04  15:00 UTC+2  Burundi                Tanzania               1–1           Prince Louis Rwagasore Stadium, Bujumbura
7  2019-09-08  16:00 UTC+3  Tanzania               Burundi                1–1 (a.e.t.)  National Stadium, Dar es Salaam
8  2019-09-04  18:00 UTC+3  Djibouti               Eswatini               2–1           El Hadj Hassan Gouled Aptidon Stadium, Djibouti
9  2019-09-10  15:00 UTC+2  Eswatini               Djibouti               0–0           Mavuso Sports Centre, Manzini
10 2019-09-07  16:00 UTC+2  Botswana               Malawi                 0–0           Francistown Stadium, Francistown
11 2019-09-10  14:00 UTC+2  Malawi                 Botswana               1–0           Kamuzu Stadium, Blantyre
12 2019-09-06  17:00 UTC±0  Gambia                 Angola                 0–1           Independence Stadium, Bakau
13 2019-09-10  16:00 UTC+1  Angola                 Gambia                 2–1           Estádio 11 de Novembro, Luanda
14 2019-09-04  18:00 UTC±0  Liberia                Sierra Leone           3–1           Samuel Kanyon Doe Sports Complex, Paynesville
15 2019-09-08  16:30 UTC±0  Sierra Leone           Liberia                1–0           Siaka Stevens Stadium, Freetown
16 2019-09-04  18:30 UTC+4  Mauritius              Mozambique             0–1           Stade Anjalay, Belle Vue
17 2019-09-10  16:00 UTC+2  Mozambique             Mauritius              2–0           Estádio do Zimpeto, Maputo
18 2019-09-04  15:30 UTC±0  São Tomé and Príncipe  Guinea-Bissau          0–1           Estádio Nacional 12 de Julho, São Tomé
19 2019-09-10  16:30 UTC±0  Guinea-Bissau          São Tomé and Príncipe  2–1           Estádio 24 de Setembro, Bissau
20 2019-09-04  16:00 UTC+2  South Sudan            Equatorial Guinea      1–1           Al-Hilal Stadium, Omdurman (Sudan)[note 2]
21 2019-09-08  17:00 UTC+1  Equatorial Guinea      South Sudan            1–0           Nuevo Estadio de Malabo, Malabo
22 2019-09-06  15:00 UTC+3  Comoros                Togo                   1–1           Stade de Moroni, Moroni
23 2019-09-10  16:00 UTC±0  Togo                   Comoros                2–0           Stade de Kégué, Lomé
24 2019-09-05  15:30 UTC+1  Chad                   Sudan                  1–3           Stade Omnisports Idriss Mahamat Ouya, N'Djamena
25 2019-09-10  19:00 UTC+2  Sudan                  Chad                   0–0           Al-Merrikh Stadium, Omdurman
26 2019-09-05  16:00 UTC+4  Seychelles             Rwanda                 0–3           Stade Linité, Victoria
27 2019-09-10  18:00 UTC+2  Rwanda                 Seychelles             7–0           Stade Régional Nyamirambo, Kigali
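The 28 rows above line up only because every find_all call happened to return the same number of elements. The enumerate loop indexes the other lists by position in dateLists. It therefore raises IndexError if any of them is shorter, and it silently ignores extra items if one is longer. A plain `zip()` would instead drop rows without warning. Here is a small sketch of a guard, using a hypothetical helper `combine` and made-up sample values. On Python 3.10+, `zip(..., strict=True)` performs the same check and raises ValueError itself.

```python
def combine(*columns):
    """Zip parallel lists into row tuples, raising ValueError if their lengths differ."""
    lengths = [len(col) for col in columns]
    if len(set(lengths)) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return list(zip(*columns))


# Hypothetical stand-ins for the stripped text of two find_all() results.
dates = ["2019-09-04", "2019-09-08"]
homes = ["Ethiopia", "Lesotho"]

rows = combine(dates, homes)

# A missing element now fails loudly with the offending lengths,
# instead of an IndexError mid-loop or silently lost rows.
try:
    combine(dates, homes[:1])
except ValueError as exc:
    error_message = str(exc)
```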