Python Beautiful Soup Web Scraping Specific Numbers
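On this page, each team's final score (a number) has the same class name, class="finalScore".
When I request the away team's final score (top), the code returns the number without any problem. That is the case when favLastGM = 'A'.
When I try to request the home team's final score (bottom), the code gives me an error. That is the case when favLastGM = 'H'.
Below is my code: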
import pickle
import math
import urllib2
from lxml import etree
from bs4 import BeautifulSoup
from urllib import urlopen
#Last Two Game info Home [H] or Away [A]
favLastGM = 'A' #Higher week number 2
#Game Info (Favorite) Last Game Played - CBS Sports (Change Every Week)
favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'
favPrevGMInfoHtml = urlopen(favPrevGMInfoUrl).read()
favPrevGMInfoSoup = BeautifulSoup(favPrevGMInfoHtml)
if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })
elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]
else:
    print("***************************************************")
    print("NOT A VALID ENTRY - favLastGM !")
    print("***************************************************")
print ("Enter: Total Points Allowed from Favored Team Defense for last game played: "),
print favScore[0].text
This is the error I get when favLastGM = 'H':
Traceback (most recent call last):
  File "C:/Users/jcmcdonald/Desktop/FinalScoreTest.py", line 26, in <module>
    print favScore[0].text
  File "C:\Python27\lib\site-packages\bs4\element.py", line 905, in __getitem__
    return self.attrs[key]
KeyError: 0
There are only two elements with class="finalScore". The first is the away team's score and the second is the home team's score:
>>> from urllib import urlopen
>>> from bs4 import BeautifulSoup
>>>
>>> favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'
>>>
>>> favPrevGMInfoSoup = BeautifulSoup(urlopen(favPrevGMInfoUrl))
>>> score = [item.get_text() for item in favPrevGMInfoSoup.find_all("td", {"class": "finalScore"})]
>>> score
[u'30', u'7']
FYI, you can use a CSS selector, .select("td.finalScore"), instead of .find_all("td", {"class": "finalScore"}).
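To make the comparison concrete, here is a minimal offline sketch showing that both calls return the same cells. The SAMPLE_HTML string is a hypothetical stand-in for the box score markup (the real page may be structured differently), and the sketch assumes Python 3 with the built-in "html.parser" backend rather than the Python 2 setup used in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the box score table; not the real CBS Sports markup.
SAMPLE_HTML = """
<table>
  <tr class="teamInfo awayTeam"><td>NE</td><td class="finalScore">30</td></tr>
  <tr class="teamInfo homeTeam"><td>MIN</td><td class="finalScore">7</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute-dictionary lookup, as in the question.
by_find_all = [td.get_text() for td in soup.find_all("td", {"class": "finalScore"})]

# Equivalent lookup written as a CSS selector.
by_select = [td.get_text() for td in soup.select("td.finalScore")]

print(by_find_all)  # ['30', '7']
print(by_select)    # ['30', '7']
```

Both lists come back in document order, so the position of each score still depends on how the page lays out the two teams.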
In your code, you assign objects of different types to favScore. In the first case, you have:
if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })
You end up with a list...
favScore = [<td class="finalScore">30</td>, <td class="finalScore">7</td>]
Whereas in the second case, you have:
elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]
You end up with a single BeautifulSoup element...
favScore = <td class="finalScore">7</td>
You can fix this as follows (note the [0]):
if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[0]
elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]
And finally do:
print favScore.text
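The following sketch reproduces the KeyError offline and shows one way to keep favScore a single element in both branches. It assumes Python 3, an inline SAMPLE_HTML string that only imitates the page, and away-first ordering as described in the question. The final_score helper is a hypothetical name introduced for illustration, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the away team's cell is assumed to come first.
SAMPLE_HTML = (
    '<table>'
    '<tr><td class="finalScore">30</td></tr>'
    '<tr><td class="finalScore">7</td></tr>'
    '</table>'
)
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

cells = soup.find_all("td", {"class": "finalScore"})  # list-like ResultSet of Tags
single = cells[1]                                      # one Tag, not a list

# Indexing a Tag looks up an HTML attribute by name, so single[0] asks for
# an attribute called 0 and fails with KeyError, as in the traceback.
try:
    single[0]
    raised_key_error = False
except KeyError:
    raised_key_error = True


def final_score(soup, side):
    """Return the score text for side 'A' (away) or 'H' (home).

    Hypothetical helper: it always indexes the result list, so the
    return type is the same for both sides.
    """
    positions = {"A": 0, "H": 1}
    if side not in positions:
        raise ValueError("side must be 'A' or 'H', got %r" % (side,))
    cells = soup.find_all("td", {"class": "finalScore"})
    return cells[positions[side]].get_text()


print(raised_key_error)
print(final_score(soup, "A"))
print(final_score(soup, "H"))
```

Raising ValueError for an invalid side also avoids the situation in the original script where the else branch prints a warning but the later print still runs with favScore undefined.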
I've slightly expanded @alecxe's answer by explicitly selecting the home and away teams (rather than relying on the implicit ordering of the array):
from urllib import urlopen
from bs4 import BeautifulSoup
favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'
favPrevGMInfoSoup = BeautifulSoup(urlopen(favPrevGMInfoUrl))
home_score = favPrevGMInfoSoup.find("tr", {"class": "teamInfo homeTeam"}).find("td", {"class": "finalScore"}).get_text()
away_score = favPrevGMInfoSoup.find("tr", {"class": "teamInfo awayTeam"}).find("td", {"class": "finalScore"}).get_text()
print home_score, away_score
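A variation on the same idea, sketched offline with CSS selectors. It assumes Python 3 and a hypothetical SAMPLE_HTML string that reuses the homeTeam and awayTeam class names from the code above; score_for is an illustrative helper name. A selector such as tr.homeTeam matches any row that carries that class regardless of what other classes it has, whereas passing the full string "teamInfo homeTeam" to find matches only that exact class attribute value.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the box score table; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr class="teamInfo awayTeam"><td>NE</td><td class="finalScore">30</td></tr>
  <tr class="teamInfo homeTeam"><td>MIN</td><td class="finalScore">7</td></tr>
</table>
"""


def score_for(soup, team_class):
    """Return the final score as an int for the row with the given class.

    Returns None when no matching cell exists, so a layout change shows
    up as a missing value rather than an AttributeError.
    """
    cell = soup.select_one("tr.%s td.finalScore" % team_class)
    if cell is None:
        return None
    return int(cell.get_text(strip=True))


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
home_score = score_for(soup, "homeTeam")
away_score = score_for(soup, "awayTeam")
print("home=%s away=%s" % (home_score, away_score))  # home=7 away=30
```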