Python, BeautifulSoup: scraping specific numbers from a stats table
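How do I make "2014" my anchor and scrape specific numbers from the 2014 row (the numbers to the right of 2014)?
The code below skips the "Passing" table (which holds all career passing stats) and tries to scrape from the "Rushing" table (which holds all career rushing stats), using "2014" as the anchor and grabbing the next five td tags after "2014" (that is, the numbers to the right of 2014).
I believe my code is close, but I am getting an error message.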
from bs4 import BeautifulSoup
import nltk
from urllib import urlopen
import urllib
import re
url = 'http://www.nfl.com/player/tombrady/2504211/careerstats'
html = urlopen(url).read()
soup = BeautifulSoup(html)
table = soup.find("table", { "summary" : "Career Stats In Rushing For Tom Brady" })
for row in table.findAll("tr", { "td" : "2014" }):
    cells = row.findAll("td")
print cells[1]
print cells[2]
print cells[3]
print cells[4]
print cells[5]
Here is the error message:
Traceback (most recent call last):
  File "C:\Users\jcmcdonald\Desktop\test7.py", line 17, in <module>
    print cells[1]
NameError: name 'cells' is not defined
Line 17 is the first "print cells[1]".
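I think this is closer to what you want. You can't filter on the year the way you were trying to. The dictionary passed to findAll matches attributes on the tr tag itself, so { "td" : "2014" } looks for a tr with an attribute named td, and no row has one. The loop body therefore never runs, cells is never assigned, and the first print raises the NameError. You need an if statement and have to do the filtering yourself.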
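from bs4 import BeautifulSoup
from urllib import urlopen  # Python 2; on Python 3 this is urllib.request.urlopen

url = 'http://www.nfl.com/player/tombrady/2504211/careerstats'
html = urlopen(url).read()
soup = BeautifulSoup(html)
table = soup.find("table", { "summary" : "Career Stats In Rushing For Tom Brady" })

rows = []
for row in table.findAll("tr"):
    # Header rows may contain no td at all, so check before reading the text.
    first_cell = row.find("td")
    if first_cell is not None and '2014' in first_cell.text:
        # Collect the text of every cell in the matching row.
        for item in row.findAll("td"):
            rows.append(item.text)

# rows[0] is the year itself; rows[6] is the seventh cell of the 2014 row.
print rows[6]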
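To make the filtering step easier to try without hitting the live page, here is a minimal sketch in Python 3 (unlike the Python 2 snippets in this thread). It runs against a small inline table. The SAMPLE_HTML string, its summary attribute, its placeholder numbers, and the helper name cells_for_year are all assumptions made for illustration, not the real page or real stats.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the stats page; every value is a placeholder.
SAMPLE_HTML = """
<table summary="Example Rushing Table">
  <tr><th>Season</th><th>Team</th><th>G</th><th>Att</th><th>Yds</th><th>Avg</th></tr>
  <tr><td>2013</td><td>AAA</td><td>1</td><td>2</td><td>3</td><td>1.5</td></tr>
  <tr><td>2014</td><td>AAA</td><td>4</td><td>5</td><td>6</td><td>1.2</td></tr>
</table>
"""


def cells_for_year(html, year):
    """Return the cell texts of the first row whose first td equals year."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"summary": "Example Rushing Table"})
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # The header row only has th cells, so cells is empty there.
        if cells and cells[0].get_text(strip=True) == year:
            return [cell.get_text(strip=True) for cell in cells]
    return []


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# The attrs dict is compared against attributes of the tr tag itself,
# so this finds nothing even though a 2014 row exists.
no_match = soup.find_all("tr", {"td": "2014"})
row_2014 = cells_for_year(SAMPLE_HTML, "2014")

print(no_match)
print(row_2014)
```

Comparing the first cell with == rather than a substring test avoids matching a row that merely contains "2014" somewhere in its text. Returning an empty list when no row matches also avoids an IndexError or NameError later on.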
This should help you get started:
from bs4 import BeautifulSoup
from urllib import urlopen  # Python 2; on Python 3 this is urllib.request.urlopen

url = 'http://www.nfl.com/player/tombrady/2504211/careerstats'
html = urlopen(url).read()
soup = BeautifulSoup(html)
table = soup.find("table", { "summary" : "Career Stats In Rushing For Tom Brady" })
# Print the text of each row to see which cells you need.
for row in table.find_all('tr'):
    print row.text
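Once you can see the rows, one way to treat "2014" as an anchor is to locate that cell and read the td siblings that follow it. The sketch below is written for Python 3 and runs on a small inline table rather than the live page. SAMPLE_HTML, its placeholder numbers, and the helper name values_right_of are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the stats page; every value is a placeholder.
SAMPLE_HTML = """
<table summary="Example Rushing Table">
  <tr><th>Season</th><th>Team</th><th>G</th><th>Att</th><th>Yds</th><th>Avg</th></tr>
  <tr><td>2013</td><td>AAA</td><td>1</td><td>2</td><td>3</td><td>1.5</td></tr>
  <tr><td>2014</td><td>AAA</td><td>4</td><td>5</td><td>6</td><td>1.2</td></tr>
</table>
"""


def values_right_of(html, anchor_text, count):
    """Return the text of up to count td cells that follow the anchor cell."""
    soup = BeautifulSoup(html, "html.parser")
    # Find the first td whose stripped text is exactly the anchor.
    anchor = soup.find(
        lambda tag: tag.name == "td" and tag.get_text(strip=True) == anchor_text
    )
    if anchor is None:
        return []
    # Siblings stay inside the same tr, so this never spills into the next row.
    siblings = anchor.find_next_siblings("td", limit=count)
    return [cell.get_text(strip=True) for cell in siblings]


right_of_2014 = values_right_of(SAMPLE_HTML, "2014", 5)
print(right_of_2014)
```

On a real page with several tables, it is probably safer to call find on the table you selected by its summary attribute instead of on the whole document, because another table may also contain a cell reading "2014".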