Get div attribute value and div text body
Here is a small piece of code that gets a div attribute value. All the divs have the same name and the same attribute name.
import urllib2
from bs4 import BeautifulSoup

redditFile = urllib2.urlopen("http://www.bing.com/videos?q=owl")
redditHtml = redditFile.read()
redditFile.close()
soup = BeautifulSoup(redditHtml)
productDivs = soup.findAll('div', attrs={'class' : 'dg_u'})
for div in productDivs:
    print div.find('div', {"class":"vthumb"})['smturl']
    #print div.find("div", {"class":"tl text-body"})  # this prints None rather than the div text
It first prints some URLs (sometimes 4, 6, 8, etc.) and then fails with:
KeyError Traceback (most recent call last)
<ipython-input-34-cc950a8a84f7> in <module>()
26 productDivs = soup.findAll('div', attrs={'class' : 'dg_u'})
27 for div in productDivs:
---> 28 print div.find('div', {"class":"vthumb"})['smturl']
29 print div.find("div", {"class":"tl text-body"})
/usr/local/lib/python2.7/dist-packages/bs4/element.pyc in __getitem__(self, key)
903 """tag[key] returns the value of the 'key' attribute for the tag,
904 and throws an exception if it's not there."""
--> 905 return self.attrs[key]
906
907 def __iter__(self):
KeyError: 'smturl'
Since all the divs have the same name and the same smturl attribute name, why does it raise a KeyError?
Any help?
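Not all of the divs have the smturl attribute, so you need to add that attribute to the find call. Also, not every element in productDivs contains the div you are looking for, so I added a test for whether find returns None.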
In [27]: for div in productDivs:
   ....:     if div.find('div', {"class":"vthumb", 'smturl': True}) is not None:
   ....:         print div.find('div', {"class":"vthumb", 'smturl': True})['smturl']
   ....:
http://ts2.mm.bing.net/th?id=OMB.9hfZ6cCDfUWbpw&pid=2.1
http://ts4.mm.bing.net/th?id=OMB1.n%2b12M8SoyFcsag&pid=2.1
http://ts4.mm.bing.net/th?id=OMB.ev1wnIiszGjhUg&pid=2.1
http://ts4.mm.bing.net/th?id=OMB.hDLa5PO07Chclw&pid=2.1
http://ts2.mm.bing.net/th?id=OMB.xDT9H25QFJ2jBw&pid=2.1
http://ts3.mm.bing.net/th?id=OMB.BULQolkxkaZ0uw&pid=2.1
http://ts3.mm.bing.net/th?id=OMB.xp3c0DyKrfmB7Q&pid=2.1
http://ts4.mm.bing.net/th?id=OMB.MxP9fUyaJCRyhw&pid=2.1
http://ts4.mm.bing.net/th?id=OMB2.CWjPPKiJQc4z6w&pid=2.1
http://ts1.mm.bing.net/th?id=OMB1.ZVKhvML3%2bPzM1w&pid=2.1
http://ts1.mm.bing.net/th?id=OMB.SLn%2b0NwKeUdZXw&pid=2.1
http://ts2.mm.bing.net/th?id=OMB.4HJqrT9pBevGlg&pid=2.1
http://ts2.mm.bing.net/th?id=OMB2.HgWYR9sjPw6JlQ&pid=2.1
http://ts1.mm.bing.net/th?id=OMB.RyBXWQ9sH9wThw&pid=2.1
http://ts2.mm.bing.net/th?id=OMB2.Vf21EgXRXMcdfg&pid=2.1
http://ts3.mm.bing.net/th?id=OMB2.BIb6qwbHniC1vw&pid=2.1
http://ts3.mm.bing.net/th?id=OMB1.H9bwRYncKU380A&pid=2.1
http://ts2.mm.bing.net/th?id=OM1.mBXeu55OD4VimQ&pid=2.1
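The loop above calls find twice per div. A minimal sketch of the same idea with a single find call follows, and it also picks up the div text when a "tl text-body" div is present. It assumes Python 3 with bs4 installed; SAMPLE_HTML and extract_thumbs are hypothetical names, and the markup is made up to stand in for the real Bing page, which may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real results page (assumption).
SAMPLE_HTML = """
<div class="dg_u"><div class="vthumb" smturl="http://example.com/a.jpg"></div>
  <div class="tl text-body">First title</div></div>
<div class="dg_u"><div class="vthumb"></div></div>
<div class="dg_u"><span>no thumbnail here</span></div>
"""

def extract_thumbs(html):
    """Return (smturl, text) pairs for every dg_u div that has a usable thumbnail."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.find_all("div", attrs={"class": "dg_u"}):
        # smturl=True matches only tags that actually carry the attribute.
        thumb = div.find("div", attrs={"class": "vthumb", "smturl": True})
        if thumb is None:
            continue  # no vthumb div, or a vthumb div without smturl
        # The text div is optional, so guard against None here as well.
        title = div.find("div", attrs={"class": "tl text-body"})
        text = title.get_text(strip=True) if title is not None else None
        results.append((thumb["smturl"], text))
    return results

print(extract_thumbs(SAMPLE_HTML))
# → [('http://example.com/a.jpg', 'First title')]
```

Storing the result of find in a variable keeps the None check and the attribute lookup on the same tag. If you would rather keep divs that lack the attribute, thumb.get('smturl') returns None instead of raising KeyError.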