Python 3 'NoneType' object has no attribute 'text'
# import libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup
#specify the url
html = 'https://www.bloomberg.com/quote/SPX:IND'
# query the website and return the HTML to the variable 'page'
page = urlopen(html)
# parse the HTML using Beautiful Soup and store it in the variable 'data'
data = BeautifulSoup(page, 'html.parser')
# find the <h1> holding the name and get its text
name_box = data.find('h1', attrs={'class': 'companyName_99a4824b'})
name = name_box.text.strip() #strip is used to remove starting and trailing
print (name)
# get the index price
price_box = data.find('div', attrs={'class':'priceText_1853e8a5'})
price = price_box.text
print (price)
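I have been following a guide on medium.com (here), but because I lack Python and scripting knowledge I have run into some problems. I think my error is in
name = name_box.text
because text is not defined, and I am not sure whether they expect me to define it using the BeautifulSoup library. Any help would be appreciated. The actual error is below: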
RESTART: C:/Users/Parsons PC/AppData/Local/Programs/Python/Python36-32/projects/Scripts/S&P 500 website scraper/main.py
Traceback (most recent call last):
File "C:/Users/Parsons PC/AppData/Local/Programs/Python/Python36-32/projects/Scripts/S&P 500 website scraper/main.py", line 17, in <module>
name = name_box.text.strip() #strip is used to remove starting and trailing
AttributeError: 'NoneType' object has no attribute 'text'
The page https://www.bloomberg.com/quote/SPX:IND does not contain an <h1> tag with the class name companyName_99a4824b, so data.find() returns None, and accessing .text on None raises the error above.
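To see the failure without hitting the network, here is a minimal sketch that parses an inline HTML string instead of the live page. The `sample_html` string is an assumption standing in for the downloaded page; only the `<h1>` line is taken from the site.

```python
from bs4 import BeautifulSoup

# Inline markup standing in for the downloaded page (an assumption for illustration).
sample_html = '<h1 class="companyName__99a4824b">S&amp;P 500 Index</h1>'
soup = BeautifulSoup(sample_html, 'html.parser')

# Single underscore: no tag has this class, so find() returns None.
missing = soup.find('h1', attrs={'class': 'companyName_99a4824b'})
print(missing)

# Double underscore: the tag is found, so .text is available.
found = soup.find('h1', attrs={'class': 'companyName__99a4824b'})
print(found.text.strip())
```

Calling `missing.text` here would raise the same AttributeError as in the question, because `missing` is None rather than a tag.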
On the site, the <h1> tag looks like this:
<h1 class="companyName__99a4824b">S&P 500 Index</h1>
So to select it, you have to change the class name to companyName__99a4824b (with two underscores).
name_box = data.find('h1', attrs={'class': 'companyName__99a4824b'})
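If the suffix after the underscores is generated automatically (an assumption on my part, the page does not say so), matching only the stable prefix of the class name may hold up better than hard-coding the full name. A rough sketch using a regular expression with `class_`, again on an inline stand-in for the page:

```python
import re

from bs4 import BeautifulSoup

# Inline markup standing in for the downloaded page (an assumption for illustration).
sample_html = '<h1 class="companyName__99a4824b">S&amp;P 500 Index</h1>'
soup = BeautifulSoup(sample_html, 'html.parser')

# Match any <h1> that has a class starting with "companyName",
# whatever suffix follows it.
name_box = soup.find('h1', class_=re.compile(r'^companyName'))
name = name_box.get_text(strip=True) if name_box is not None else None
print(name)
```

This still returns None if the prefix itself is renamed, so it reduces the problem rather than removing it.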
Final result:
# import libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup
# specify the URL
html = 'https://www.bloomberg.com/quote/SPX:IND'
# query the website and return the HTML to the variable 'page'
page = urlopen(html)
# parse the HTML using Beautiful Soup and store it in the variable 'data'
data = BeautifulSoup(page, 'html.parser')
# find the <h1> holding the name and get its text
name_box = data.find('h1', attrs={'class': 'companyName__99a4824b'})  # edited companyName_99a4824b -> companyName__99a4824b
name = name_box.text.strip()  # strip() removes leading and trailing whitespace
print(name)
# get the index price
price_box = data.find('div', attrs={'class': 'priceText__1853e8a5'})  # edited priceText_1853e8a5 -> priceText__1853e8a5
price = price_box.text
print(price)
It would be even better to handle this exception as well, in case the class names change again in the future.
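One way to do that, as a sketch rather than the only approach, is to check for None before touching `.text`. The helper `find_text` is a hypothetical name of mine, and `sample_html` (including the placeholder price 1,234.56) is made-up markup standing in for the downloaded page.

```python
from bs4 import BeautifulSoup


def find_text(soup, tag_name, class_name):
    """Return the stripped text of the first matching tag, or None if there is no match."""
    box = soup.find(tag_name, attrs={'class': class_name})
    if box is None:
        return None
    return box.get_text(strip=True)


# Inline markup standing in for the downloaded page; the price is a placeholder value.
sample_html = '''
<h1 class="companyName__99a4824b">S&amp;P 500 Index</h1>
<div class="priceText__1853e8a5">1,234.56</div>
'''
soup = BeautifulSoup(sample_html, 'html.parser')

name = find_text(soup, 'h1', 'companyName__99a4824b')
# The old single-underscore class does not match anything, so this is None.
price = find_text(soup, 'div', 'priceText_1853e8a5')

for label, value in (('name', name), ('price', price)):
    if value is None:
        print('Could not find the ' + label + '; the class name may have changed.')
    else:
        print(value)
```

Wrapping the original lines in `try: ... except AttributeError:` would also work, but an explicit None check makes it clearer which lookup failed.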