AttributeError: 'NoneType' object has no attribute 'get_text' python web-scraping
I'm following this tutorial, but I still get this error even though I did everything exactly as shown. Here is the tutorial link: https://www.youtube.com/watch?v=Bg9r_yLk7VY&t=241s. Below is my code:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'
headers ={"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
title = soup.find(id="productTitle").get_text()
print(title.strip())
This is the error message I get when I run the code:
Traceback (most recent call last):
File "scraper.py", line 26, in <module>
title = soup.find(id="productTitle").get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'
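The AttributeError means that soup.find(id="productTitle") returned None: no element with that id exists in the tree that html.parser built. To get the product title from that page, you only need to change the parser from html.parser to html5lib or lxml. Those two are more lenient and repair malformed HTML, and in this case the malformed markup is what keeps html.parser from finding the title. Both are third-party packages that must be installed separately (pip install html5lib or pip install lxml). I also added a random User-Agent to the script to make it more robust.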
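A minimal sketch of why the traceback looks the way it does, and how to fail more gracefully: soup.find() returns None when no tag matches, so chaining .get_text() directly onto it raises exactly this AttributeError. The helper name get_text_or_none and the sample HTML are hypothetical, for illustration only; the snippet uses the built-in html.parser so it runs without extra installs.

```python
from typing import Optional

from bs4 import BeautifulSoup


def get_text_or_none(soup: BeautifulSoup, **find_kwargs) -> Optional[str]:
    """Hypothetical helper: stripped text of the first matching tag, or None.

    soup.find() returns None when nothing matches, so we check before
    calling get_text() instead of letting an AttributeError escape.
    """
    tag = soup.find(**find_kwargs)
    if tag is None:
        return None
    return tag.get_text(strip=True)


# Hypothetical sample markup standing in for a downloaded page.
html = '<html><body><span id="productTitle">  Sample product  </span></body></html>'
soup = BeautifulSoup(html, "html.parser")

print(get_text_or_none(soup, id="productTitle"))  # the stripped title text
print(get_text_or_none(soup, id="missing"))       # None instead of a crash
```

With this check in place, a missing element shows up as None that you can log or handle, rather than as a traceback.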
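Working code:
# Requires the third-party packages html5lib and fake-useragent (pip install html5lib fake-useragent)
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
ua = UserAgent()
URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'
# Send a randomly chosen, realistic browser User-Agent string with the request
page = requests.get(URL, headers={'User-Agent': ua.random})
# html5lib repairs malformed markup the way a browser does, so #productTitle survives parsing
soup = BeautifulSoup(page.text, 'html5lib')
title = soup.find(id="productTitle").get_text(strip=True)
print(title)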
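If you cannot be sure which parsers are installed, or if Amazon may sometimes serve a different page (for example a bot-check page) that has no #productTitle, a more defensive variant might look like the sketch below. It assumes a preference order of lxml, then html5lib, then the built-in html.parser, and relies on bs4 raising FeatureNotFound when a requested parser is not installed. make_soup, extract_title and PREFERRED_PARSERS are hypothetical names, not part of the original answer.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Assumed preference order: lenient parsers first, stdlib html.parser as fallback.
PREFERRED_PARSERS = ("lxml", "html5lib", "html.parser")


def make_soup(markup: str) -> BeautifulSoup:
    """Hypothetical helper: parse with the first installed parser in PREFERRED_PARSERS."""
    for parser in PREFERRED_PARSERS:
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # This parser library is not installed; try the next one.
            continue
    raise RuntimeError("No HTML parser available")


def extract_title(markup: str) -> str:
    """Hypothetical helper: return the #productTitle text or raise a descriptive error."""
    soup = make_soup(markup)
    tag = soup.find(id="productTitle")
    if tag is None:
        # Report the page's <title> to help tell a layout change from a block page.
        page_title = soup.title.get_text(strip=True) if soup.title else "<no <title>>"
        raise ValueError(f"#productTitle not found (page <title>: {page_title!r})")
    return tag.get_text(strip=True)
```

To use it with the script above, you could fetch the page with requests.get(URL, headers={'User-Agent': ua.random}), call page.raise_for_status() to catch HTTP errors early, and then print(extract_title(page.text)). The ValueError message then tells you which page you actually received instead of a bare 'NoneType' error.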