AttributeError: 'NoneType' object has no attribute 'get_text' python web-scraping

Question

I am following this tutorial (https://www.youtube.com/watch?v=Bg9r_yLk7VY&t=241s), but I still get this error even though I did everything correctly. Here is my code:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'

headers ={"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find(id="productTitle").get_text()

print(title.strip())

This is the error message I get when I run the code:

Traceback (most recent call last):
  File "scraper.py", line 26, in <module>
    title = soup.find(id="productTitle").get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

Answer 1

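To get the product title from that page, you only need to change the parser from html.parser to html5lib or lxml. The latter two can repair malformed HTML elements, which in this case are what prevent you from parsing the title. I also used a random user agent in the script to make it more robust.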

Working code:

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()

URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'

page = requests.get(URL, headers={'User-Agent':ua.random})
soup = BeautifulSoup(page.text, 'html5lib')
title = soup.find(id="productTitle").get_text(strip=True)
print(title)
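
Note that soup.find() returns None whenever no element matches, so the chained .get_text() call still raises the same AttributeError if the server returns a page without a productTitle element. A minimal defensive sketch follows; the helper name extract_title and the inline HTML strings are illustrative assumptions, not part of the original script, and the inline HTML only stands in for page.text so the example runs without a network call.

```python
from bs4 import BeautifulSoup


def extract_title(html, parser="html.parser"):
    """Return the stripped text of the element with id="productTitle", or None."""
    soup = BeautifulSoup(html, parser)
    node = soup.find(id="productTitle")
    # find() returns None when nothing matches, so check before calling get_text()
    if node is None:
        return None
    return node.get_text(strip=True)


# Hypothetical inline HTML in place of page.text
found = extract_title('<div><span id="productTitle">  Example product  </span></div>')
missing = extract_title("<html><body><p>No title here</p></body></html>")

print(found)
print(missing)
```

In the scraper above, this would be called as extract_title(page.text, 'html5lib') (html5lib and lxml are separate packages that have to be installed). If it returns None, printing page.status_code and the first few hundred characters of page.text should show what the server actually sent back.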
