BeautifulSoup find function returning None

I am trying to scrape some information using BeautifulSoup. However, height keeps returning None. Could you take a look at what the problem might be?

import requests
from bs4 import BeautifulSoup

header = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36'}
url = 'https://www.akc.org/dog-breeds/Affenpinscher/'
r = requests.get(url, headers= header)
soup = BeautifulSoup(r.content, 'html.parser')
height = soup.find('div', class_ = "f-16 my0 lh-solid breed-page__hero__overview__subtitle")
print(height)

What happens?

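The content is served dynamically, so always inspect your soup and check whether the data you are looking for is actually in it. The result can differ from what you see in the browser's developer tools, because requests does not render the website and make adjustments the way a browser does.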

So your soup.find('div', class_ = "f-16 my0 lh-solid breed-page__hero__overview__subtitle") will not find anything and returns None.
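
A quick way to confirm this is to test the fetched markup for the class before calling find(). The following is a minimal sketch, not part of the original answer: has_class and sample_html are hypothetical names, and sample_html is a made-up stand-in for r.text rather than the real AKC response.

```python
from bs4 import BeautifulSoup


def has_class(html, class_name):
    """Return True if any element in the given HTML carries class_name."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(class_=class_name) is not None


# Hypothetical stand-in for r.text: markup as requests might receive it,
# with the data in a script tag and no rendered hero block.
sample_html = """
<html><body>
  <div id="app"></div>
  <script type="application/ld+json">{"name": "Affenpinscher"}</script>
</body></html>
"""

target = "breed-page__hero__overview__subtitle"
print(has_class(sample_html, target))  # False: no element has this class
print(target in sample_html)           # False: the class name is not in the raw text either
```

If both checks come back False for the real response, the element is added by the browser after loading and a different source for the data is needed.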

How to fix?

The information is stored in a script tag. You can use BeautifulSoup to find the tag and json.loads() to access the values.

Example

import json

import requests
from bs4 import BeautifulSoup

header = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36'}
url = 'https://www.akc.org/dog-breeds/Affenpinscher/'
r = requests.get(url, headers=header)
soup = BeautifulSoup(r.content, 'html.parser')

# Select every tag with type="application/ld+json"; the one at index 2 holds the breed data
json.loads(soup.select('[type="application/ld+json"]')[2].text)

Output

{'@context': ['http://schema.org', {'csvw': 'http://www.w3.org/ns/csvw#'}], '@type': 'Dataset', 'name': 'Affenpinscher', 'description': 'The Affenpinscher: loyal, curious, and famously amusing; this almost-human toy dog is fearless out of all proportion to his size. As with all great comedians, it’s the Affenpinscher’s apparent seriousness of purpose that makes his antics all the more amusing.', 'url': 'https://www.akc.org/dog-breeds/affenpinscher/', 'sameAs': 'https://images.akc.org/pdf/breeds/standards/Affenpinscher.pdf', 'mainEntity': {'@type': 'csvw:Table', 'csvw:tableSchema': {'csvw:columns': [{'csvw:name': 'Height', 'csvw:datatype': 'string', 'csvw:cells': [{'csvw:value': 'Height: 9-11.5 inches'}]}, {'csvw:name': 'Weight', 'csvw:datatype': 'string', 'csvw:cells': [{'csvw:value': 'Weight: 7-10 pounds'}]}, {'csvw:name': 'Life Expectancy', 'csvw:datatype': 'string', 'csvw:cells': [{'csvw:value': 'Life Expectancy: 12-15 years'}]}, {'csvw:name': 'Group', 'csvw:datatype': 'string', 'csvw:cells': [{'csvw:value': 'Group: Toy Group'}]}]}}}

Access the table values like this:

table = json.loads(soup.select('[type="application/ld+json"]')[2].text)['mainEntity']['csvw:tableSchema']['csvw:columns']

# Each column has a single cell whose value is the display string
for column in table:
    print(column['csvw:cells'][0]['csvw:value'])

#output
Height: 9-11.5 inches
Weight: 7-10 pounds
Life Expectancy: 12-15 years
Group: Toy Group
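
Relying on index [2] breaks if the page adds or reorders its script tags. As a hedged alternative, the sketch below picks the JSON-LD block by its '@type' value and returns the columns as a dict. The function name extract_breed_facts and the sample_html string are hypothetical. The sample only mimics the structure shown in the output above, so it assumes the real page keeps that layout.

```python
import json

from bs4 import BeautifulSoup


def extract_breed_facts(html):
    """Return {column name: value} from the first JSON-LD block of type 'Dataset'."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.select('script[type="application/ld+json"]'):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue  # skip empty or malformed blocks
        if not isinstance(data, dict) or data.get("@type") != "Dataset":
            continue
        try:
            columns = data["mainEntity"]["csvw:tableSchema"]["csvw:columns"]
            return {c["csvw:name"]: c["csvw:cells"][0]["csvw:value"] for c in columns}
        except (KeyError, IndexError, TypeError):
            continue  # a Dataset block without the expected table layout
    return {}


# Hypothetical markup that mirrors the structure of the output shown above.
sample_html = """
<script type="application/ld+json">{"@type": "Organization", "name": "AKC"}</script>
<script type="application/ld+json">
{"@type": "Dataset", "name": "Affenpinscher",
 "mainEntity": {"@type": "csvw:Table", "csvw:tableSchema": {"csvw:columns": [
   {"csvw:name": "Height", "csvw:cells": [{"csvw:value": "Height: 9-11.5 inches"}]},
   {"csvw:name": "Weight", "csvw:cells": [{"csvw:value": "Weight: 7-10 pounds"}]}
 ]}}}
</script>
"""

facts = extract_breed_facts(sample_html)
print(facts.get("Height"))  # Height: 9-11.5 inches
```

With the live page you would pass r.text instead of sample_html. An empty dict means no matching block was found, which is easier to handle than an IndexError from a fixed index.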