NoneType on webscraper code

Question

I'm trying to build a simple web scraper (new Python programmer here, so please excuse the simple question).
Here is my code:

import urllib2
from bs4 import BeautifulSoup

comments_url = 'https://somewebsite.com'

comments_page = urllib2.urlopen(comments_url)

raw_data = BeautifulSoup(comments_page, 'html.parser')
data = raw_data.find('tr',attrs={'data-ix-row': 'data-ix-bug'})
print(type(data))

For reference, here is the class I'm trying to parse out of the web page: html_grab_reference page

When I run this code, I get the following output instead of the element I expected:

<type 'NoneType'>

I think I've made a mistake somewhere in how I query the data, because it isn't returning anything.
Any ideas on what I'm doing wrong?

Answer 1

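You're getting None because find() found no matching element. The two names in your attrs dictionary are not a name/value pair: data-ix-row is not an attribute whose value is data-ix-bug. What you have is a class and a separate regular attribute.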

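Additionally, the attribute's value looks like a generated number. It may be generated by JavaScript, in which case it will not be in the HTML that urllib2 downloads, because urllib2 does not execute scripts.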
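One quick way to test the JavaScript theory is to look for the attribute name in the raw HTML text before parsing it. The following is only a minimal sketch: the helper name appears_in_static_html and both sample strings are hypothetical stand-ins for the downloaded page source, not the real page.

```python
def appears_in_static_html(html, marker):
    """Return True if `marker` occurs anywhere in the downloaded HTML text."""
    return marker in html


# Hypothetical page source where the row is present in the static HTML.
static_html = (
    '<table><tr class="data-ix-row" data-ix-bug="12345">'
    '<td>x</td></tr></table>'
)

# Hypothetical page source where the rows would only be added by a script.
js_only_html = '<table id="bugs"></table><script src="app.js"></script>'

found_static = appears_in_static_html(static_html, 'data-ix-bug')
found_js_only = appears_in_static_html(js_only_html, 'data-ix-bug')
print(found_static, found_js_only)
```

If the marker is missing from the downloaded text, no BeautifulSoup query will find it, and the page would have to be fetched with something that runs JavaScript.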

If the element is in the downloaded HTML, you'd need at least this:

attrs={'class': 'data-ix-row'}

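Then filter on data-ix-bug for whatever value you're interested in. (You probably want find_all rather than find, since find returns only the first match.)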
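The suggestion above could be sketched as follows. The HTML snippet is hypothetical and only assumes the structure described in this answer: data-ix-row as the class and data-ix-bug as a separate attribute holding a generated number. The variable names are illustrative as well.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<table>
  <tr class="data-ix-row" data-ix-bug="101"><td>first</td></tr>
  <tr class="data-ix-row" data-ix-bug="102"><td>second</td></tr>
  <tr class="other-row"><td>ignored</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all returns every match (an empty list if there are none),
# whereas find returns only the first match, or None.
rows = soup.find_all('tr', attrs={'class': 'data-ix-row'})

# Tag.get returns None when the attribute is absent on a row.
bug_ids = [row.get('data-ix-bug') for row in rows]

# Keep only the rows whose data-ix-bug has the value of interest.
wanted = [row for row in rows if row.get('data-ix-bug') == '102']

print(bug_ids)
print(len(wanted))
```

Because find_all returns a list, an empty result is easy to check with a plain truth test, which avoids running into NoneType further down the script.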

python

beautifulsoup

python-3.x

nonetype