BeautifulSoup 4 with fieldset

Question

I am working on a scraper in which I run into the following call:

BeautifulSoup("<fieldset> some html </fieldset>")

This raises TypeError: 'NoneType' object is not callable

Code

soup = BeautifulSoup(res.content)
categories = soup.findAll('fieldset')

for category in categories:
    print category
    category = BeautifulSoup(category)

Printing category gives me

<fieldset>
<a class="box" href="http://example.com">
<img src="http://example.png" alt="" />
</a>
</fieldset>

Stack trace

Traceback (most recent call last):
  File "scraper.py", line 40, in <module>
    print get_channels_list()
  File "scraper.py", line 22, in get_channels_list
    category = BeautifulSoup(category)
  File "C:\Anaconda\lib\site-packages\BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\BeautifulSoup.py", line 1143, in __init__
    markup = markup.read()
TypeError: 'NoneType' object is not callable

Answer 1

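You already have a BeautifulSoup element; there is no need to pass it to BeautifulSoup() again. The str() representation of such an element produces prettified HTML, but what you are holding is a tag object, not a string.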
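To make the distinction concrete, here is a minimal sketch, assuming Python 3 with BeautifulSoup 4 and the built-in "html.parser" (the sample markup and the variable names are illustrative, not taken from the question). It shows that a search result is a Tag rather than a string, and that str() is the step that would be needed if re-parsing were ever really wanted. The traceback in the question appears consistent with BeautifulSoup 3 treating the tag as a file-like object and looking up .read on it, which on a tag resolves to a search for a child element and yields None.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Illustrative markup, not the page from the question
soup = BeautifulSoup(
    "<fieldset><a href='http://example.com'>link</a></fieldset>",
    "html.parser",
)
category = soup.find("fieldset")

# The search result is a Tag object, not a string
is_tag = isinstance(category, Tag)
is_string = isinstance(category, str)

# Only if a separate document is truly needed: serialize first, then parse
markup = str(category)
reparsed = BeautifulSoup(markup, "html.parser")
```

In most cases the round trip through str() is unnecessary, because the Tag already supports the same search methods as the full document.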

Just keep working with the fieldset:

soup = BeautifulSoup(res.content)
categories = soup.findAll('fieldset')

for category in categories:
    # do something with the fieldset object; it is already a tag
    pass
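As one possible way to "do something" with each fieldset, the sketch below searches the tag directly for the link and image shown in the question's output. It assumes Python 3 with BeautifulSoup 4 and the built-in "html.parser"; the html string stands in for res.content, and the links list is a hypothetical name used only for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for res.content, based on the fieldset printed in the question
html = """
<fieldset>
<a class="box" href="http://example.com">
<img src="http://example.png" alt="" />
</a>
</fieldset>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
for category in soup.find_all("fieldset"):
    # category is a Tag, so it can be searched without re-parsing
    link = category.find("a", class_="box")
    image = category.find("img")
    if link is not None and image is not None:
        links.append((link["href"], image["src"]))

print(links)
```

The None checks guard against fieldsets that lack a matching link or image, since find() returns None when nothing matches.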

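I notice that you are using BeautifulSoup version 3. You really want to upgrade to BeautifulSoup 4; version 3 was discontinued 3 years ago and contains bugs that have long since been fixed in BeautifulSoup 4: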

from bs4 import BeautifulSoup

See also the BeautifulSoup 3 section in the BeautifulSoup 4 documentation.

python

beautifulsoup

fieldset

python-2.7