Why am I getting this (apparently) unusual AttributeError: 'bytes' object has no attribute '_all_strings'? Is there a way to get around it?

I've been searching for a solution to this AttributeError, but I can't find anything about `_all_strings`.

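I want to write a web crawler, but the page has a lot of junk at the top and bottom, so I'm trying to clean up the HTML to exclude the unwanted noise that appears there.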

When I run the code below, I get an AttributeError on its last line:

from __future__ import division
from urllib.request import urlopen
from bs4 import BeautifulSoup

textSource = 'http://celt.ucc.ie/irlpage.html'
html = urlopen(textSource).read()
raw = BeautifulSoup.get_text(html)

Here is the full traceback:

Traceback (most recent call last):
  File "...Crawler_Celt_Namelink_Test.py", line 7, in <module>
    raw = BeautifulSoup.get_text(html)
  File "...Python\Python35\lib\site-packages\bs4\element.py", line 950, in get_text
    return separator.join([s for s in self._all_strings(
AttributeError: 'bytes' object has no attribute '_all_strings'

Has anyone run into this error? Can anyone suggest how to get around it?
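The last frame of the traceback shows `get_text` calling `self._all_strings(...)`, where `self` turned out to be a `bytes` object. The small stand-alone sketch below reproduces that mechanism without bs4 or network access. It uses a hypothetical `Document` class (not part of bs4) whose `get_text` relies on `self` being an instance. Calling the method through the class with raw bytes raises the same error, and calling it on a real instance works.

```python
class Document:
    """Hypothetical stand-in for a parsed-document class (not part of bs4)."""

    def __init__(self, markup):
        self._strings = [markup]

    def _all_strings(self):
        return iter(self._strings)

    def get_text(self):
        # Works only when `self` is a Document instance.
        return "".join(self._all_strings())


raw_bytes = b"<p>Hello</p>"

# Calling the method through the class passes raw_bytes as `self`,
# and bytes has no `_all_strings` attribute.
try:
    Document.get_text(raw_bytes)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'bytes' object has no attribute '_all_strings'

# Building an instance first makes `self` a Document, so both call styles work.
doc = Document("Hello")
print(Document.get_text(doc))  # Hello
print(doc.get_text())  # Hello
```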

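The problem is that `get_text` is an instance method. `BeautifulSoup.get_text(html)` calls it on the class and passes the raw `bytes` returned by `read()` as `self`, so the `self._all_strings(...)` call inside bs4 fails. Parse the HTML into a `BeautifulSoup` object first, as the BeautifulSoup docs do: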

from urllib.request import urlopen
from bs4 import BeautifulSoup

textSource = 'http://celt.ucc.ie/irlpage.html'
html = urlopen(textSource).read()

# Parse the raw bytes into a BeautifulSoup object first
soup = BeautifulSoup(html, 'html.parser')

# get_text() is an instance method, so call it on the soup object
raw = soup.get_text()
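For the original goal of dropping the noise at the top and bottom of the page, one option is to delete those elements from the parsed tree before calling `get_text`. This is a minimal sketch under stated assumptions. It uses an inline sample document instead of the live page, and the `header`/`footer` ids are hypothetical, since the real page's markup is not shown here. It relies only on `find`, `decompose` and `get_text(separator=..., strip=...)` from bs4.

```python
from bs4 import BeautifulSoup

# Inline sample markup standing in for urlopen(textSource).read().
# Assumption: the noise lives in elements with ids "header" and "footer".
html = b"""
<html><body>
  <div id="header">Site navigation</div>
  <div id="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
  <div id="footer">Copyright notice</div>
</body></html>
"""

# Parse the bytes into a BeautifulSoup instance before extracting text.
soup = BeautifulSoup(html, "html.parser")

# Remove the (hypothetical) header and footer elements from the tree.
for noise_id in ("header", "footer"):
    element = soup.find(id=noise_id)
    if element is not None:
        element.decompose()

# strip=True trims each string and drops whitespace-only ones;
# separator="\n" puts each remaining string on its own line.
raw = soup.get_text(separator="\n", strip=True)
print(raw)
```

With the real page, the ids or tag names to remove have to be checked against its actual HTML.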