Why am I getting this (apparently) unusual AttributeError: 'bytes' object has no attribute '_all_strings'? Is there a way to get around it?

Question

I have been searching for a solution to this AttributeError, but I cannot find anything that mentions '_all_strings'.

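I want to write a web crawler, but the pages have a lot of junk at the top and bottom, so I am trying to clean up the HTML to exclude the unwanted noise that appears at the top and bottom of each page.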

When I run the code below, the last line raises an AttributeError:

from __future__ import division
from urllib.request import urlopen
from bs4 import BeautifulSoup

textSource = 'http://celt.ucc.ie/irlpage.html'
html = urlopen(textSource).read()
raw = BeautifulSoup.get_text(html)

This is the full traceback I get:

Traceback (most recent call last):
  File "...Crawler_Celt_Namelink_Test.py", line 7, in <module>
    raw = BeautifulSoup.get_text(html)
  File "...Python\Python35\lib\site-packages\bs4\element.py", line 950, in get_text
    return separator.join([s for s in self._all_strings(
AttributeError: 'bytes' object has no attribute '_all_strings'

Has anyone run into this error before? Or can anyone suggest how I can get around it?

Answer 1

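get_text is a method of a parsed BeautifulSoup object. In your code it is called on the class with the raw bytes returned by urlopen(...).read(), so the bytes object is passed in as self, and bytes has no _all_strings attribute. According to the BeautifulSoup docs, you parse the HTML first and then extract the text, like this: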

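from urllib.request import urlopen
from bs4 import BeautifulSoup

textSource = 'http://celt.ucc.ie/irlpage.html'
html = urlopen(textSource).read()  # raw bytes, not yet parsed

# Parse the bytes into a BeautifulSoup object first
soup = BeautifulSoup(html, 'html.parser')

# get_text() is an instance method, so call it on the parsed object
raw = soup.get_text()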
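
To make the difference concrete, and to touch on the original goal of dropping the noise at the top and bottom of the page, here is a minimal offline sketch. It assumes a small inline HTML snippet instead of the real page, and the ids 'header' and 'footer' are hypothetical; the actual page at celt.ucc.ie may mark those regions differently, so inspect its markup before relying on this.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the bytes returned by urlopen(textSource).read()
html = (b"<html><body><div id='header'>Menu</div>"
        b"<p>Main text</p>"
        b"<div id='footer'>Contact</div></body></html>")

# Calling get_text on the class passes the bytes object as `self`,
# which reproduces the AttributeError from the question.
caught = False
try:
    BeautifulSoup.get_text(html)
except AttributeError:
    caught = True

# Parse first, then work with the resulting object
soup = BeautifulSoup(html, 'html.parser')

# Remove the unwanted blocks (the ids are assumptions for this sketch)
for element_id in ('header', 'footer'):
    tag = soup.find(id=element_id)
    if tag is not None:
        tag.decompose()  # delete the tag and everything inside it

raw = soup.get_text()
print(raw)
```

Removing the unwanted tags before calling get_text() tends to be simpler than trimming the extracted text afterwards, since the structure of the page is still available at that point.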

Tags: html, python, beautifulsoup, web-crawler, attributeerror