Why am I getting this (apparently) unusual AttributeError: 'bytes' object has no attribute '_all_strings'? Is there a way to get around it?
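I have been searching for a solution to this AttributeError, but I cannot find anything about '_all_strings'.
I want to write a web crawler, but the pages have a lot of junk at the top and bottom, so I am trying to clean up the HTML to exclude the unwanted noise that appears there.
When I run the code below, specifically its last line, I get an AttributeError: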
from __future__ import division
from urllib.request import urlopen
from bs4 import BeautifulSoup
textSource = 'http://celt.ucc.ie/irlpage.html'
html = urlopen(textSource).read()
raw = BeautifulSoup.get_text(html)
Here is the full traceback:
Traceback (most recent call last):
File "...Crawler_Celt_Namelink_Test.py", line 7, in <module>
raw = BeautifulSoup.get_text(html)
File "...Python\Python35\lib\site-packages\bs4\element.py", line 950, in get_text
return separator.join([s for s in self._all_strings(
AttributeError: 'bytes' object has no attribute '_all_strings'
Has anyone run into this error? Or can anyone suggest how to get around it?
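If you look at the BeautifulSoup docs, this is how it is used. get_text has to be called on a parsed BeautifulSoup object, not on the raw bytes returned by read():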
from urllib.request import urlopen
from bs4 import BeautifulSoup
textSource = 'http://celt.ucc.ie/irlpage.html'
html = urlopen(textSource).read()
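# Parse the raw bytes into a BeautifulSoup object first
soup = BeautifulSoup(html, 'html.parser')
# get_text is an instance method, so call it on the parsed object
raw = soup.get_text()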
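To see why the original call fails, note that `BeautifulSoup.get_text(html)` calls the method through the class, so whatever is passed as the first argument becomes `self`. The sketch below reproduces that with a tiny stand-in class. `Node` and its internals are hypothetical and only mimic the shape of the line shown in the traceback; they are not bs4's actual implementation.

```python
class Node:
    """Hypothetical stand-in for a parsed document (not bs4's real class)."""

    def __init__(self, strings):
        self._strings = list(strings)

    def _all_strings(self):
        # Return an iterator over the text fragments held by this node.
        return iter(self._strings)

    def get_text(self, separator=""):
        # Same shape as the line shown in the traceback.
        return separator.join(s for s in self._all_strings())


# Calling through the class makes the bytes object play the role of `self`,
# and bytes has no `_all_strings` attribute.
try:
    Node.get_text(b"<html><p>hi</p></html>")
except AttributeError as exc:
    message = str(exc)

# Calling on an instance works as intended.
text = Node(["Hello", "world"]).get_text(" ")
```

With the real library the same rule applies: build the object with `BeautifulSoup(html, 'html.parser')` first, then call `get_text()` on that object.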