python urllib is getting error

Question

I want to get all the text from a web page:

import bs4 as bs
import urllib.request
source=urllib.request.urlopen('https://google.com')
soup=bs.BeautifulSoup(source,'lxml')
print(soup.get_text())

This is the error I get:

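Traceback (most recent call last):
  File "D:/test/web.py", line 5, in <module>
    soup=bs.BeautifulSoup(source,'lxml')
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?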

I have tried similar code, but it fails with the same error. What is causing it?

Answer 1

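urlopen returns an http.client.HTTPResponse object, not the page content. To get the content itself, call the response's read method. (BeautifulSoup can also read from a file-like object, so this is not what raises the error, but reading explicitly makes it clear what is being parsed.) For example: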

source = urllib.request.urlopen('https://google.com').read()

The error message itself tells you what to do: "Do you need to install a parser library?". The lxml parser you requested is not installed, so just install the package: pip install lxml
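
If installing lxml is not an option, one possible workaround is to fall back to the "html.parser" that Beautiful Soup can use from the standard library. The sketch below is only an illustration under that assumption: pick_parser and page_text are hypothetical helper names, not part of bs4 or urllib, and "html.parser" may handle badly broken HTML differently from lxml.

```python
import importlib.util
import urllib.request


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if that module can be imported, else `fallback`."""
    # find_spec returns None when the top-level module is not installed
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


def page_text(url):
    """Download `url` and return the text of the page (hypothetical helper)."""
    # Imported here so that pick_parser can be used without bs4 installed
    import bs4 as bs

    with urllib.request.urlopen(url) as response:
        html = response.read()  # bytes of the response body
    soup = bs.BeautifulSoup(html, pick_parser())
    return soup.get_text()
```

With this in place, print(page_text('https://google.com')) should no longer raise FeatureNotFound just because lxml is missing; it uses lxml when available and the built-in parser otherwise.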


windows

urllib

python-3.x