What's the relationship between 'BeautifulSoup' and 'lxml'?

Question

The lxml documentation says:

lxml can interface to the parsing capabilities of BeautifulSoup through the lxml.html.soupparser module. It provides three main functions: fromstring() and parse() to parse a string or file using BeautifulSoup into an lxml.html document, and convert_tree() to convert an existing BeautifulSoup tree into a list of top-level Elements.

At the same time, BS can also use lxml as its parser. [ref]

Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser.

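BS also recommends lxml as the parser to use for speed.

So what happens if lxml uses BS for parsing, while BS in turn uses lxml as its parser?

I've been racking my brain trying to work out how the two relate. Any help is appreciated.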

Answer 1

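There should be nothing confusing about the BS parser and the lxml.html parser. BS has an HTML parser, and lxml has its own HTML parser.

The BS documentation you quoted just says that you can use the lxml parser, or possibly another third-party parser, to parse HTML into a BS soup object, as an alternative to the default BS parser: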

BeautifulSoup(markup, "lxml")
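
To make the one-liner above concrete, here is a minimal sketch that parses the same markup twice through the Beautiful Soup API, once with the standard library's "html.parser" and once with "lxml". It assumes the beautifulsoup4 and lxml packages are both installed; the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input with a deliberately unclosed <b> tag.
markup = "<p>Hello <b>world</p>"

# Same Beautiful Soup API, two different underlying parsers.
soup_builtin = BeautifulSoup(markup, "html.parser")  # Python standard library
soup_lxml = BeautifulSoup(markup, "lxml")            # needs lxml installed

# Either way the result is a BeautifulSoup object; only the backend differs.
print(type(soup_builtin).__name__, type(soup_lxml).__name__)
print(soup_lxml.b.get_text())
```

In both cases you navigate the result with the usual soup methods. The parser name only selects which library turns the raw markup into a tree.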

Likewise, the lxml documentation says that you can use the BS parser to parse HTML into an lxml tree object, as an alternative to the default lxml.html parser:

root = lxml.html.soupparser.fromstring(tag_soup)
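
As a rough illustration of the opposite direction, the sketch below builds an lxml tree from the same input twice: once with lxml's own HTML parser and once by letting Beautiful Soup do the parsing through lxml.html.soupparser. It assumes lxml and beautifulsoup4 are both installed; the tag_soup value is made up for illustration.

```python
import lxml.html
from lxml.html import soupparser  # needs beautifulsoup4 installed as well

# Hypothetical sample input with a deliberately unclosed <b> tag.
tag_soup = "<p>Hello <b>world</p>"

# lxml's own HTML parser builds an lxml element tree directly.
root_native = lxml.html.fromstring(tag_soup)

# Beautiful Soup parses the input, then the soup is converted to an lxml tree.
root_via_bs = soupparser.fromstring(tag_soup)

# Both results are lxml elements, so the lxml/ElementTree API works on either.
print(type(root_native).__name__, type(root_via_bs).__name__)
print(root_via_bs.findtext(".//b"))
```

On the circularity worry in the question: as far as I can tell, recent lxml releases have soupparser ask Beautiful Soup for the standard-library "html.parser" unless you pass a different one, so the two libraries do not end up calling each other in a loop. Treat that as an observation about current behaviour rather than a documented guarantee.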
