Which is the fastest DOM parser in Python? Scrapy's built-in selectors or lxml? Or some other parser?
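I have used Scrapy in 10-15 projects and have tried both Scrapy's own parser and the lxml parser with it.
I want to find out which is the best parser available in Python in terms of parsing speed.
I tried to compare their performance by scraping the product names for one category on an e-commerce site, but I could not work out which one was faster.
1. I used lxml for parsing inside Scrapy
2. I used Scrapy's built-in parser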
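One way to get an actual number out of a comparison like this is to take the network out of the picture and time only the parsing step with the standard-library timeit module. The following is a minimal sketch, not the setup used in the tests above: the HTML is a synthetic stand-in for a category page, the class names `product` and `name` are made up, and the helper names (`names_with_lxml`, `names_with_parsel`, `benchmark`) are hypothetical. Parsers that are not installed are skipped.

```python
import timeit

# Synthetic stand-in for a category page (assumption: real pages will differ,
# so a serious benchmark should use saved copies of the actual site).
HTML = "<html><body>" + "".join(
    f'<div class="product"><h2 class="name">Product {i}</h2></div>'
    for i in range(1000)
) + "</body></html>"

QUERY = '//h2[@class="name"]/text()'


def names_with_lxml(html):
    """Parse with lxml directly and return the product names."""
    import lxml.html
    return [str(name) for name in lxml.html.fromstring(html).xpath(QUERY)]


def names_with_parsel(html):
    """Parse with parsel, the library behind Scrapy's selectors."""
    from parsel import Selector
    return Selector(text=html).xpath(QUERY).getall()


def benchmark(candidates, html, repeat=5, number=20):
    """Return {name: best seconds per call}, skipping missing libraries."""
    results = {}
    for name, func in candidates.items():
        try:
            func(html)  # warm-up call; also reveals a missing library
        except ImportError:
            continue
        timings = timeit.repeat(lambda: func(html), repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


if __name__ == "__main__":
    candidates = {"lxml": names_with_lxml, "parsel": names_with_parsel}
    for name, seconds in benchmark(candidates, HTML).items():
        print(f"{name}: {seconds * 1000:.3f} ms per parse")
```

Taking the minimum over several repeats reduces noise from other processes. Both functions include the cost of building the tree, so the figures cover parsing plus the query, not the query alone.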
This is what the lxml library's motto says:
"Programming with libxml2 is like the thrilling embrace of an exotic stranger. It seems to have the potential to fulfill your wildest dreams, but there's a nagging voice somewhere in your head warning you that you're about to get screwed in the worst way." (a quote by Mark Pilgrim)
Mark Pilgrim was describing, in particular, the experience a Python programmer has when dealing with libxml2. The default Python bindings of libxml2 are fast, thrilling, powerful, and your code might fail in some horrible way that you really shouldn't have to worry about when writing Python code.
lxml combines the power of libxml2 with the ease of use of Python.
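libxml2, the C library underneath lxml, is widely regarded as one of the fastest parsers.
It turns out that Scrapy uses the parsel library, which is a wrapper around lxml.
From the Scrapy documentation: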
Scrapy Selectors is a thin wrapper around parsel library; the purpose of this wrapper is to provide better integration with Scrapy Response objects.
parsel is a stand-alone web scraping library which can be used without Scrapy. It uses lxml library under the hood, and implements an easy API on top of lxml API. It means Scrapy selectors are very similar in speed and parsing accuracy to lxml.
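The wrapper relationship described in the quote can be checked directly. The sketch below assumes both lxml and parsel are installed; the sample markup and the helper name `same_engine` are hypothetical. It runs one XPath query through parsel and through lxml, and checks that the selector's `root` attribute is an lxml element.

```python
SAMPLE = (
    '<html><body>'
    '<h2 class="name">Blue Kettle</h2>'
    '<h2 class="name">Red Toaster</h2>'
    '</body></html>'
)


def same_engine(html, query):
    """Run one XPath through parsel and through lxml directly."""
    import lxml.etree
    import lxml.html
    from parsel import Selector

    selector = Selector(text=html)
    via_parsel = selector.xpath(query).getall()
    via_lxml = [str(text) for text in lxml.html.fromstring(html).xpath(query)]
    # Selector.root is the underlying tree node that parsel built with lxml.
    wraps_lxml = lxml.etree.iselement(selector.root)
    return via_parsel, via_lxml, wraps_lxml


if __name__ == "__main__":
    via_parsel, via_lxml, wraps_lxml = same_engine(
        SAMPLE, '//h2[@class="name"]/text()'
    )
    print("parsel:", via_parsel)
    print("lxml:  ", via_lxml)
    print("parsel root is an lxml element:", wraps_lxml)
```

Since both paths end up in the same libxml2 parser, any speed difference should mostly come from the Python-level wrapping, which matches the documentation's claim that the two are very similar in speed.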
So I think I should just use Scrapy's selectors, since they are easier to work with.