How to get the text from this element with Scrapy? ::text is not working
What other methods can we use to get the text inside an element?
>>> products.css('h2.entry-title').get()
'<h2 class="entry-title" itemprop="headline"><a href="https://example.com/index.php/2021/12/12/your-20-with-few-clicks-from-stash/" rel="bookmark">Your With Few Clicks From Stash</a></h2>'
But trying to get the text, Your With Few Clicks From Stash, using
products.css('h2.entry-title::text').get()
>>> products.css('h2.entry-title::text').get()
>>>
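does not work. Any suggestions? Thanks.
Actually, the text node you want, Your With Few Clicks From Stash, sits under the a tag, not directly under the h2. h2.entry-title::text only matches text nodes that are direct children of the h2, so it finds nothing and .get() returns None. To get the correct output, the CSS expression should look like this: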
products.css('h2.entry-title a::text').get().strip()
Demonstration in the Scrapy shell:
In [6]: from scrapy.selector import Selector
In [7]: %paste
html_doc="""
<html>
<body>
<h2 class="entry-title" itemprop="headline">
<a href="https://example.com/index.php/2021/12/12/your-20-with-few-clicks-from-stash/" rel="bookmark">
Your With Few Clicks From Stash
</a>
</h2>
</body>
</html>
"""
## -- End pasted text --
In [8]: sel = Selector(text=html_doc)
In [9]: sel.css('h2.entry-title a::text').get().strip()
Out[9]: 'Your With Few Clicks From Stash'