In Scrapy, is there a way to obtain the complete text from a div?

Question

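I am using Scrapy to scrape the content of certain parts of a web page. I need to capture the text exactly as it is displayed on the page. The page is structured similarly to this: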

<div class = "uselessInfo">...</div>
<div class = "usefulInfo">
       Some text
       <p>Useful paragraph</p>
       <p>Useful paragraph with <a><span>Important Keywords</span></a></p>
       <ul>Some interesting data</ul>
</div>
<div class = "usefulInfo">
       Some text
       <ul>Some interesting data</ul>
       <p>Useful paragraph</p>
</div>
<div class = "uselessInfo">...</div>

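When I extract the information, I cannot get at the text inside the child elements. The same thing happens with the keywords inside a paragraph.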

Is there a way to get all of the text from the parent element (usefulInfo in the example)?

Answer 1

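You need to use *::text instead of ::text. Plain ::text returns only the element's direct text nodes, while *::text returns every text node inside it, including the text of nested children such as <p>, <a> and <span>: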

# every text node inside the .usefulInfo divs, concatenated into one string
your_text = "".join(response.css(".usefulInfo *::text").getall())

web-crawler

scrapy