Extracting p tags from a div - Scrapy

Question

I want to use Scrapy to extract the category and the title from the following HTML snippet:

<div class="box-text box-text-products">
  <div class="title-wrapper">       
     <p class="category uppercase is-smaller no-text-overflow product-cat op-7">Supplements     </p>
     <p class="name product-title"><a href="https://martslu.com/product/explosive-energy-pre-workout-cherry-punch-300g/">Explosive Energy Pre Workout Cherry Punch – 300g</a></p></div><div class="price-wrapper">
</div>      
</div>

Here is the code I wrote:

    def parse(self,response):
        for product in response.css('div.box-text.box-text-products::text'):
            yield{
                'category': product.css('div.title-wrapper.p::text').get(),
                'title': product.css('div.title-wrapper>p.name product-title::text').get()}

I'm still unclear on how to target a specific class name on a p tag. Any help is appreciated.

Answer 1

def parse(self,response):
    for product in response.css('div.box-text.box-text-products'):
        yield {
            'category': product.css('div.title-wrapper > p.category::text').get(),
            'title': product.css('div.title-wrapper > p.product-title > a::text').get()
        }

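The selectors in the question misuse CSS selector syntax, so it is worth reading up on it. In short: several classes on the same element are chained with dots (p.name.product-title), because a space means "descendant element"; a tag name is written bare (p), not as a class (.p); and ::text goes on the innermost element whose text you want (here the a inside p.product-title), not on the div you loop over.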

Parsing in Scrapy relies on parsel, which adds two custom, non-standard pseudo-elements:

::text
::attr(name)

Apart from these two custom pseudo-elements, most standard CSS selector syntax is supported.


Tags: python, css-selectors, scrapy