Does the Scrapy user agent ignore custom data attributes?

Question

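w3schools says that user agents will ignore custom data attributes (the statement was in an image from w3schools).
I am getting an empty list, and I wonder whether Scrapy ignores these tags because the HTML uses the data attribute data-v-529299fa="". Here is my source: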

 <a data-v-529299fa="" target="_blank" href="https://data.amica.com.pl/files/pdm_IO/SER_0019314_ART.pdf" 
 class="product-spec__file-link">
    <font style="vertical-align: inherit;">
        <font style="vertical-align: inherit;">Operating manual AWDG7512CL_1140173 (PL)</font>
    </font>
</a>

I want to scrape the href of each anchor tag that links to a PDF. This is what I tried:

pdfs = response.xpath('//a[@data-v-529299fa=""]/@href').extract()
# also
pdfs = response.css('a[data-v-529299fa=""]::attr(href)').extract()

Both return [], an empty list. There is more than one PDF, which is why I use extract(). Any help would be appreciated.

Answer 1

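No, Scrapy does not remove anything from the response it receives from the server.

That line means that web browsers do not act on the content of these attributes: they do not change what is displayed based on them (JavaScript code can do that, though).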

html

python

web-crawler

scrapy

custom-data-attribute