Scrapy cannot see data inside a del HTML tag

Question

I want to scrape the original price and the discounted price from this link:

https://www2.hm.com/hu_hu/productpage.0903062001.html

The span and del elements both have odd class names, but I was able to find the discounted price in the Scrapy shell like this:

response.css('span.price-value::text').get()

But I have had no luck with the original price inside the del tag:

<del class="BodyText-module--general__32l6J ProductPrice-module--priceValueOriginal__3U3Cz">6&nbsp;995 Ft</del>

I tried both XPath and CSS, but Scrapy cannot find this tag.
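
A quick check that may help narrow this down: selectors only match what is in the HTML that Scrapy actually downloaded. If the `del` element is added by JavaScript in the browser (an assumption; nothing in this post confirms it), no XPath or CSS expression will find it. The sketch below uses only the standard library to count occurrences of a tag in a raw HTML string. The helper `count_tag` and both sample strings are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times a given start tag appears in an HTML string."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == self.tag:
            self.count += 1


def count_tag(html, tag):
    """Return the number of <tag> start tags found in html (hypothetical helper)."""
    counter = TagCounter(tag)
    counter.feed(html)
    counter.close()
    return counter.count


# Made-up sample markup, not the real page.
rendered = '<span class="price-value">4 995 Ft</span><del>6&nbsp;995 Ft</del>'
raw = '<span class="price-value">4 995 Ft</span>'

print(count_tag(rendered, "del"))  # 1
print(count_tag(raw, "del"))       # 0
```

In the Scrapy shell, `count_tag(response.text, "del")` returning 0 would suggest the element is not present in the downloaded source, so the price has to come from somewhere else in the response.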

Answer 1

Both the original price and the discounted price are embedded in the page source itself as JSON data:

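import re

# .get() returns the first matching text node as a string; re.findall needs a string, not a Selector
page_source_data = response.xpath('//div[@class="tealiumProductviewtag productview parbase"]//text()').get(default='')
original_price = re.findall(r'product_original_price\s*:\s*\[(.*?)\]', page_source_data)
list_price = re.findall(r'product_list_price\s*:\s*\[(.*?)\]', page_source_data)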

Each re.findall call returns a list of matches, which can be used to get the prices.
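
As a self-contained illustration of the regex step, the sketch below runs the same kind of pattern against a made-up string. The answer does not show the real layout of the embedded data, so the sample text, its values, and the helper name `extract_price` are all assumptions.

```python
import re

# Made-up sample standing in for the text of the tealium div; the real
# page's exact layout and values may differ.
sample = """
product_original_price : ["6995.00"],
product_list_price : ["4995.00"],
"""


def extract_price(text, key):
    """Return the first value listed under `key` as a float, or None if absent."""
    # Tolerate variable spacing around the colon and optional quotes.
    match = re.search(re.escape(key) + r'\s*:\s*\[\s*"?([\d.]+)"?', text)
    return float(match.group(1)) if match else None


print(extract_price(sample, "product_original_price"))  # 6995.0
print(extract_price(sample, "product_list_price"))      # 4995.0
print(extract_price(sample, "product_sale_price"))      # None
```

Returning None for a missing key keeps a spider from crashing on pages that have no discount. If the real data turns out to be valid JSON, parsing it with the json module would likely be more robust than a regex.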

html

css

python

xpath

scrapy