XPath returns null

Question

I need to scrape the price from this page: https://www.asos.com/monki/monki-lisa-cropped-vest-top-with-ruched-side-in-black/prd/23590636?colourwayid=60495910&cid=2623

However, it always returns null.

My code:

'price': response.xpath('//*[contains(@class, "current-price")]').get()

Can anyone help?

Thanks!

The price appears to be fetched via XHR. How can I retrieve it?

Answer 1

The XPath you are using matches 2 different elements. Try the following XPath to get the item's price:

driver.find_element("xpath", "//span[@data-id='current-price']").text

Update:

'price': response.xpath("//span[@data-id='current-price']").get()
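To see why the broader expression can match more than one element, here is a minimal sketch using only the standard library. The markup is hypothetical and only loosely modelled on a product page, and because xml.etree.ElementTree does not support contains(), the substring match is approximated with a list comprehension.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (an assumption, not the real page source).
HTML = """
<div>
  <div class="current-price-container">
    <span data-id="current-price" class="current-price">£10.00</span>
  </div>
</div>
""".strip()

root = ET.fromstring(HTML)

# Rough equivalent of //*[contains(@class, "current-price")]:
# every element whose class attribute contains the substring.
broad = [el for el in root.iter() if "current-price" in el.get("class", "")]

# Equivalent of //span[@data-id='current-price'].
exact = root.findall(".//span[@data-id='current-price']")

print(len(broad))                 # the container and the span both match
print(len(exact), exact[0].text)  # only the span matches
```

Matching on an exact attribute value such as data-id avoids picking up wrapper elements whose class name merely contains the same substring.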

Answer 2

The code you tried:

'price': response.xpath('//*[contains(@class, "current-price")]').get()

It looks like you are using your own framework-style method, but with the native Selenium-Python bindings I would do it like this:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)
print(wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(@class, 'current-price')]/span"))).text)

Answer 3

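Your problem is not the XPath; the price is retrieved with an XHR request, so it is not in the HTML that Scrapy downloads.

If you open scrapy shell and run view(response), you will see that the price is not rendered.

If you look at the page source and search for the price, you will find the URL it is loaded from, stored in window.asos.pdp.config.stockPriceApiUrl.

Use that URL to request the price: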

    def parse(self, response):
        import re
        price_url = 'https://www.asos.com' + re.search(r'window.asos.pdp.config.stockPriceApiUrl = \'(.+)\'', response.text).group(1)
        yield scrapy.Request(url=price_url,
                             method='GET',
                             callback=self.parse_price,
                             headers=self.headers)

    def parse_price(self, response):
        import json
        jsonresponse = json.loads(response.text)
        ...............
        ...............
        ...............

I couldn't get past the 403 error with the headers I supplied, but you may have better luck.
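The URL extraction step in parse can be tried in isolation, without making any request. The sketch below is a variation on the regex above: it escapes the dots, stops at the closing quote, and returns None instead of raising when the variable is missing. The page-source fragment, its path, and the helper name extract_price_url are made-up placeholders.

```python
import re

# Hypothetical page-source fragment; the path is a placeholder, not the real API path.
PAGE_SOURCE = """
<script>
window.asos.pdp.config.stockPriceApiUrl = '/api/example/stockprice?productIds=123';
</script>
"""

# Dots are escaped and the capture stops at the first closing quote.
PATTERN = re.compile(r"window\.asos\.pdp\.config\.stockPriceApiUrl\s*=\s*'([^']+)'")


def extract_price_url(page_source, base="https://www.asos.com"):
    """Return the absolute price URL, or None if the variable is not in the page."""
    match = PATTERN.search(page_source)
    if match is None:
        return None
    return base + match.group(1)


price_url = extract_price_url(PAGE_SOURCE)
print(price_url)
```

Checking for None before calling group(1) avoids an AttributeError if the site changes the variable name.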

Edit:

To get the price from the JSON response, json.loads isn't actually needed:

    def parse_price(self, response):
        jsonresponse = response.json()[0]
        price = jsonresponse['productPrice']['current']['text']
        # You can also use jsonresponse.get() if you prefer
        print(price)

Output:

£10.00
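The lookup in parse_price can also be checked offline. This is a minimal sketch that assumes the response body is a JSON list whose first item carries productPrice, current, and text, as implied by the keys used above; the sample body itself is made up. It uses json.loads only because there is no Scrapy response object here, and the helper name extract_price is hypothetical.

```python
import json

# Hypothetical response body, shaped after the keys used in parse_price.
BODY = '[{"productPrice": {"current": {"text": "£10.00"}}}]'


def extract_price(body):
    """Return the current price text, or None if any expected key is missing."""
    data = json.loads(body)
    if not data:
        return None
    # Chained .get() calls avoid a KeyError if the payload shape changes.
    return data[0].get("productPrice", {}).get("current", {}).get("text")


price = extract_price(BODY)
print(price)
```

With the sample body this prints £10.00; with an empty list or a missing key it returns None rather than raising.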

python

selenium

xpath

scrapy

web-scraping