如何使用 Python 抓取 PHP Ajax？

Question

我是 python 的初学者，我正在尝试构建一个 python 程序来从 http://turnpikeshoes.com/shop/TCF00003 中抓取产品描述。 Python 有很多库，我相信有很多方法可以实现我的目标。我已经使用请求成功抓取了一些内容，但是我正在寻找的字段没有显示出来，使用 chrome 检查器我发现了一个 Ajax POST 请求。

这是我的代码

from lxml import html
import requests

url = 'http://turnpikeshoes.com/shop/TCF00003'
#URL
headers = {'user-agent': 'my-app/0.0.1'}
#Header info sent to server
page = requests.get(url, headers=headers)
#Get response
tree = html.fromstring(page.content)
#Page Content


ShortDsc = tree.xpath('//span[@itemprop="reviewBody"]/text()')

LongDsc = tree.xpath('//li[@class="productLongDescription"]/text()')

print 'ShortDsc:', ShortDsc
print 'LongDsc:', LongDsc

我想我需要直接向管理员发送请求-ajax.php

非常感谢任何帮助

Answer 1

如果你想抓取 javascript 内容，你应该在这种情况下尝试使用 selenium：

from selenium import webdriver
import time

driver = webdriver.PhantomJS()
driver.get("http://turnpikeshoes.com/shop/TCF00003")
time.sleep(5)

LongDsc = driver.find_element_by_class_name("productLongDescription").text

print 'LongDsc:', LongDsc

顺便说一句，您还应该安装 PhantomJS 作为无头浏览器。

Answer 2

商店在加载期间向自己执行 POST 请求，并在表单数据中使用 URL 作为参数。这是一个使用 requests.Session 的小脚本，它首先打开商店，然后发送 POST 请求以获取产品信息。它模拟了浏览器将执行的步骤——保存来自第一个请求的 cookie——这可能是从 POST 调用中获得所需响应所必需的。

import requests

product_url = 'http://turnpikeshoes.com/shop/TCF00003'
product_ajax = 'http://turnpikeshoes.com/wp-admin/admin-ajax.php'
data = {'mrQueries[]':'section', 'action':'mrshop_ajax_call', 'url': product_url}

s = requests.Session()
s.get(product_url)
r = s.post(product_ajax, data=data)

print(r.text)

您也可以尝试在开始时获取 cookie 一次并将它们用于所有进一步的 POST 请求。商店有能力通过减少实际响应大小来启用脚本来查看您手中的产品。

Answer 3

这个对我有用。

from selenium import webdriver

ff = webdriver.PhantomJS()
ff.get(url)
ff.find_element_by_xpath("//span[@itemprop='price']").get_attribute("content")

谢谢！

如何使用 Python 抓取 PHP Ajax？

How to scrape PHP Ajax using Python?

python

ajax

screen-scraping