scrapy - AjaxMethod not available
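I am trying to learn Scrapy by practicing web crawling, using a car classifieds site as the subject so I can study its countermeasures. I know the X-AjaxPro-Method exists, because Chrome Developer Tools shows the header being sent and a correct response coming back. But when I do the same thing in the Scrapy shell, I get "This method is either not marked with an AjaxMethod or is not available."
Here are the shell commands I used: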
>>> from scrapy.http import FormRequest
>>> request = FormRequest(
...     url='https://www.carwale.com/ajaxpro/CarwaleAjax.AjaxClassifiedBuyer,Carwale.ashx',
...     headers={"X-AjaxPro-Method":"ProcessUsedCarPurchaseInquiry","Content-Type":"application/x-www-form-urlencoded; charset=UTF-8","X-Requested-With":"XMLHttpRequest"},
...     formdata={"profileId":"D1249107","buyerName":"","buyerEmail":"","buyerMobile":"9938223299","carModel":"","makeYear":"","pageUrl":"https://www.carwale.com/used/cars-in-karnal/chevrolet-enjoy-d1249107/?rk","isP":"False","transToken":"","ltsrc":"","buyerSourceId":"4","comments":"","cwc":"buJNfItyQKBP8a3OahoJsOOmg","utma":"\"52149691.1076750176.1492103717.1492447801.1492447801.8\"","utmz":"\"52149691.1492103720.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)\"","originId":"3","isFromCaptcha":"","isGSDClick":"","isRecommended":"","isCertificationDownload":""})
>>> fetch(request)
2017-04-18 08:45:32 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://www.carwale.com/ajaxpro/CarwaleAjax.AjaxClassifiedBuyerCarwale,Carwale.ashx> (referer: None)
>>> print(response.body)
{"error":{"Message":"This method is either not marked with an AjaxMethod or is not available.","Type":"System.NotSupportedException"}}
>>>
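The original page is at https://www.carwale.com/used/cars-in-karnal/chevrolet-enjoy-d1249107/?rk=69&isP=false; you have to enter a mobile phone number to get the "Seller Details."
So, I have dug a little deeper and can share more information. I was able to use the browser's developer tools to export the XHR as a curl command and then trim it down. As far as I can tell, the only header needed is X-AjaxPro-Method, because the curl command works with just that header and the data.
I also got it working with the Python requests library.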
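The trimmed-down request described above might look roughly like the following sketch. It uses the standard library's urllib rather than the requests library mentioned in the post, so it has no dependencies, and it only builds the request without sending it. The payload is a shortened, assumed subset of the fields from the shell session, and the form encoding of the body is also an assumption, since the post does not show the exact body the working curl command sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "https://www.carwale.com/ajaxpro/CarwaleAjax.AjaxClassifiedBuyer,Carwale.ashx"


def build_inquiry_request(payload):
    """Build (but do not send) a POST that carries only the X-AjaxPro-Method header."""
    # Assumption: the body is form-encoded, as in the FormRequest attempt.
    body = urlencode(payload).encode("utf-8")
    return Request(
        URL,
        data=body,
        headers={"X-AjaxPro-Method": "ProcessUsedCarPurchaseInquiry"},
        method="POST",
    )


# Assumed subset of the form fields from the shell session.
payload = {"profileId": "D1249107", "buyerMobile": "9938223299"}
req = build_inquiry_request(payload)

# urllib normalizes header names' capitalization; HTTP header names are
# case-insensitive, so this does not change the request's meaning.
print(req.get_method(), req.full_url)
print(req.header_items())
print(req.data)
```

Actually sending it would be a matter of passing req to urllib.request.urlopen.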
Comparing the request data you posted with what I see in Firebug, I suspect your request is missing at least one of the following:
- Cookie data (should be handled by scrapy if you enable cookies in settings.py and let scrapy visit https://www.carwale.com/used/cars-in-karnal/chevrolet-enjoy-d1249107/?rk=69&isP=false first)
- User-Agent header (set USER_AGENT in settings.py to something that looks like a real browser, e.g. 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0')
- Referer header (should be handled automatically by scrapy if you visit https://www.carwale.com/used/cars-in-karnal/chevrolet-enjoy-d1249107/?rk=69&isP=false before sending the request)
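As a minimal sketch of the settings mentioned in the list above, a settings.py fragment could look like this. The setting names are Scrapy's own; the values are taken from the answer or are Scrapy's defaults written out explicitly. Whether these three items are enough for this particular site is not confirmed by the thread.

```python
# settings.py (fragment)

# Cookie handling is enabled by default in Scrapy; stated explicitly here.
COOKIES_ENABLED = True

# Present the crawler as a regular browser (value quoted from the answer).
USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) "
    "Gecko/20100101 Firefox/53.0"
)

# The referer middleware is also on by default. It fills in the Referer
# header for requests yielded from a spider callback, using the URL of the
# response that produced them.
REFERER_ENABLED = True
```

Note that a request created by hand in the Scrapy shell has no originating response, which matches the "(referer: None)" in the log output above.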
All in all, AJAX-driven sites like carwale.com have a lot of moving parts and are not a good subject to "start learning scrapy".
PS: A better way to use FormRequest is request = FormRequest.from_response(response_with_form_page, ...). This works for most forms, because scrapy automatically extracts all hidden POST parameters from the form page. For details, see: https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.FormRequest.from_response
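To illustrate what "extracting hidden POST parameters" means, here is a minimal standalone sketch using only the standard library. It is not Scrapy's implementation: the sample HTML, its field values, and the HiddenInputCollector class are all hypothetical, and from_response itself does considerably more (selecting a form, handling other input types, building the request).

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Hypothetical helper: collect name/value pairs of hidden <input> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


# Made-up form page, standing in for a real response body.
SAMPLE = """
<form method="post" action="/inquiry">
  <input type="hidden" name="transToken" value="abc123">
  <input type="hidden" name="originId" value="3">
  <input type="text" name="buyerMobile">
</form>
"""

collector = HiddenInputCollector()
collector.feed(SAMPLE)
hidden = collector.fields

# The caller then supplies only the visible fields; the hidden ones
# come from the page itself.
form_data = {**hidden, "buyerMobile": "9938223299"}
print(form_data)
```

The point of the design is that tokens the server embeds in the page travel along without being copied by hand, which is what makes from_response less brittle than a hard-coded formdata dictionary.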