Scrapy FormRequest returning 400 error while Python Requests works

Question

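Sending a POST request through Scrapy's FormRequest results in a 400 error, while sending the same request through Python Requests succeeds.

The request headers and params can't be the problem, because they work with Requests. What in Scrapy could be breaking this?

The code below was run in scrapy shell: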

url = 'https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html'
headers = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'accept': 'text/html, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'dnt': '1',
    'origin': 'https://www.tripadvisor.co.uk',
    'pragma': 'no-cache',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST'
}

Scrapy FormRequest returns a 400 error.

In [10]: req = scrapy.http.FormRequest(
    ...:             url,
    ...:             method='POST',
    ...:             formdata=params,
    ...:             headers=headers)

In [11]: fetch(req)
2021-06-26 21:28:18 [scrapy.core.engine] DEBUG: Crawled (400) <POST https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html> (referer: None)

Python Requests returns 200, and I can access the content.

In [17]: r = requests.post(url=url, headers=headers, json=params)
2021-06-26 21:30:02 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.tripadvisor.co.uk:443
2021-06-26 21:30:04 [urllib3.connectionpool] DEBUG: https://www.tripadvisor.co.uk:443 "POST /ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html HTTP/1.1" 200 16360

In [18]: r.status_code
Out[18]: 200
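
One thing that may be worth checking is whether the two calls really send the same request. `requests.post(..., json=params)` serializes `params` as a JSON document, while `FormRequest(formdata=params)` URL-encodes it, so the bodies differ, and the hard-coded `content-length` of 102 may match neither. The sketch below uses only the standard library to approximate both bodies. It illustrates the difference and is not a confirmed diagnosis of the 400.

```python
import json
from urllib.parse import urlencode

params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST',
}

# Approximately what FormRequest(formdata=params) sends: URL-encoded pairs.
form_body = urlencode(params)

# Approximately what requests.post(json=params) sends: a JSON document.
json_body = json.dumps(params)

# The hard-coded 'content-length' header value from the question.
declared_length = 102

for label, body in (('form', form_body), ('json', json_body)):
    actual = len(body.encode('utf-8'))
    print(f'{label}: {body!r} ({actual} bytes, header declares {declared_length})')
```

If the printed sizes differ from the declared value, the copied `content-length` header is a candidate to remove before retrying.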

Answer 1

Since I can't access the URL from here, you can try whether the following code works or not. You also have to add a user agent.

import scrapy


class ReviewsSpider(scrapy.Spider):
    name = 'reviews'
    # Raw form-encoded payload, sent as-is in the request body.
    body = "reqNum=1&isLastPoll=false&paramSeqId=0&waitTime=41&changeSet=REVIEW_LIST&puid=YNgN2QokGScAA0-MH9MAAAIQ"

    def start_requests(self):
        yield scrapy.Request(
            url='https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r791416821-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html',
            method="POST",
            body=self.body,
            callback=self.parse,
            # Add a 'user-agent' entry here as well.
            headers={
                'content-type': 'application/x-www-form-urlencoded',
                'x-puid': 'YNgN2QokGScAA0-MH9MAAAIQ',
                'x-requested-with': 'XMLHttpRequest',
            },
        )

    def parse(self, response):
        pass
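
Note that this request sends only a handful of headers, whereas the question's `headers` dict was copied wholesale from the browser. In that dict, `authority`, `method` and `scheme` mirror HTTP/2 pseudo-headers rather than ordinary headers, and `content-length` is normally computed by the HTTP client from the actual body. A minimal sketch of filtering such keys out follows; the helper name `strip_client_managed` and the set of keys are assumptions made for illustration, not part of Scrapy, and whether this resolves the 400 is untested.

```python
# Keys copied from the browser's dev tools that the HTTP client normally
# manages itself. This set is an illustrative assumption, not a Scrapy API.
CLIENT_MANAGED = {'authority', 'method', 'scheme', 'content-length'}


def strip_client_managed(headers):
    """Return a copy of headers without the client-managed keys."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in CLIENT_MANAGED
    }


# A shortened version of the headers dict from the question.
headers = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'x-requested-with': 'XMLHttpRequest',
}

cleaned = strip_client_managed(headers)
print(sorted(cleaned))
```

In scrapy shell, the cleaned dict could then be passed as `headers=strip_client_managed(headers)` when building the `FormRequest`.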
