Python 请求有效时,Scrapy FormRequest 返回 400 错误

Scrapy FormRequest returning 400 error while Python Requests works

通过 Scrapy FormRequest 发送 Post 请求会导致 400 错误,而通过 Python Requests 发送相同的请求会成功。

请求 headersparams 不可能是问题,因为它们处理请求。 Scrapy 中的什么可以打破这个?

下面的代码是 运行 scrapy shell:

url = 'https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html'
headers = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'accept': 'text/html, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'dnt': '1',
    'origin': 'https://www.tripadvisor.co.uk',
    'pragma': 'no-cache',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST'
}

Scrapy FormRequst returns 400 错误。

In [10]: req = scrapy.http.FormRequest(
    ...:             url,
    ...:             method='POST',
    ...:             formdata=params,
    ...:             headers=headers)

In [11]: fetch(req)
2021-06-26 21:28:18 [scrapy.core.engine] DEBUG: Crawled (400) <POST https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html> (referer: None)

Python 请求 returns 200,我可以访问内容。

In [17]: r = requests.post(url=url, headers=headers, json=params)
2021-06-26 21:30:02 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.tripadvisor.co.uk:443
2021-06-26 21:30:04 [urllib3.connectionpool] DEBUG: https://www.tripadvisor.co.uk:443 "POST /ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html HTTP/1.1" 200 16360

In [18]: r.status_code
Out[18]: 200

由于我无法从这里访问 url,您可以尝试以下代码是否有效,或者 not.You 还必须添加用户代理。

import scrapy

class ReviewsSpider(scrapy.Spider):
    name = 'reviews' 
    body = "reqNum=1&isLastPoll=false&paramSeqId=0&waitTime=41&changeSet=REVIEW_LIST&puid=YNgN2QokGScAA0-MH9MAAAIQ"
    def start_requests(self):
        yield scrapy.Request(
            url = 'https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r791416821-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html',
            method = "POST",
            body = self.body,
            callback = self.parse,
            headers = {
                'content-type': 'application/x-www-form-urlencoded',
                'x-puid': 'YNgN2QokGScAA0-MH9MAAAIQ',
                'x-requested-with': 'XMLHttpRequest'
               
            }
        )
    def parse(self, response):
        pass