POST request not working with Scrapy but works with requests

requests code:

import requests

listings_url = "https://www.biltorvet.dk/Api/Search/Page"
form_data = {
    "pageNumber": "1",
    "searchOrigin": "1",
    "searchValue": "22526899",
    "sort": ""
}
# json= serializes the dict to JSON and sets the JSON Content-Type header
response = requests.post(listings_url, json=form_data)

if response.status_code == 200:
    data = response.json()
    print(data)

Scrapy code:

import json
import scrapy
from scrapy import FormRequest

class BiltorvetScraperSpider(scrapy.Spider):
    name = 'biltorvet'
    listings_url = "https://www.biltorvet.dk/Api/Search/Page"
    form_data = {
        "pageNumber": "1",
        "searchOrigin": "1",
        "searchValue": "22526899",
        "sort": ""
    }

    def start_requests(self):
        yield FormRequest(url=self.listings_url, callback=self.parse, body=json.dumps(self.form_data))

    def parse(self, response):
        print(response.text)

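I get a 400 response from the Scrapy request. I also tried adding headers, but the result was the same. Changing the parameter from body to json made no difference either.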

The following should serve the purpose:

import json
import scrapy
from scrapy.http.request import Request

class BiltorvetScraperSpider(scrapy.Spider):
    name = 'biltorvet'
    start_url = "https://www.biltorvet.dk/Api/Search/Page"
    
    form_data = {
        "pageNumber": "1",
        "searchOrigin": "1",
        "searchValue": "22526899",
        "sort": ""
    }
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36',
        'Content-Type': 'application/json; charset=UTF-8',
    }

    def start_requests(self):
        yield Request(
            self.start_url, 
            headers=self.headers,
            callback=self.parse, 
            method='POST',
            body=json.dumps(self.form_data)
        )

    def parse(self, response):
        print(response.json())

Alternatively, you can try the following, as per the documentation:

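from scrapy.http import JsonRequest

# These two methods go inside the spider class, replacing the ones above.
# JsonRequest serializes `data` to JSON and defaults to POST when `data` is given.
def start_requests(self):
    yield JsonRequest(
        self.start_url,
        headers=self.headers,
        callback=self.parse,
        data=self.form_data
    )

def parse(self, response):
    print(response.json())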
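
Both versions come down to the same three things: the method is POST, the body is JSON, and the Content-Type header says so. My reading of the original 400, which I have not confirmed against the server, is that FormRequest only switches to POST when formdata is passed, so with body= alone it goes out as a GET without a JSON Content-Type. Here is a minimal stdlib-only sketch that builds (but does not send) an equivalent request so you can inspect those three pieces. The helper name build_json_post is hypothetical, and urllib is used only for illustration, not as a replacement for Scrapy:

```python
import json
import urllib.request

URL = "https://www.biltorvet.dk/Api/Search/Page"
FORM_DATA = {
    "pageNumber": "1",
    "searchOrigin": "1",
    "searchValue": "22526899",
    "sort": "",
}


def build_json_post(url, payload):
    """Build, without sending, a POST request that carries a JSON body.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


req = build_json_post(URL, FORM_DATA)

# Inspect the three pieces the server cares about; nothing is sent.
print(req.get_method())
# urllib stores header names capitalized as "Content-type"
print(req.get_header("Content-type"))
print(json.loads(req.data))
```

If any of the three is missing in your Scrapy request (for example the method is still GET), that is the first thing I would check before looking at cookies or other headers.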