Why does a request work with requests but not with Scrapy?

I am trying to scrape a webpage that loads the results for page 2 and so on as I scroll. So I found the URL of the API the page calls, and it should work fine.

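But it only works when I use the requests library. When I run requests.get() with the same URL I give to Scrapy, I get a 200 response, but Scrapy gets a 500 status back. I don't know why this doesn't work with Scrapy. Is there an explanation?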

Here is what I am trying to do. Thanks.

import scrapy
import json
import re

class ScrapeVagas(scrapy.Spider):
    name = "vagas"
    base_url = "https://www.trabalhabrasil.com.br/api/v1.0/Job/List?idFuncao=0&idCidade=5345&pagina=%d&pesquisa=&ordenacao=1&idUsuario="
    start_urls = [base_url % 100]
    download_delay = 1

    def parse(self, response):
        vagas = json.loads(response.text)
        
        for vaga in vagas:
            yield {
                "vaga": vaga["df"],
                "salario": re.sub("[R$.]", "", vaga["sl"]).strip()
            }

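You are getting a 500 Internal Server Error, a server error response code indicating that the server encountered an unexpected condition that prevented it from fulfilling the request. Here, request headers are needed to get a correct response. See the output from the scrapy shell: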
import scrapy
base_url = "https://www.trabalhabrasil.com.br/api/v1.0/Job/List?idFuncao=0&idCidade=5345&pagina=%d&pesquisa=&ordenacao=1&idUsuario="
start_urls = [base_url % 100]
start_urls
url = start_urls[0]
headers = {
    "USER-AGENT": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.3",
    "referer": "https://www.trabalhabrasil.com.br/vagas-empregos-em-sao-paulo-sp",
    "authority": "www.trabalhabrasil.com.br",
    "path": "/api/v1.0/Job/List?idFuncao=100&idCidade=5345&pagina=65&pesquisa=&ordenacao=1&idUsuario=",
    "scheme": "https",
    "accept": "*/*",
    "accept-language": "en-US,en;q=0.9,bn;q=0.8",
    "dnt": "1",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-origin",
}
r = scrapy.Request(url, headers=headers)
fetch(r)
2021-01-22 00:30:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.trabalhabrasil.com.br/api/v1.0/Job/List?idFuncao=0&idCidade=5345&pagina=100&pesquisa=&ordenacao=1&idUsuario=> (referer: https://www.trabalhabrasil.com.br/vagas-empregos-em-sao-paulo-sp)
    
    
In [19]: response.status
Out[19]: 200