Scrapy FormRequest doesn't return any data

Question

I am submitting a form request to a website. The request succeeds (HTTP 200), but no data is returned.

Log:

2020-09-05 22:37:57 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://safer.fmcsa.dot.gov/query.asp> (referer: https://safer.fmcsa.dot.gov/)
2020-09-05 22:37:57 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://safer.fmcsa.dot.gov/query.asp> (referer: https://safer.fmcsa.dot.gov/)
2020-09-05 22:37:59 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://safer.fmcsa.dot.gov/query.asp> (referer: https://safer.fmcsa.dot.gov/)
2020-09-05 22:37:59 [scrapy.core.engine] INFO: Closing spider (finished)
2020-09-05 22:37:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

My code:

# -*- coding: utf-8 -*-
import scrapy

codes = open('codes.txt').read().split('\n')

class MainSpider(scrapy.Spider):
    name = 'main'
    form_url = 'https://safer.fmcsa.dot.gov/query.asp'
    start_urls = ['https://safer.fmcsa.dot.gov/CompanySnapshot.aspx']

    def parse(self, response):

        for code in codes:
        
            data = {
                'searchtype': 'ANY',
                'query_type': 'queryCarrierSnapshot',
                'query_param': 'USDOT',
                'query_string': code,
            }

            yield scrapy.FormRequest(url=self.form_url, formdata=data, callback=self.parse_form)

    def parse_form(self, response):
        cargo = response.xpath('(//table[@summary="Cargo Carried"]/tbody/tr)[2]')
        for each in cargo:
            each_x = each.xpath('.//td[contains(text(), "X")]/following-sibling::td/font/text()').get()

            yield {
                "X Values": each_x if each_x else "N/A",
            }

Here are some of the sample codes I use for the POST request:

2146709

273286

120670

2036998

690147

Answer 1

I believe you only need to remove tbody from the XPath here:

    cargo = response.xpath('(//table[@summary="Cargo Carried"]/tbody/tr)[2]')

Use this instead:

    cargo = response.xpath('//table[@summary="Cargo Carried"]/tr[2]')
    # I also removed the () around the path because they are not needed here; they were not the cause of the problem.

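This is because Scrapy parses the raw page source, whereas your browser may insert a tbody element when rendering the table even if it is not present in the source. An XPath copied from the browser's developer tools can therefore contain a tbody step that matches nothing in the response Scrapy receives.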


python

http-post

scrapy

web-scraping
