Can I use a while loop with 'i' as a variable that will be used in tr[i] in an XPath?

import scrapy
import logging

class AssetSpider(scrapy.Spider):
    name = 'asset'
    start_urls = ['http://mnregaweb4.nic.in/netnrega/asset_report_dtl.aspx?lflag=eng&state_name=WEST%20BENGAL&state_code=32&district_name=NADIA&district_code=3201&block_name=KRISHNAGAR-I&block_code=&panchayat_name=DOGACHI&panchayat_code=3201009009&fin_year=2020-2021&source=national&Digest=8+kWKUdwzDQA1IJ5qhD8Fw']
    def parse(self, response):
        i = 4
        while i<2236:
            assetid = response.xpath("//table[2]//tr['i']/td[2]/text()")
            assetcategory = response.xpath("//table[2]//tr['i']/td[3]/text()")
            schemecode = response.xpath("//table[2]//tr['i']/td[5]/text()")
            link = response.xpath("//table[2]//tr['i']/td[6]/a/@href")
            schemename = response.xpath("//table[2]//tr['i']/td[7]/text()")
            yield {
                'assetid' : assetid,
                'assetcategory' : assetcategory,
                'schemecode' : schemecode,
                'link' : link,
                'schemename' : schemename
            }
            i += 1

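I want to use the variable 'i' in the tr[position] part of the XPath, looping from 4 to 2235. I just don't know whether this is possible! If it is, what is the correct way to do it? Mine doesn't work.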

You are passing a string to xpath, so I suggest using string formatting, for example:

 response.xpath(f"//table[2]//tr[{i}]/td[2]/text()")
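
In the question's code, 'i' inside the quotes is an XPath string literal rather than the Python variable, so the row number never reaches the expression. To see the substitution at work without running the spider, here is a minimal sketch. It assumes the standard library's xml.etree.ElementTree as a stand-in for Scrapy's response (its XPath support is limited, but it does handle tr[n]); the table contents and the asset_ids helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the real page: one header row plus three data rows.
HTML = """
<table>
  <tr><td>#</td><td>Asset ID</td></tr>
  <tr><td>1</td><td>A-100</td></tr>
  <tr><td>2</td><td>A-200</td></tr>
  <tr><td>3</td><td>A-300</td></tr>
</table>
""".strip()


def asset_ids(table, first_row, last_row):
    """Collect the text of td[2] for rows first_row..last_row (1-based, inclusive)."""
    ids = []
    for i in range(first_row, last_row + 1):
        # The f-string substitutes the current number, e.g. "tr[2]/td[2]".
        cell = table.find(f"tr[{i}]/td[2]")
        if cell is not None:  # rows past the end of the table are skipped
            ids.append(cell.text)
    return ids


table = ET.fromstring(HTML)
print(asset_ids(table, 2, 4))
```

The same f-string pattern should carry over to response.xpath(...) in the spider; only the selector object differs.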

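Sure, this is possible and widely used.
You can format the string with a variable.
There are several syntaxes for this.
For example, you can do something like this: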

i = 4
while i<2236:
    assetid_path = "//table[2]//tr[{}]/td[2]/text()".format(i)
    assetcategory_path = "//table[2]//tr[{}]/td[3]/text()".format(i)
    schemecode_path = "//table[2]//tr[{}]/td[5]/text()".format(i)
    link_path = "//table[2]//tr[{}]/td[6]/a/@href".format(i)
    schemename_path = "//table[2]//tr[{}]/td[7]/text()".format(i)
    assetid = response.xpath(assetid_path)
    assetcategory = response.xpath(assetcategory_path)
    schemecode = response.xpath(schemecode_path)
    link = response.xpath(link_path)
    schemename = response.xpath(schemename_path)
    yield {
        'assetid' : assetid,
        'assetcategory' : assetcategory,
        'schemecode' : schemecode,
        'link' : link,
        'schemename' : schemename
    }
    i += 1

The above can be shortened like this:

i = 4
while i<2236:
    root_path = "//table[2]//tr[{}]".format(i)
    assetid_path = root_path + "/td[2]/text()"
    assetcategory_path = root_path + "/td[3]/text()"
    schemecode_path = root_path + "/td[5]/text()"
    link_path = root_path + "/td[6]/a/@href"
    schemename_path = root_path + "/td[7]/text()"
    assetid = response.xpath(assetid_path)
    assetcategory = response.xpath(assetcategory_path)
    schemecode = response.xpath(schemecode_path)
    link = response.xpath(link_path)
    schemename = response.xpath(schemename_path)
    yield {
        'assetid' : assetid,
        'assetcategory' : assetcategory,
        'schemecode' : schemecode,
        'link' : link,
        'schemename' : schemename
    }
    i += 1
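
The per-field paths can also be generated from a single mapping, with a for loop over range replacing the manual counter. This is only a sketch: the FIELD_SUFFIXES name and the row_paths helper are hypothetical, and the column positions are copied from the XPaths above.

```python
# Path of each field relative to its row, copied from the XPaths above.
FIELD_SUFFIXES = {
    "assetid": "/td[2]/text()",
    "assetcategory": "/td[3]/text()",
    "schemecode": "/td[5]/text()",
    "link": "/td[6]/a/@href",
    "schemename": "/td[7]/text()",
}


def row_paths(i):
    """Return a {field: xpath} dict for table row i (hypothetical helper)."""
    root_path = "//table[2]//tr[{}]".format(i)
    return {field: root_path + suffix for field, suffix in FIELD_SUFFIXES.items()}


# Two rows only, to keep the output short; the real loop would be range(4, 2236).
for i in range(4, 6):
    for field, path in row_paths(i).items():
        print(field, path)
```

Inside the spider, each path would then go to response.xpath(path). Calling .get() on the result returns the first match as a string instead of a selector list, which is probably what the yielded item should contain.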

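But a better approach is to use bound variables: write $i in the XPath and pass the value as a keyword argument, as follows: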

i = 4
while i<2236:
    assetid = response.xpath("//table[2]//tr[$i]/td[2]/text()",i=i)
    assetcategory = response.xpath("//table[2]//tr[$i]/td[3]/text()",i=i)
    schemecode = response.xpath("//table[2]//tr[$i]/td[5]/text()",i=i)
    link = response.xpath("//table[2]//tr[$i]/td[6]/a/@href",i=i)
    schemename = response.xpath("//table[2]//tr[$i]/td[7]/text()",i=i)
    yield {
        'assetid' : assetid,
        'assetcategory' : assetcategory,
        'schemecode' : schemecode,
        'link' : link,
        'schemename' : schemename
    }
    i += 1