Split a list of elements in Scrapy output into separate rows

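I am trying to split my Scrapy output into separate rows in an Excel file, but the variant values for each product come out together in a single row.

In other words, each variant ID, price, and name should go in its own row in Excel.

I am using the scrapy-xlsx 0.1.1 library to export the output to an xlsx file (it cannot be csv).

Please tell me where the problem is.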

import re

import scrapy

from ..items import ZooplusItem


class ZooplusDeSpider(scrapy.Spider):
    name = 'zooplus_de'
    allowed_domains = ['zooplus.de']
    start_urls = ['https://www.zooplus.de/shop/hunde/hundefutter_trockenfutter/diaetfutter']

    def parse(self, response):
        for link in response.css('.MuiGrid-root.MuiGrid-container.MuiGrid-spacing-xs-2.MuiGrid-justify-xs-flex-end'):
            items = ZooplusItem()
            # Use the first URL of the redirect chain if there was a redirect
            redirect_urls = response.request.meta.get('redirect_urls')
            items['url'] = redirect_urls[0] if redirect_urls else response.request.url
            items['product_url'] = link.css('.MuiGrid-root.product-image a::attr(href)').getall()
            items['title'] = link.css('h3 a::text').getall()
            items['id'] = link.css('h3 a::attr(id)').getall()

            # Keep only the digits of the review count
            items['review'] = link.css('span.sc-fzoaKM.kVcaXm::text').getall()
            items['review'] = re.sub(r'\D', " ", str(items['review']))
            items['review'] = items['review'].replace(" ", "")
            #items['review'] = int(items['review'])

            items['rate'] = len(link.css('a.v3-link i[role=full-star]'))
            items['variant_id'] = [i.strip().split('/n') for i in link.css('.jss114.jss115::text').extract()]
            items['variant_name'] = [i.strip().split('/n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
            items['variant_price'] = [i.strip().split('/n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]

            yield items

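If you want to store every variant together with the repeated common information, you need to loop over the variants and yield each one separately. You can copy the common information you have already collected and add the variant-specific fields to the copy.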

In short, replace

items['variant_id'] = [i.strip().split('/n') for i in link.css('.jss114.jss115::text').extract()]
items['variant_name'] = [i.strip().split('/n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
items['variant_price'] = [i.strip().split('/n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]

yield items

with something like

for i in link.css("[data-zta='product-variant']"):
    variant = items.copy()
    variant["variant_id"] = i.attrib["data-variant-id"]
    variant["variant_name"] = "".join(i.css(".title > div::text").getall()).strip()
    variant['variant_price'] = i.css("[itemprop='price']::attr(content)").get()
 
    yield variant
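
The same copy-and-yield pattern can be tried outside Scrapy. The following is only a minimal sketch with plain dictionaries: the function name `expand_variants` and the sample data are made up for illustration and are not taken from the spider or from zooplus.de.

```python
from typing import Dict, Iterator, List


def expand_variants(
    common: Dict[str, object], variants: List[Dict[str, str]]
) -> Iterator[Dict[str, object]]:
    """Yield one row per variant, repeating the shared product fields."""
    for variant in variants:
        row = dict(common)   # shallow copy, so the shared dict is not modified
        row.update(variant)  # add the variant-specific fields to the copy
        yield row


# Hypothetical sample data standing in for what the spider would scrape.
common = {"title": "Example dog food", "rate": 4}
variants = [
    {"variant_id": "1001", "variant_name": "2 kg", "variant_price": "9.99"},
    {"variant_id": "1002", "variant_name": "10 kg", "variant_price": "39.99"},
]

rows = list(expand_variants(common, variants))
for row in rows:
    print(row)
```

Each yielded dictionary corresponds to one exported row, so a product with two variants produces two rows that share the same title and rating.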