How can I set default values for scraped results when they return null/None?

Question

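I am scraping some information from a website. Some of the fields do not exist on the page, so they come back empty. Is there a way to output a default value for those fields in that case? A sample script is below.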

script.py

import scrapy

class UfcscraperSpider(scrapy.Spider):
    name = 'ufcscraper'

    start_urls = ['http://ufcstats.com/statistics/fighters?char=a']

    def parse(self, response):
        for user_info in response.css(".b-statistics__table-row")[2::]:
            result = {
                "fname": user_info.css("td:nth-child(1) a::text").get(),
                "lname": user_info.css("td:nth-child(2) a::text").get(),
                "nname": user_info.css("td:nth-child(3) a::text").get(),
                "height": user_info.css("td:nth-child(4)::text").get().strip(),
                "weight": user_info.css("td:nth-child(5)::text").get().strip(),
                "reach": user_info.css("td:nth-child(6)::text").get().strip(),
                "stance": user_info.css("td:nth-child(7)::text").get().strip(),
                "win": user_info.css("td:nth-child(8)::text").get().strip(),
                "lose": user_info.css("td:nth-child(9)::text").get().strip(),
                "draw": user_info.css("td:nth-child(10)::text").get().strip()
            }

            # Yield inside the loop so every table row is emitted, not only the last one
            yield result

For example, in the first row the nname field is null and stance is "", an empty string. How can I set a default value for cases like these?

Sample result

[
{"fname": "Tom", "lname": "Aaron", "nname": null, "height": "--", "weight": "155 lbs.", "reach": "--", "stance": "", "win": "5", "lose": "3", "draw": "0"},
{"fname": "Danny", "lname": "Abbadi", "nname": "The Assassin", "height": "5' 11\"", "weight": "155 lbs.", "reach": "--", "stance": "Orthodox", "win": "4", "lose": "6", "draw": "0"},
]

Answer 1

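You can put the logic that replaces any "" into a function, or you can loop over the results and, whenever you encounter "", replace it with whatever default value you want.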

data = [
{"fname": "Tom", "lname": "Aaron", "nname": "", "height": "--", "weight": "155 lbs.", "reach": "--", "stance": "", "win": "5", "lose": "3", "draw": "0"},
{"fname": "Danny", "lname": "Abbadi", "nname": "The Assassin", "height": "5' 11\"", "weight": "155 lbs.", "reach": "--", "stance": "Orthodox", "win": "4", "lose": "6", "draw": "0"},
]
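
# Walk every scraped row and overwrite missing values in place
for idx, each in enumerate(data):
    for k, v in each.items():
        # Treat both None (JSON null) and the empty string as missing
        if v is None or v == '':
            data[idx][k] = 'DEFAULT'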

Output:

print(data)
[
{'fname': 'Tom', 'lname': 'Aaron', 'nname': 'DEFAULT', 'height': '--', 'weight': '155 lbs.', 'reach': '--', 'stance': 'DEFAULT', 'win': '5', 'lose': '3', 'draw': '0'}, 
{'fname': 'Danny', 'lname': 'Abbadi', 'nname': 'The Assassin', 'height': '5\' 11"', 'weight': '155 lbs.', 'reach': '--', 'stance': 'Orthodox', 'win': '4', 'lose': '6', 'draw': '0'}
]
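
The other option mentioned above, putting the replacement logic in a function, could look roughly like the sketch below. The names `with_default` and `apply_defaults` and the "N/A" placeholder are hypothetical, chosen only for illustration. The helper treats both None and blank strings as missing, which covers the null nname and the empty stance from the question.

```python
def with_default(value, default="N/A"):
    """Return `default` when `value` is None or blank, else the stripped text."""
    if value is None:
        return default
    value = value.strip()
    return value if value else default


def apply_defaults(row, default="N/A"):
    """Return a copy of a scraped row with every missing field replaced."""
    return {key: with_default(val, default) for key, val in row.items()}


# Hypothetical row shaped like the first entry of the sample result
row = {"fname": "Tom", "lname": "Aaron", "nname": None, "height": "--", "stance": ""}
print(apply_defaults(row))
```

In the spider, the same helper could wrap each selector call, for example `with_default(user_info.css("td:nth-child(7)::text").get())`. That also avoids the AttributeError that `.get().strip()` raises when `.get()` returns None. Scrapy's `.get()` additionally accepts a `default` argument, such as `.get(default="")`, but that only covers the None case and not an empty string.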

python

scrapy

web-scraping