How can I set default values for scraped results when they return null/None?
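I scraped some information from a website. Some of the fields do not exist on the page and come back empty. Is there a way to output a default value for each of those fields? A sample script is below.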
script.py
import scrapy


class UfcscraperSpider(scrapy.Spider):
    name = 'ufcscraper'
    start_urls = ['http://ufcstats.com/statistics/fighters?char=a']

    def parse(self, response):
        for user_info in response.css(".b-statistics__table-row")[2::]:
            result = {
                "fname": user_info.css("td:nth-child(1) a::text").get(),
                "lname": user_info.css("td:nth-child(2) a::text").get(),
                "nname": user_info.css("td:nth-child(3) a::text").get(),
                "height": user_info.css("td:nth-child(4)::text").get().strip(),
                "weight": user_info.css("td:nth-child(5)::text").get().strip(),
                "reach": user_info.css("td:nth-child(6)::text").get().strip(),
                "stance": user_info.css("td:nth-child(7)::text").get().strip(),
                "win": user_info.css("td:nth-child(8)::text").get().strip(),
                "lose": user_info.css("td:nth-child(9)::text").get().strip(),
                "draw": user_info.css("td:nth-child(10)::text").get().strip()
            }
            yield result
For example, in the first row the nname field is null and stance is "", an empty string. How can I set default values for cases like these?
Sample result
[
{"fname": "Tom", "lname": "Aaron", "nname": null, "height": "--", "weight": "155 lbs.", "reach": "--", "stance": "", "win": "5", "lose": "3", "draw": "0"},
{"fname": "Danny", "lname": "Abbadi", "nname": "The Assassin", "height": "5' 11\"", "weight": "155 lbs.", "reach": "--", "stance": "Orthodox", "win": "4", "lose": "6", "draw": "0"},
]
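You can put the logic that replaces any "" inside a function, or you can loop over the results and, whenever you encounter "", substitute whatever default value you want.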
data = [
{"fname": "Tom", "lname": "Aaron", "nname": "", "height": "--", "weight": "155 lbs.", "reach": "--", "stance": "", "win": "5", "lose": "3", "draw": "0"},
{"fname": "Danny", "lname": "Abbadi", "nname": "The Assassin", "height": "5' 11\"", "weight": "155 lbs.", "reach": "--", "stance": "Orthodox", "win": "4", "lose": "6", "draw": "0"},
]
# Replace every empty-string value with a default, in place.
for idx, each in enumerate(data):
    for k, v in each.items():
        if v == '':
            data[idx][k] = 'DEFAULT'
Output:
print(data)
[
{'fname': 'Tom', 'lname': 'Aaron', 'nname': 'DEFAULT', 'height': '--', 'weight': '155 lbs.', 'reach': '--', 'stance': 'DEFAULT', 'win': '5', 'lose': '3', 'draw': '0'},
{'fname': 'Danny', 'lname': 'Abbadi', 'nname': 'The Assassin', 'height': '5\' 11"', 'weight': '155 lbs.', 'reach': '--', 'stance': 'Orthodox', 'win': '4', 'lose': '6', 'draw': '0'}
]
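The loop above only matches "", so a None value (the null in the question's output) would pass through unchanged. As a rough sketch of the "put the logic in a function" option, the helper below treats None, empty, and whitespace-only strings alike and allows a different default per field. The names with_default, apply_defaults, and FIELD_DEFAULTS, as well as the default values themselves, are made up for illustration and are not part of Scrapy.

```python
def with_default(value, default="DEFAULT"):
    """Return the stripped string, or `default` if value is None or blank."""
    if value is None:
        return default
    value = value.strip()
    return value if value else default


# Hypothetical per-field defaults; any field not listed uses the fallback.
FIELD_DEFAULTS = {"nname": "N/A", "stance": "Unknown"}


def apply_defaults(row, field_defaults=FIELD_DEFAULTS, fallback="DEFAULT"):
    """Build a new dict with missing or blank values replaced."""
    return {
        key: with_default(value, field_defaults.get(key, fallback))
        for key, value in row.items()
    }


row = {"fname": "Tom", "nname": None, "stance": "  ", "win": "5"}
cleaned = apply_defaults(row)
print(cleaned)
```

Inside the spider, the same helper could wrap each selector call, for example with_default(user_info.css("td:nth-child(7)::text").get(), "Unknown"), which also avoids calling .strip() on None. Scrapy's .get() accepts a default argument as well, but that only covers a missing node, not a node whose text is blank.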