How to extract data from multiple nested JSON dictionaries in one for loop - Python
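In my Scrapy project I want to extract data from a website. It turns out that all the information is stored in a script tag, which I can easily read as JSON and pull the data I need from it.
Here is my function: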
def parse(self, response):
    items = response.css("script:contains('window.__INITIAL_STATE__')::text").re_first(r"window\.__INITIAL_STATE__ =(.*);")
    for item in json.loads(items)['offers']:
        yield {
            "title": item['jobTitle'],
            "employer": item['employer'],
            "country": item['countryName'],
            "details_page": item['companyProfileUrl'],
            "expiration_date": item['expirationDate'],
            'salary': item['salary'],
            'employmentLevel': item['employmentLevel'],
        }
The JSON has this structure:
var = {
    "offers": [
        {
            "commonOfferId": "1200072247",
            "jobTitle": "Automatyk - Programista",
            "employer": "MULTIPAK Spółka Akcyjna",
            "companyProfileUrl": "https://pracodawcy.pracuj.pl/company/20379037/profile",
            "expirationDate": "2021-04-28T12:47:06.273",
            "salary": "",
            "employmentLevel": "Specjalista (Mid / Regular)",
            "offers": [
                {
                    "offerId": 500092126,
                    "regionName": "kujawsko-pomorskie",
                    "cities": ["Małe Czyste (pow. chełmiński)"],
                    "label": "Małe Czyste (pow. chełmiński)"
                }
            ],
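Above is an example of one element. When I try to extract data such as cities or regionName, I get an error. How can I loop over both dictionaries and yield that data in a new dictionary?
You haven't made it clear what you want, but I'm guessing this is close: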
def parse(self, response):
    items = response.css("script:contains('window.__INITIAL_STATE__')::text").re_first(r"window\.__INITIAL_STATE__ =(.*);")
    for item in json.loads(items)['offers']:
        # Each top-level entry carries its own nested "offers" list
        for offer in item['offers']:
            yield {
                "title": item['jobTitle'],
                "employer": item['employer'],
                "country": item['countryName'],
                "details_page": item['companyProfileUrl'],
                "expiration_date": item['expirationDate'],
                'salary': item['salary'],
                'employmentLevel': item['employmentLevel'],
                'offernumber': offer['offerId'],
                'region': offer['regionName'],
                'city': offer['cities'][0]
            }
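To try the nested loop outside of Scrapy, here is a minimal sketch that runs against a trimmed copy of the sample from the question. The closing brackets of the sample and the helper name flatten_offers are my own assumptions, since the question's JSON is cut off. It also uses .get() instead of direct indexing, which is one possible way to avoid a KeyError when a field such as countryName is missing from an entry:

```python
import json

# Trimmed copy of the sample from the question. The closing brackets are
# assumed, because the original snippet is truncated.
raw = '''
{
  "offers": [
    {
      "commonOfferId": "1200072247",
      "jobTitle": "Automatyk - Programista",
      "employer": "MULTIPAK Spółka Akcyjna",
      "offers": [
        {
          "offerId": 500092126,
          "regionName": "kujawsko-pomorskie",
          "cities": ["Małe Czyste (pow. chełmiński)"],
          "label": "Małe Czyste (pow. chełmiński)"
        }
      ]
    }
  ]
}
'''


def flatten_offers(data):
    """Yield one flat dict per nested offer (hypothetical helper)."""
    for item in data.get("offers", []):
        # The region and city live one level down, in the inner "offers" list
        for offer in item.get("offers", []):
            cities = offer.get("cities") or []
            yield {
                "title": item.get("jobTitle"),
                "employer": item.get("employer"),
                # Not present in the trimmed sample, so this becomes None
                "country": item.get("countryName"),
                "offernumber": offer.get("offerId"),
                "region": offer.get("regionName"),
                "city": cities[0] if cities else None,
            }


rows = list(flatten_offers(json.loads(raw)))
print(rows)
```

Whether a missing field should become None or raise an error depends on what you want downstream; direct indexing as in the parse method above fails fast, while .get() keeps the crawl going.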