Transforming JSON data into a pandas DataFrame
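I am using the Python package censusgeocode to geocode street addresses and get the corresponding geo IDs, which can be used to merge in other census data.
I have a CSV file with all my street addresses, and this code works fine to load the packages, bring in the data, and loop through each address using the geocode function: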
# For geocoding:
import censusgeocode as cg
# For data handling:
import pandas as pd

addresses = pd.read_csv('addresslist.csv')
geo_set = []
# just test it on the first two addresses (iloc[0:2] selects rows 0 and 1)
for index, row in addresses.iloc[0:2].iterrows():
    try:
        nextline = cg.address(str(row['residential_address']), city=str(row['mailing_city']), state=str(row['mailing_state']), zipcode=str(row['mailing_zip_code']))
        nextline
        geo_set.append(nextline)
    except:
        pass
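That is the context; all of the above works fine. What I am struggling with is converting the resulting output into a pandas DataFrame. Here is my code: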
emptydata = pd.DataFrame({"fromAddress":[], "streetName":[], "suffixType":[], "state":[], "city":[], "zip":[]})
for p in geo_set:
    for i in p['addressComponents']:
        new_result = pd.DataFrame({
            "fromAddress":[i['fromAddress']],
            "streetName":[i['streetName']],
            "suffixType":[i['suffixType']],
            "state":[i['state']],
            "city":[i['city']],
            "zip":[i['zip']]
        })
        emptydata = emptydata.append(new_result)
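I have tried changing a million different things and keep getting error messages. Can anyone suggest where my code is going wrong? I am pretty sure it has to do with how I am trying to work through the nested structure. The error I get is:
TypeError: list indices must be integers or slices, not str
Here is the data I am trying to turn into a DataFrame: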
[[{'addressComponents': {'city': 'BOULDER',
'fromAddress': '1',
'preDirection': 'E',
'preQualifier': '',
'preType': '',
'state': 'CO',
'streetName': 'REVEREND',
'suffixDirection': '',
'suffixQualifier': '',
'suffixType': 'AVE',
'toAddress': '99',
'zip': '80211'},
'coordinates': {'x': -135.98743, 'y': 43.714783},
'geographies': {'2010 Census Blocks': [{'AREALAND': 21481,
'AREAWATER': 0,
'BASENAME': '4003',
'BLKGRP': '4',
'BLOCK': '4003',
'CENTLAT': '+43.7156677',
'CENTLON': '-135.9868842',
'COUNTY': '031',
'FUNCSTAT': 'S',
'GEOID': '080300028024003',
'INTPTLAT': '+43.7156677',
'INTPTLON': '-135.9868842',
'LSADC': 'BK',
'LWBLKTYP': 'L',
'MTFCC': 'G5040',
'NAME': 'Block 4113',
'OBJECTID': 6626210,
'OID': 210403980440495,
'STATE': '08',
'SUFFIX': '',
'TRACT': '002802'}],
'Census Tracts': [{'status': 'Layer query encountered an error: java.lang.RuntimeException: Failed to return'}],
'Counties': [{'AREALAND': 397083755,
'AREAWATER': 4237705,
'BASENAME': 'Boulder',
'CENTLAT': '+43.7621497',
'CENTLON': '-135.8760655',
'COUNTY': '033',
'COUNTYCC': 'H6',
'COUNTYNS': '00198131',
'FUNCSTAT': 'C',
'GEOID': '08033',
'INTPTLAT': '+43.7618502',
'INTPTLON': '-135.8811054',
'LSADC': '06',
'MTFCC': 'G4020',
'NAME': 'Boulder County',
'OBJECTID': 625,
'OID': 27590700234321,
'STATE': '08'}],
'States': [{'AREALAND': 268426005696,
'AREAWATER': 1178507593,
'BASENAME': 'Colorado',
'CENTLAT': '+38.9976179',
'CENTLON': '-105.5478280',
'DIVISION': '8',
'FUNCSTAT': 'A',
'GEOID': '08',
'INTPTLAT': '+38.9938482',
'INTPTLON': '-105.5083165',
'LSADC': '00',
'MTFCC': 'G4000',
'NAME': 'Colorado',
'OBJECTID': 27,
'OID': 2749086215995,
'REGION': '4',
'STATE': '08',
'STATENS': '01779779',
'STUSAB': 'CO'}]},
'matchedAddress': '1 E BAYAUD AVE, DENVER, CO, 80209',
'tigerLine': {'side': 'L', 'tigerLineId': '177330882'}}],
[{'addressComponents': {'city': 'DENVER',
'fromAddress': '1',
'preDirection': 'E',
'preQualifier': '',
'preType': '',
'state': 'CO',
'streetName': 'REVEREND',
'suffixDirection': '',
'suffixQualifier': '',
'suffixType': 'AVE',
'toAddress': '99',
'zip': '80209'},
'coordinates': {'x': -135.98743, 'y': 43.714783},
'geographies': {'2010 Census Blocks': [{'AREALAND': 21481,
'AREAWATER': 0,
'BASENAME': '4003',
'BLKGRP': '4',
'BLOCK': '4003',
'CENTLAT': '+43.7156677',
'CENTLON': '-135.9868842',
'COUNTY': '033',
'FUNCSTAT': 'S',
'GEOID': '080330028024113',
'INTPTLAT': '+43.7156677',
'INTPTLON': '-135.9868842',
'LSADC': 'BK',
'LWBLKTYP': 'L',
'MTFCC': 'G5041',
'NAME': 'Block 4233',
'OBJECTID': 6626210,
'OID': 210403980440495,
'STATE': '08',
'SUFFIX': '',
'TRACT': '002802'}],
'Census Tracts': [{'AREALAND': 886991,
'AREAWATER': 0,
'BASENAME': '32.02',
'CENTLAT': '+43.7177365',
'CENTLON': '-135.9841763',
'COUNTY': '031',
'FUNCSTAT': 'S',
'GEOID': '08033002802',
'INTPTLAT': '+43.7177365',
'INTPTLON': '-135.9841763',
'LSADC': 'CT',
'MTFCC': 'G5020',
'NAME': 'Census Tract 41.02',
'OBJECTID': 65498,
'OID': 20790703831619,
'STATE': '08',
'TRACT': '002802'}],
'Counties': [{'AREALAND': 397083755,
'AREAWATER': 4237705,
'BASENAME': 'Boulder',
'CENTLAT': '+43.7621497',
'CENTLON': '-135.8760655',
'COUNTY': '033',
'COUNTYCC': 'H6',
'COUNTYNS': '00198133',
'FUNCSTAT': 'C',
'GEOID': '08033',
'INTPTLAT': '+43.7618502',
'INTPTLON': '-135.8811054',
'LSADC': '06',
'MTFCC': 'G4020',
'NAME': 'Boulder County',
'OBJECTID': 625,
'OID': 27590700234321,
'STATE': '08'}],
'States': [{'AREALAND': 268426005696,
'AREAWATER': 1178507593,
'BASENAME': 'Colorado',
'CENTLAT': '+43.9976179',
'CENTLON': '-135.5478280',
'DIVISION': '8',
'FUNCSTAT': 'A',
'GEOID': '08',
'INTPTLAT': '+43.9938482',
'INTPTLON': '-135.5083165',
'LSADC': '00',
'MTFCC': 'G4000',
'NAME': 'Colorado',
'OBJECTID': 27,
'OID': 2749086215995,
'REGION': '4',
'STATE': '08',
'STATENS': '01779779',
'STUSAB': 'CO'}]},
'matchedAddress': '1 E REVEREND AVE, BOULDER, CO, 88090',
'tigerLine': {'side': 'L', 'tigerLineId': '177330882'}}]]
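ADDITION TO ORIGINAL POST
I am trying to pull out more variables from a different part of the JSON data. They are all in the '2010 Census Blocks' part of the tree. By running this code (adapted from what you shared with me):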
emptydata = pd.DataFrame({"fromAddress":[], "streetName":[], "suffixType":[], "state":[], "city":[], "zip":[], "BASENAME": [], "CENTLAT": [], "COUNTY":[], "GEOID":[], "NAME":[], "BLKGRP":[], "BLOCK":[]})
for p in geo_set:
    for i in p:
        d = i['addressComponents']
        e = i['geographies']
        for w in e:
            g = e['2010 Census Blocks']
            print(g)
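I can print all the extra parts of the tree that I want. But when I try to integrate this into the part that pulls out the variables and appends them to my DataFrame, I get the same TypeError message as before.
Here is my code: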
emptydata = pd.DataFrame({"fromAddress":[], "streetName":[], "suffixType":[], "state":[], "city":[], "zip":[], "BASENAME": [], "CENTLAT": [], "COUNTY":[], "GEOID":[], "NAME":[], "BLKGRP":[], "BLOCK":[]})
for p in geo_set:
    for i in p:
        d = i['addressComponents']
        e = i['geographies']
        for w in e:
            g = e['2010 Census Blocks']
            new_result = pd.DataFrame({
                "fromAddress":[d['fromAddress']],
                "streetName":[d['streetName']],
                "suffixType":[d['suffixType']],
                "state":[d['state']],
                "city":[d['city']],
                "zip":[d['zip']],
                "BASENAME":[g['BASENAME']],
                "CENTLAT":[g['CENTLAT']],
                "COUNTY":[g['COUNTY']],
                "GEOID":[g['GEOID']],
                "NAME":[g['NAME']],
                "BLKGRP":[g['BLKGRP']],
                "BLOCK":[g['BLOCK']]
            })
            emptydata = emptydata.append(new_result)
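You can simply do:
# Build the DataFrame in one pass from a list of row dicts.
# Each p in geo_set is a list of matches, so iterate over p first,
# then read the 'addressComponents' dict of each match i.
emptydata = pd.DataFrame([{
    "fromAddress": i['addressComponents']['fromAddress'],
    "streetName": i['addressComponents']['streetName'],
    "suffixType": i['addressComponents']['suffixType'],
    "state": i['addressComponents']['state'],
    "city": i['addressComponents']['city'],
    "zip": i['addressComponents']['zip']
} for p in geo_set for i in p])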
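The problem here is the depth of the nesting: your nested for loops do not reach the inner layer. Your output is a list that contains lists of nested dictionaries. When you iterate over geo_set only one level deep, p['addressComponents'] fails because p is a list of nested dictionaries, not the dictionary you expected. You need to iterate over p as well to reach each dictionary i, which contains the key 'addressComponents' and holds all the items you want to retrieve: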
emptydata = pd.DataFrame({"fromAddress":[], "streetName":[], "suffixType":[], "state":[], "city":[], "zip":[], "BASENAME": [], "CENTLAT": [], "COUNTY":[], "GEOID":[], "NAME":[], "BLKGRP":[], "BLOCK":[]})
for p in geo_set:
    for i in p:
        add_comp = i['addressComponents']
        # '2010 Census Blocks' is a list of dicts; take its first entry
        census_block = i['geographies']['2010 Census Blocks'][0]
        new_result = pd.DataFrame({
            "fromAddress":[add_comp['fromAddress']],
            "streetName":[add_comp['streetName']],
            "suffixType":[add_comp['suffixType']],
            "state":[add_comp['state']],
            "city":[add_comp['city']],
            "zip":[add_comp['zip']],
            "BASENAME": [census_block['BASENAME']],
            "CENTLAT": [census_block['CENTLAT']],
            "COUNTY": [census_block['COUNTY']],
            "GEOID": [census_block['GEOID']],
            "NAME": [census_block['NAME']],
            "BLKGRP": [census_block['BLKGRP']],
            "BLOCK": [census_block['BLOCK']]
        })
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        emptydata = pd.concat([emptydata, new_result])
Output of emptydata (column order may vary with the pandas version):
  BASENAME BLKGRP BLOCK      CENTLAT COUNTY            GEOID        NAME  \
0     4003      4  4003  +43.7156677    031  080300028024003  Block 4113
0     4003      4  4003  +43.7156677    033  080330028024113  Block 4233

      city fromAddress state streetName suffixType    zip
0  BOULDER           1    CO   REVEREND        AVE  80211
0   DENVER           1    CO   REVEREND        AVE  80209
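The [0] lookup above assumes every match carries at least one usable census block. The sample data in the question shows that a layer can come back holding only a 'status' error entry (see 'Census Tracts' in the first record), so a more defensive version may be worth having. The following is a minimal sketch, not part of censusgeocode or pandas: the helper names first_record and build_row are hypothetical, the geo_set here is a shortened stand-in, and treating any record with a 'status' key as a failure is an assumption drawn from that one sample.

```python
ADDRESS_KEYS = ["fromAddress", "streetName", "suffixType", "state", "city", "zip"]
BLOCK_KEYS = ["BASENAME", "CENTLAT", "COUNTY", "GEOID", "NAME", "BLKGRP", "BLOCK"]


def first_record(match, layer):
    """Return the first usable record of a geography layer, or {} if there is none."""
    for record in match.get("geographies", {}).get(layer, []):
        # Assumption: a failed layer is reported as a dict with a 'status' key.
        if "status" not in record:
            return record
    return {}


def build_row(match, layer="2010 Census Blocks"):
    """Flatten one geocoder match into a row dict; missing values become None."""
    components = match.get("addressComponents", {})
    block = first_record(match, layer)
    row = {key: components.get(key) for key in ADDRESS_KEYS}
    row.update({key: block.get(key) for key in BLOCK_KEYS})
    return row


# Shortened stand-ins for geocoder results: one good match, one failed layer.
geo_set = [
    [{"addressComponents": {"city": "DENVER", "zip": "80209"},
      "geographies": {"2010 Census Blocks": [{"GEOID": "080330028024113", "BLOCK": "4003"}]}}],
    [{"addressComponents": {"city": "BOULDER", "zip": "80211"},
      "geographies": {"2010 Census Blocks": [{"status": "Layer query encountered an error"}]}}],
]

# geo_set is a list of lists, so two loops are needed to reach each match dict.
rows = [build_row(match) for matches in geo_set for match in matches]
print(rows[0]["GEOID"], rows[1]["GEOID"])
```

Because rows is a plain list of dicts, passing it to pd.DataFrame(rows) builds the whole frame in a single call, and a failed lookup shows up as a missing value in that row instead of stopping the loop.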
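For reference, errors like this are straightforward to debug. The TypeError: list indices must be integers or slices, not str you received is an excellent hint that an indexing operation went wrong: a list was indexed with a string. Indexing and slicing use the [] syntax, and so does a dictionary key lookup, i.e. p['addressComponents']. If you had tried: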
for p in geo_set:
    print(p['addressComponents'])
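you would have received the same error. At that point you have narrowed down the source of the error and can resolve it by stepping through the data one level at a time.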
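One way to step through the data is to print the container type at each nesting level before writing any loops. The sketch below is only an illustration: outline is a hypothetical helper, and sample is a shortened stand-in with the same shape as geo_set.

```python
def outline(obj, depth=0, max_depth=3):
    """Return lines describing the container found at each nesting level."""
    pad = "  " * depth
    if isinstance(obj, list):
        lines = [f"{pad}list of {len(obj)}"]
        if obj and depth < max_depth:
            # Only the first element is inspected, assuming the rest look alike.
            lines += outline(obj[0], depth + 1, max_depth)
        return lines
    if isinstance(obj, dict):
        return [f"{pad}dict with keys {sorted(obj)}"]
    return [f"{pad}{type(obj).__name__}"]


# Shortened stand-in with the same nesting as geo_set.
sample = [[{"addressComponents": {"city": "BOULDER"},
            "geographies": {"2010 Census Blocks": [{"BLOCK": "4003"}]}}]]

print("\n".join(outline(sample)))
# list of 1
#   list of 1
#     dict with keys ['addressComponents', 'geographies']
```

The two "list" lines before the first "dict" line show why a single for loop over geo_set is not enough to reach the 'addressComponents' key.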
Alternative:
If you don't want your code to be so heavy, you can use a dictionary-driven approach:
df_dict = {}
df_cols = ["fromAddress", "streetName", "suffixType", "state", "city", "zip", "BASENAME", "CENTLAT", "COUNTY", "GEOID", "NAME", "BLKGRP", "BLOCK"]
for p in geo_set:
    for i in p:
        for key, item in i['addressComponents'].items():
            if key in df_cols:
                df_dict.setdefault(key,[]).append(item)
        for d in i['geographies']['2010 Census Blocks']:
            for key, item in d.items():
                if key in df_cols:
                    df_dict.setdefault(key,[]).append(item)
emptydata = pd.DataFrame.from_dict(df_dict)
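The output is the same, and you don't end up creating as many temporary DataFrame objects. The caveat is that the DataFrame setup is now less readable.
Again, keep track of the lists and dictionaries in your data and iterate accordingly.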
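Tracking lists and dictionaries can also be made explicit by writing the route to a value as a path: strings where the data is a dict, integers where it is a list. The following is a minimal sketch under that idea. dig is a hypothetical helper, not a pandas or censusgeocode function, and match is a shortened stand-in for one geocoder result.

```python
def dig(obj, path, default=None):
    """Follow a path of dict keys (str) and list positions (int) into nested data."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            # Unknown key, position out of range, or a str used to index a list.
            return default
    return obj


# Shortened stand-in for a single match dict.
match = {
    "addressComponents": {"city": "DENVER", "zip": "80209"},
    "geographies": {"2010 Census Blocks": [{"GEOID": "080330028024113"}]},
}

# The 0 marks the one place on this route where the data is a list.
print(dig(match, ["geographies", "2010 Census Blocks", 0, "GEOID"]))  # 080330028024113

# Leaving the 0 out indexes a list with a str, the same mistake behind the TypeError.
print(dig(match, ["geographies", "2010 Census Blocks", "GEOID"]))     # None
```

Returning a default instead of raising is a design choice that suits bulk extraction; while debugging, letting the exception propagate is usually more informative.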