How can I loop through a nested list to store the values in a data frame?

Question

I have a list of nested dictionaries, neighborhood_data, whose first item, neighborhood_data[0], looks like this:

{'type': 'Feature',
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[28.073783, -26.343133],
     [28.071239, -26.351536],
     [28.068717, -26.350644],
     [28.06663, -26.351362],
     [28.065161, -26.352135],
     [28.064671, -26.35399]]]],
'properties': {'cartodb_id': 1,
  'subplace_c': 761001001,
  'province': 'Gauteng',
  'wardid': '74202012',
  'district_m': 'Sedibeng',
  'local_muni': 'Midvaal',
  'main_place': 'Alberton',
  'mp_class': 'Settlement',
  'sp_name': 'Brenkondown',
  'suburb_nam': 'Brenkondown',
  'metro': 'Johannesburg',
  'african': 330,
  'white': 24,
  'asian': 0,
  'coloured': 2,
  'other': 0,
  'totalpop': 356}}}

I then created an empty data frame, neighborhoods:

import pandas as pd

# define the dataframe columns
column_names = ['Province', 'District', 'Local_municipality','Main Place', 'Suburb','Metro','Latitude','Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

However, when I loop over neighborhood_data to store the relevant data in the neighborhoods data frame, I get the following error:

for data in neighborhood_data:
    province = data['properties']['province']
    district = data['properties']['district_m']
    local_muni_name = suburb_name = data['properties']['local_muni'] 
    suburb_name = data['properties']['suburb_nam']
    metro = data['properties']['metro']
    
    suburb_latlon = data['geometry']['coordinates']
    subur_lat = suburb_latlon[[[[1]]]]
    suburb_lon = suburb_latlon[[[[0]]]]
    
    neighborhoods = neighborhoods.append({'Province': province,
                                          'District': district,
                                          'Local_municipality': local_muni_name,
                                          'Main place': main_place,
                                          'Suburb': suburb_name,
                                          'Metro': metro,
                                          'Latitude': suburb_lat,
                                          'Longitude': suburb_lon}, ignore_index=True)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-a5dc74ed4207> in <module>
      7 
      8     suburb_latlon = data['geometry']['coordinates']
----> 9     subur_lat = suburb_latlon[[[[1]]]]
     10     suburb_lon = suburb_latlon[[[[0]]]]
     11 

TypeError: list indices must be integers or slices, not list

So how can I store the latitude and longitude coordinates in the 'Latitude' and 'Longitude' columns of the empty data frame?

Answer 1

Your dictionary is malformed: it is missing closing square brackets in the coordinates key. But let's assume this is the correct dictionary:

{'geometry': {'coordinates': [[[[28.073783, -26.343133],
     [28.071239, -26.351536],
     [28.068717, -26.350644],
     [28.06663, -26.351362],
     [28.065161, -26.352135],
     [28.064671, -26.35399]]]],
  'properties': {'african': 330,
   'asian': 0,
   'cartodb_id': 1,
   'coloured': 2,
   'district_m': 'Sedibeng',
   'local_muni': 'Midvaal',
   'main_place': 'Alberton',
   'metro': 'Johannesburg',
   'mp_class': 'Settlement',
   'other': 0,
   'province': 'Gauteng',
   'sp_name': 'Brenkondown',
   'subplace_c': 761001001,
   'suburb_nam': 'Brenkondown',
   'totalpop': 356,
   'wardid': '74202012',
   'white': 24},
  'type': 'MultiPolygon'},
 'type': 'Feature'}

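Then, accessing the coordinates like this raises the indexing error, because a list cannot be indexed with another list: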

suburb_latlon = data['geometry']['coordinates']
subur_lat = suburb_latlon[[[[1]]]] # <--- Indexing error here
suburb_lon = suburb_latlon[[[[0]]]] # <--- Indexing error here

Instead, we want to do the following (unpacking one additional list level at a time until we reach our coordinates):

suburb_latlon = data['geometry']['coordinates']
# First point of the first ring of the first polygon, stored as [longitude, latitude].
# Not sure what your logic is for picking the first one, but given this indexing procedure you can customize it.
suburb_lat = suburb_latlon[0][0][0][1]
suburb_lon = suburb_latlon[0][0][0][0]
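
Putting the indexing fix into the full loop could look like the sketch below. It assumes each item has 'properties' at the top level (which is what the loop in the question expects) and uses a shortened, hypothetical sample record. It also collects the rows in a plain list and builds the data frame once at the end, because DataFrame.append was removed in pandas 2.0; that is a substitution on my part, not something the question requires.

```python
import pandas as pd

# Hypothetical, shortened sample: one feature with 'properties' at the top level.
neighborhood_data = [
    {
        "type": "Feature",
        "geometry": {
            "type": "MultiPolygon",
            "coordinates": [[[[28.073783, -26.343133],
                              [28.071239, -26.351536],
                              [28.068717, -26.350644]]]],
        },
        "properties": {
            "province": "Gauteng",
            "district_m": "Sedibeng",
            "local_muni": "Midvaal",
            "main_place": "Alberton",
            "suburb_nam": "Brenkondown",
            "metro": "Johannesburg",
        },
    }
]

column_names = ["Province", "District", "Local_municipality", "Main Place",
                "Suburb", "Metro", "Latitude", "Longitude"]

rows = []
for data in neighborhood_data:
    props = data["properties"]
    # First point of the first ring of the first polygon: [longitude, latitude].
    lon, lat = data["geometry"]["coordinates"][0][0][0]
    rows.append({
        "Province": props["province"],
        "District": props["district_m"],
        "Local_municipality": props["local_muni"],
        "Main Place": props["main_place"],
        "Suburb": props["suburb_nam"],
        "Metro": props["metro"],
        "Latitude": lat,
        "Longitude": lon,
    })

# Build the data frame once, keeping the column order from column_names.
neighborhoods = pd.DataFrame(rows, columns=column_names)
print(neighborhoods)
```

Note that the dictionary keys must match the column names exactly ('Main Place', not 'Main place'); otherwise that column is left empty.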

Answer 2

I think your coordinates are not formatted correctly. You currently have three square brackets opened, but none closed:

'coordinates': [[[[28.073783, -26.343133]... [28.064671, -26.35399],

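If you want to keep the current format, you need to add the three missing closing brackets ']]]' at the end of your data. Otherwise, remove the two extra opening brackets and format your coordinates as follows: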

'coordinates' : [[28.073783, -26.343133], [28.071239, -26.351536]...]

You can then access them with:

suburb_latlon = data['geometry']['coordinates']
suburb_lat = suburb_latlon[0][1]
suburb_lon = suburb_latlon[0][0]

This accesses the first list item, [28.073783, -26.343133], then assigns lat to the second element of that list and lon to the first.
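
If you are not sure how deeply the coordinates are nested, a small helper can descend one list level at a time until it reaches a pair of numbers. This is only a sketch: first_point is a hypothetical name, and it assumes the list is non-empty and that the first point is the one you want.

```python
def first_point(coords):
    """Descend into nested lists until reaching a [lon, lat] pair of numbers."""
    while isinstance(coords[0], (list, tuple)):
        coords = coords[0]
    return coords


# The same first point, once nested as in the question and once flattened.
multipolygon = [[[[28.073783, -26.343133], [28.071239, -26.351536]]]]
flat = [[28.073783, -26.343133], [28.071239, -26.351536]]

for coords in (multipolygon, flat):
    suburb_lon, suburb_lat = first_point(coords)
    print(suburb_lat, suburb_lon)
```

Both calls return the same pair, so the loop body does not need to change if the nesting depth of the data does.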

python

nested-lists

geojson

pandas