Segment a pandas DataFrame containing lat/lng by a given geojson

Question

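I have a DataFrame with the columns lat and lng. I also have a geojson FeatureCollection that contains a polygon. Given this polygon, how can I efficiently select only the rows of df that fall inside it? I want to avoid looping over df and checking each element manually.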

import pandas as pd

d = {'lat': [0, 0.1, -0.1, 0.4],
     'lng': [50, 50.1, 49.6, 49.5]}

df = pd.DataFrame(d)

Here is a feature collection with 1 polygon and 4 points. As you can see, only the last point lies outside the polygon.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              0,
              49
            ],
            [
              0.6,
              50
            ],
            [
              0.1,
              52
            ],
            [
              -1,
              51
            ],
            [
              0,
              49
            ]
          ]
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Point",
        "coordinates": [
          0,
          50
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Point",
        "coordinates": [
          0.1,
          50.1
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Point",
        "coordinates": [
          -0.1,
          49.6
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Point",
        "coordinates": [
          0.4,
          49.5
        ]
      }
    }
  ]
}

This map shows the polygon and the points.

Edit: below is the code I currently have but, as you would expect, it is very slow.

from shapely.geometry import shape, Point
# check each polygon to see if it contains the point
for feature in feature_collection['features']:
    polygon = shape(feature['geometry'])
    for index, row in dfr.iterrows():
        point = Point(row.location_lng, row.location_lat)
        if polygon.contains(point):
            print('Found containing polygon:', feature)

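Here dfr is my DataFrame, with the columns location_lat and location_lng, and feature_collection is a geojson FeatureCollection containing only polygons (note that the geojson example above is only meant to explain the problem: it has just 1 polygon, plus a few points for illustration).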

Answer 1

Assume your DataFrame dfr looks like this:

   location_lat  location_lng
0           0.0          50.0
1           0.1          50.1
2          -0.1          49.6
3           0.4          49.5

and feature_collection contains polygons such as:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0,49],[0.6,50],[0.1,52],[-1,51],[0,49]]]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0,50],[0.6,50],[0.1,52],[-1,51],[0,50]]]
      }
    }]
}

I changed 49 to 50 in the second polygon so that the other points fall outside it.

You can first create a column holding the points from dfr:

# using Point from shapely and apply
from shapely.geometry import Point
dfr['point'] = dfr[['location_lat', 'location_lng']].apply(Point, axis=1)

# or use MultiPoint, which is faster
# (.geoms is required on Shapely 2.x, where multi-part geometries are no longer iterable)
from shapely.geometry import MultiPoint
dfr['point'] = list(MultiPoint(dfr[['location_lat', 'location_lng']].values).geoms)

The second method seems faster on a small DataFrame, so I would use it for larger DataFrames as well.

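Now you can create one column per polygon in feature_collection, indicating whether each point lies inside that feature. I would do this by looping over the features: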

from shapely.geometry import shape
for i, feature in enumerate(feature_collection['features']):
    dfr['feature_{}'.format(i)] = list(map(shape(feature['geometry']).contains,dfr['point']))

dfr then looks like this:

   location_lat  location_lng              point  feature_0  feature_1
0           0.0          50.0       POINT (0 50)       True      False
1           0.1          50.1   POINT (0.1 50.1)       True       True
2          -0.1          49.6  POINT (-0.1 49.6)       True      False
3           0.4          49.5   POINT (0.4 49.5)      False      False
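On Shapely 2.0 or later, the same boolean columns can probably be built without creating one Point object per row. The following is a minimal sketch, assuming `shapely.contains_xy` is available (it was added in Shapely 2.0); it is an alternative to the loop above, not the method used in this answer. It passes location_lat as the first coordinate only because the sample geojson stores the values in that order. Standard GeoJSON positions are [longitude, latitude], so swap the two arrays if your data follows the standard.

```python
import pandas as pd
import shapely
from shapely.geometry import shape

# Sample data copied from the answer above.
dfr = pd.DataFrame({
    "location_lat": [0.0, 0.1, -0.1, 0.4],
    "location_lng": [50.0, 50.1, 49.6, 49.5],
})
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 50], [0.6, 50], [0.1, 52], [-1, 51], [0, 50]]]}},
    ],
}

# Coordinate arrays in the same order as the sample geojson (first value, second value).
x = dfr["location_lat"].to_numpy()
y = dfr["location_lng"].to_numpy()

for i, feature in enumerate(feature_collection["features"]):
    polygon = shape(feature["geometry"])
    # contains_xy tests every (x, y) pair against the polygon in one vectorized call.
    dfr[f"feature_{i}"] = shapely.contains_xy(polygon, x, y)
```

Like `contains`, this treats a point that lies exactly on the polygon boundary as not contained, which is why the first point is False for the second polygon.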

To select the points that belong to a given feature, you then do:

print (dfr.loc[dfr['feature_1'],['location_lat', 'location_lng']])
   location_lat  location_lng
1           0.1          50.1
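If the goal is to split the DataFrame into rows that fall inside at least one polygon and rows that fall inside none, the per-feature columns can be combined. This is a small sketch using only pandas; the frame is rebuilt from the printed table above (without the point column), and the names `in_any`, `inside`, and `outside` are illustrative, not part of the original answer.

```python
import pandas as pd

# Result table from the answer above, rebuilt by hand from the printed output.
dfr = pd.DataFrame({
    "location_lat": [0.0, 0.1, -0.1, 0.4],
    "location_lng": [50.0, 50.1, 49.6, 49.5],
    "feature_0": [True, True, True, False],
    "feature_1": [False, True, False, False],
})

# Collect every per-polygon membership column.
feature_cols = [c for c in dfr.columns if c.startswith("feature_")]

# True where the point lies inside at least one polygon.
in_any = dfr[feature_cols].any(axis=1)

coords = ["location_lat", "location_lng"]
inside = dfr.loc[in_any, coords]    # rows 0, 1 and 2
outside = dfr.loc[~in_any, coords]  # row 3
```

Using `.all(axis=1)` instead of `.any(axis=1)` would keep only the points that lie inside every polygon.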
