Segment a pandas DataFrame containing lat/lng by a given GeoJSON
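I have a DataFrame with the columns lat and lng. I also have a GeoJSON FeatureCollection that contains a polygon. Given this polygon, how can I efficiently select only the rows of df that fall inside it? I want to avoid iterating over df and checking each element manually.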
import pandas as pd

d = {'lat': [0, 0.1, -0.1, 0.4],
     'lng': [50, 50.1, 49.6, 49.5]}
df = pd.DataFrame(d)
Here is a feature collection showing 1 polygon and 4 points. As you can see, only the last point lies outside the polygon.
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [0, 50]}
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [0.1, 50.1]}
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [-0.1, 49.6]}
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [0.4, 49.5]}
    }
  ]
}
This map shows the polygon and the points.
Edit: Below is the code I currently have, but as you might expect, it is very slow.
from shapely.geometry import shape, Point

# check each polygon to see if it contains the point
for feature in feature_collection['features']:
    polygon = shape(feature['geometry'])
    for index, row in dfr.iterrows():
        point = Point(row.location_lng, row.location_lat)
        if polygon.contains(point):
            print('Found containing polygon:', feature)
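Here dfr is my DataFrame, which contains location_lat and location_lng, and feature_collection is a GeoJSON FeatureCollection that contains only polygons. (Note that the GeoJSON example above is only meant to explain the problem: it has just 1 polygon plus a few points for illustration.)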
Suppose your DataFrame dfr looks like this:
   location_lat  location_lng
0           0.0          50.0
1           0.1          50.1
2          -0.1          49.6
3           0.4          49.5
and feature_collection contains polygons such as:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0,49],[0.6,50],[0.1,52],[-1,51],[0,49]]]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0,50],[0.6,50],[0.1,52],[-1,51],[0,50]]]
      }
    }]
}
In the second polygon I changed 49 to 50 so that the other points no longer fall inside it.
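If the GeoJSON arrives as text rather than as a Python dict, a minimal sketch for loading it and keeping only the polygon features might look like the following. The variable `geojson_text` and the helper `polygon_features` are hypothetical names introduced here for illustration; only the standard-library `json` module is used.

```python
import json

# Hypothetical input: the two-polygon FeatureCollection from above, as text.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0,49],[0.6,50],[0.1,52],[-1,51],[0,49]]]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0,50],[0.6,50],[0.1,52],[-1,51],[0,50]]]}}
  ]
}
"""


def polygon_features(collection):
    """Return only the features whose geometry is a Polygon or MultiPolygon."""
    return [
        feature for feature in collection["features"]
        if feature.get("geometry")
        and feature["geometry"]["type"] in ("Polygon", "MultiPolygon")
    ]


feature_collection = json.loads(geojson_text)
polygons = polygon_features(feature_collection)
print(len(polygons))
```

Filtering up front means that a collection mixing points and polygons, like the one in the question, does not produce membership columns for the point features.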
You can first create a column holding the points in dfr:
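# GeoJSON positions are [x, y] = [longitude, latitude]. In this sample data the
# values in location_lat match the polygon's first coordinate, so they are
# passed first; with correctly labelled data, pass location_lng first.
# using Point from shapely and apply
from shapely.geometry import Point
dfr['point'] = dfr[['location_lat', 'location_lng']].apply(Point, axis=1)

# or use MultiPoint, which is faster
# (iterate via .geoms; Shapely 2.x no longer allows iterating a MultiPoint directly)
from shapely.geometry import MultiPoint
dfr['point'] = list(MultiPoint(dfr[['location_lat', 'location_lng']].values).geoms)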
The second approach seemed faster on a small DataFrame, so I would use it for larger DataFrames as well.
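As a third option, and assuming Shapely 2.0 or later is installed, the point objects can probably be built in a single vectorized call with `shapely.points`. This is a sketch under that version assumption, not part of the original answer; the column order mirrors the snippet above.

```python
import pandas as pd
import shapely  # assumption: Shapely 2.0 or later

dfr = pd.DataFrame({
    "location_lat": [0.0, 0.1, -0.1, 0.4],
    "location_lng": [50.0, 50.1, 49.6, 49.5],
})

# One call builds every Point; as above, location_lat is used as x and
# location_lng as y so that the points line up with the sample polygon.
coords = dfr[["location_lat", "location_lng"]].to_numpy()
dfr["point"] = shapely.points(coords)

print(dfr["point"].iloc[1].wkt)
```

The resulting column holds ordinary Shapely Point objects, so the rest of the answer works unchanged.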
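Now you can create one column per polygon in feature_collection that records whether each point belongs to that feature. I would do this by looping over the features: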
from shapely.geometry import shape

for i, feature in enumerate(feature_collection['features']):
    # contains() is False for points lying exactly on the polygon boundary
    polygon = shape(feature['geometry'])
    dfr['feature_{}'.format(i)] = list(map(polygon.contains, dfr['point']))
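The loop above still calls `contains` once per point. As an alternative technique (not the answer's method), the even-odd ray-casting test can be written in NumPy so that each polygon edge is checked against all points at once. The function name `points_in_ring` is hypothetical, the sketch only looks at the outer ring (holes are ignored), and points exactly on the boundary are not treated specially, so results there can differ from Shapely's `contains`.

```python
import numpy as np


def points_in_ring(x, y, ring):
    """Even-odd (ray casting) test of many points against one closed ring.

    x, y : 1-D array-likes with the point coordinates
    ring : sequence of [x, y] vertices, first vertex repeated at the end
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ring = np.asarray(ring, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    # Walk the edges (x1, y1) -> (x2, y2); each step handles all points at once.
    for (x1, y1), (x2, y2) in zip(ring[:-1], ring[1:]):
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        # The edge straddles the horizontal line through the point ...
        straddles = (y1 > y) != (y2 > y)
        # ... and the crossing lies to the right of the point.
        x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        inside ^= straddles & (x < x_cross)
    return inside


# Sample data from the question, with the first polygon's outer ring.
ring = [[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]
mask = points_in_ring([0, 0.1, -0.1, 0.4], [50, 50.1, 49.6, 49.5], ring)
print(mask)
```

For the sample data the mask is True for the first three points and False for the last, which agrees with the feature_0 column in the table that follows. With a DataFrame, the two coordinate columns can be passed as `x` and `y`, and `dfr[mask]` then selects the matching rows.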
dfr then looks like this:
   location_lat  location_lng              point  feature_0  feature_1
0           0.0          50.0       POINT (0 50)       True      False
1           0.1          50.1   POINT (0.1 50.1)       True       True
2          -0.1          49.6  POINT (-0.1 49.6)       True      False
3           0.4          49.5   POINT (0.4 49.5)      False      False
To select the points that belong to a given feature, index with the corresponding column:
print(dfr.loc[dfr['feature_1'], ['location_lat', 'location_lng']])
   location_lat  location_lng
1           0.1          50.1
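When there are several polygons, the boolean columns can also be combined. The sketch below rebuilds the membership columns from the table above by hand (so it runs without Shapely) and assumes the columns follow the `feature_<n>` naming used in the loop; the variable names `flags`, `in_any`, `in_all` and `n_hits` are introduced here for illustration.

```python
import pandas as pd

# The membership columns from the table above, rebuilt by hand for illustration.
dfr = pd.DataFrame({
    "location_lat": [0.0, 0.1, -0.1, 0.4],
    "location_lng": [50.0, 50.1, 49.6, 49.5],
    "feature_0": [True, True, True, False],
    "feature_1": [False, True, False, False],
})

# Keep only the boolean membership columns.
flags = dfr.filter(regex=r"^feature_\d+$")

in_any = dfr[flags.any(axis=1)]   # rows inside at least one polygon
in_all = dfr[flags.all(axis=1)]   # rows inside every polygon
n_hits = flags.sum(axis=1)        # number of polygons containing each row

print(in_any[["location_lat", "location_lng"]])
```

Keeping one boolean column per polygon makes these combinations cheap, since they are plain column-wise reductions rather than further geometry tests.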