How can I do a sjoin iteratively over features in a shapefile with geopandas, then encode categorical data?
I have two shapefiles (https://drive.google.com/drive/folders/1pbvKvhIIvhqHfcMe9g6qtsjbZ6SzZrqt?usp=sharing): a point layer and a polygon layer. The point layer represents customers and their locations, while the polygon layer represents two boundaries. The objective is to get a table in the following format:
customer | location 1 | location 2 |
---|---|---|
1 | 1 | 1 |
2 | 0 | 1 |
3 | 1 | 1 |
5 | 1 | 0 |
6 | 1 | 0 |
9 | 0 | 0 |
10 | 0 | 0 |
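The approach I came up with is to loop over the polygons, join the points to each one, and then encode the categories:
import geopandas as gpd
import pandas as pd
points = gpd.read_file('point.shp')
polygons = gpd.read_file('polygon.shp')
for index,row in polygons.iterrows():
    # this is the line that raises the error
    points = gpd.sjoin(points, row, how='left', op='intersects')
points = pd.get_dummies(points, columns=['name'])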
I get this error message:
ValueError: 'right_df' should be GeoDataFrame, got <class 'pandas.core.series.Series'>
Any suggestions are appreciated. Thanks in advance!
You don't need a join; the intersects method is enough. Your target structure can be achieved as follows:
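points_in_locations = points.copy()
for idx, row in polygons.iterrows():
    # boolean Series: True where a point intersects this polygon
    is_in_polygon = points.intersects(row.geometry)
    # idx + 1 assumes the polygons have the default 0-based integer index
    points_in_locations[f"location {idx + 1}"] = is_in_polygon.astype(int)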
This results in:
   id                   geometry  location 1  location 2
0   1  POINT (103.87728 1.30449)           0           1
1   2  POINT (103.87723 1.30415)           0           1
2   3  POINT (103.87761 1.30408)           0           1
3   1  POINT (103.87680 1.30287)           1           0
4   5  POINT (103.87724 1.30288)           1           0
5   6  POINT (103.87710 1.30275)           1           0
6   3  POINT (103.87687 1.30270)           1           0
7   9  POINT (103.87669 1.30444)           0           0
8  10  POINT (103.87681 1.30396)           0           0
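Note that this output has one row per point, so customers 1 and 3 each appear twice, whereas the table in the question has one row per customer. A minimal sketch of collapsing it, assuming the `id` column is the customer number and that a customer should get a 1 if any of their points falls in the location (the name `per_customer` is only illustrative, and the geometry column is left out to keep the example small):

```python
import pandas as pd

# Indicator columns copied from the per-point result above (geometry omitted)
points_in_locations = pd.DataFrame({
    "id": [1, 2, 3, 1, 5, 6, 3, 9, 10],
    "location 1": [0, 0, 0, 1, 1, 1, 1, 0, 0],
    "location 2": [1, 1, 1, 0, 0, 0, 0, 0, 0],
})

# max() over 0/1 flags acts as a logical OR across each customer's points
per_customer = (
    points_in_locations
    .groupby("id")[["location 1", "location 2"]]
    .max()
    .rename_axis("customer")
    .reset_index()
)
print(per_customer)
```

With these values the result has the same rows as the table requested in the question.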
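As for the original error: `iterrows()` yields each polygon as a pandas Series, while `gpd.sjoin` expects a GeoDataFrame on the right, which is what the ValueError says. If a join is still preferred, one option is to call `sjoin` once against the whole polygon layer and encode afterwards. The sketch below is an illustration on made-up data, not the shapefiles from the question: the coordinates and the `name` values are hypothetical, and it uses the `predicate` keyword, which newer geopandas releases use in place of `op`.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box

# Hypothetical stand-in layers; real data would come from gpd.read_file(...)
points = gpd.GeoDataFrame(
    {"id": [1, 2, 1, 9]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(2.5, 0.6), Point(9, 9)],
)
polygons = gpd.GeoDataFrame(
    {"name": ["location 1", "location 2"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
)

# One join against the whole polygon layer; unmatched points keep a NaN name
joined = gpd.sjoin(points, polygons, how="left", predicate="intersects")

# One 0/1 column per polygon name; NaN names become all-zero rows
dummies = pd.get_dummies(joined["name"]).astype(int)

# Collapse to one row per customer id (1 if any of their points matched)
table = dummies.groupby(joined["id"].to_numpy()).max().rename_axis("customer")
print(table)
```

Compared with the loop over `intersects`, this takes the column names from the polygon attribute rather than from the row index, at the cost of an extra aggregation step.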