Join two datasets, both with points
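I have two CSV files containing points: a schools dataset (latitude, longitude and school name) and a dataset of house coordinates (latitude, longitude and houseid).
I would like to list all houses within 500 m of a school.
I really don't know how to do spatial joins with geopandas in Python. Can someone help me?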
schools.csv
56.039484;14.164114;Parkskolan
56.029687;14.159337;Centralskolan
houses.csv
56.039240;14.165066;1
56.039008;14.166709;2
56.038608;14.169420;3
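Main steps to get the solution:
- Read the two data files into dataframes
- Set the CRS ('epsg:4326') and create Point geometry from (lat, long) for both dataframes
- For the `schools` dataframe, convert the CRS to UTM zone 33N
- Buffer the `schools` dataframe (radius = 500 m)
- On the `schools` dataframe, set the 500 m buffer as the new geometry
- Do the spatial join between `houses` and `schools` in a common CRS
- Get the result in the `houses_joined` dataframe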
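The reprojection step matters because `buffer()` works in the units of the CRS. In EPSG:4326 those units are degrees, and at this latitude a degree of longitude covers far less ground than a degree of latitude, so a "500" buffer in degrees would be meaningless. A rough sketch of the arithmetic, assuming a spherical Earth (the helper name `metres_per_degree` is made up for illustration):

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def metres_per_degree(lat_deg):
    """Return the (north-south, east-west) ground length of one degree at lat_deg."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon


# Latitude of the schools in the question (about 56.04 N)
ns, ew = metres_per_degree(56.04)
print(f"1 degree of latitude  ~ {ns / 1000:.1f} km")
print(f"1 degree of longitude ~ {ew / 1000:.1f} km")
```

In a projected CRS such as UTM zone 33N (EPSG:32633) both axes are in metres, so `buffer(500)` gives a circle of roughly 500 m radius.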
Here is the working code:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
# School data
# -----------
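# read `schools.csv`, data are in (lat,long); 'epsg:4326'
# the file is expected to start with the header row shown below
# (the sample in the question has none: add it, or pass
#  header=None, names=['lat', 'lon', 'school_name'] to read_csv)
#
# lat;lon;school_name
# 56.039484;14.164114;Parkskolan
# 56.029687;14.159337;Centralskolan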
df_schools = pd.read_csv('schools.csv', na_values=['NaN'], sep=';')
# create Point geometry objects from (lon,lat)
sch_geom = [Point(xy) for xy in zip(df_schools.lon, df_schools.lat)]
# set initial coordinate ref system, and geometry column to the dataframe
# (the older crs={'init': 'epsg:4326'} form is deprecated with pyproj 2+)
gdf_schools = gpd.GeoDataFrame(df_schools, crs='EPSG:4326', geometry=sch_geom)
# convert CRS from (lat,long) to UTMzone 33N
# and get new dataframe: gdf_schools_utm33N
gdf_schools_utm33N = gdf_schools.to_crs(crs="+proj=utm +zone=33 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
# Note: crs="..." can be replaced by epsg=32633
# do buffering, radius: 500m
gdf_schools_utm33N['buffer_geometry'] = gdf_schools_utm33N.geometry.buffer(500)
# rename `geometry` -> `original_geometry`; `buffer_geometry` -> geometry
# .. and set column `geometry` as the default geometry data of the geodataframe.
gdf_schools_utm33N = gdf_schools_utm33N.rename(
columns={'geometry':'original_geometry', 'buffer_geometry':'geometry'}).set_geometry('geometry')
# Houses data
# -----------
# read `houses.csv`, data are in (lat,long); 'epsg:4326'
# the file is expected to start with the header row shown below
# a 4th house, too far away from all schools, is added as a negative test case
# lat;lon;houseid
# 56.039240;14.165066;1
# 56.039008;14.166709;2
# 56.038608;14.169420;3
# 56.046108;14.171420;4
df_houses = pd.read_csv('houses.csv', na_values=['NaN'], sep=';')
# create Point geometry for the houses, and init CRS
hs_geom = [Point(xy) for xy in zip(df_houses.lon, df_houses.lat)]
gdf_houses = gpd.GeoDataFrame(df_houses, crs='EPSG:4326', geometry=hs_geom)
# optional: plot the schools' buffers and all the houses
ax = gdf_schools_utm33N.plot(color='lightgray', edgecolor='green', alpha=0.5)
gdf_houses.to_crs(epsg=32633).plot(ax=ax, color='red')
# ******* Spatial Join *****************
# houses data frame needs CRS conversion
hss = gdf_houses.to_crs(epsg=32633)
# do spatial join of houses(points) ~ schools(circles of 500m radius)
# (`predicate=` replaces the `op=` keyword, deprecated since geopandas 0.10)
houses_joined = gpd.sjoin(hss, gdf_schools_utm33N, predicate='within', how='inner')
# print the successfully joined rows: houseid + school_name
print(houses_joined[['houseid','school_name']])
# Output: houseid, school_name
# 1 Parkskolan
# 2 Parkskolan
# 3 Parkskolan
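As a sanity check on the join, the same question can be answered without geopandas by computing great-circle distances directly. This is only a sketch: it assumes a spherical Earth (haversine formula), so distances differ slightly from the UTM ones, and the four houses and two schools are typed in from the data above rather than read from the CSV files.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

# (lat, lon, name) and (lat, lon, houseid), copied from the sample data
schools = [
    (56.039484, 14.164114, "Parkskolan"),
    (56.029687, 14.159337, "Centralskolan"),
]
houses = [
    (56.039240, 14.165066, 1),
    (56.039008, 14.166709, 2),
    (56.038608, 14.169420, 3),
    (56.046108, 14.171420, 4),
]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Every (house, school) pair that lies within 500 m
matches = [
    (houseid, name)
    for hlat, hlon, houseid in houses
    for slat, slon, name in schools
    if haversine_m(hlat, hlon, slat, slon) <= 500
]
print(matches)
```

The pairs found this way agree with the join output: houses 1 to 3 match Parkskolan, and house 4 matches no school. The brute-force loop is fine for a handful of points, but the spatial join scales much better because it uses a spatial index.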
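Resulting plot: the 500 m school buffers (light gray, green outline) with the houses drawn as red points.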
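A possible variant, not part of the answer's code: if only the closest school per house is needed, `geopandas.sjoin_nearest` with `max_distance` avoids building buffers and can also report the distance. The sketch below assumes geopandas 0.10 or newer and builds the dataframes inline instead of reading the CSV files; the helper `to_utm33n` and the column name `distance_m` are made up for illustration. Unlike the buffer join, it returns one school per house (the nearest), not every school within range.

```python
import geopandas as gpd
import pandas as pd

# Sample data typed in from the question (plus the 4th, far-away house)
schools = pd.DataFrame({
    "lat": [56.039484, 56.029687],
    "lon": [14.164114, 14.159337],
    "school_name": ["Parkskolan", "Centralskolan"],
})
houses = pd.DataFrame({
    "lat": [56.039240, 56.039008, 56.038608, 56.046108],
    "lon": [14.165066, 14.166709, 14.169420, 14.171420],
    "houseid": [1, 2, 3, 4],
})


def to_utm33n(df):
    """Build point geometry from lon/lat and reproject to UTM zone 33N (metres)."""
    points = gpd.points_from_xy(df["lon"], df["lat"])
    return gpd.GeoDataFrame(df, geometry=points, crs="EPSG:4326").to_crs(epsg=32633)


# Nearest school per house, keeping only matches within 500 m
nearest = gpd.sjoin_nearest(
    to_utm33n(houses),
    to_utm33n(schools),
    how="inner",
    max_distance=500,
    distance_col="distance_m",
)
result = nearest[["houseid", "school_name", "distance_m"]]
print(result)
```

Both joins need the two layers in the same projected CRS; the distance column is in the units of that CRS, here metres.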