Combining two dataframes using buffer in geopandas
I have two dataframes (not the exact data, but similar):
df1:
Lon | Lat | Timestamp |
---|---|---|
4.44 | 61.41 | 2021-04-28 00:00:00 |
4.48 | 62.45 | 2021-04-28 00:02:00 |
4.51 | 61.48 | 2021-04-28 00:06:00 |
4.47 | 62.46 | 2021-04-28 00:08:00 |
4.44 | 61.41 | 2021-04-28 00:10:00 |
4.40 | 62.48 | 2021-04-28 00:12:00 |
4.51 | 61.44 | 2021-04-28 00:16:00 |
4.47 | 62.49 | 2021-04-28 00:18:00 |
df2:
Lon | Lat | Timestamp |
---|---|---|
4.34 | 61.41 | 2021-04-28 00:00:00 |
4.38 | 62.45 | 2021-04-28 00:02:00 |
4.31 | 61.48 | 2021-04-28 00:06:00 |
4.17 | 62.46 | 2021-04-28 00:08:00 |
4.34 | 61.41 | 2021-04-28 00:10:00 |
4.30 | 62.48 | 2021-04-28 00:12:00 |
4.21 | 61.44 | 2021-04-28 00:16:00 |
4.47 | 62.49 | 2021-04-28 00:18:00 |
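There are other columns, but my question concerns only these ones.
So I want to merge the two dataframes minute by minute, keeping the df2 rows that lie within a 100 m radius of each observation in df1.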
I've done something similar with a single dataframe: for each observation in it, I joined all observations within a 100 m radius.
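import geopandas as gpd
# assumes df is a GeoDataFrame in a metric CRS: in EPSG:4326, buffer(100) means 100 degrees
for name, group in df.groupby(["timestamp"]):
    buf = group.copy()
    buf["geometry"] = buf.geometry.buffer(100)
    # predicate= replaces the op= keyword, deprecated since GeoPandas 0.10
    points_within = gpd.sjoin(group, buf, predicate="within")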
I need to do something similar, but with two dataframes.
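Extending the loop above to two dataframes could look like the sketch below. It is not the asker's code: it assumes the Timestamp values parse with `pd.to_datetime`, that "per minute" means an exact Timestamp match (the sample data is on whole minutes), and the names `times`, `results` and `matched` are illustrative. Both frames are projected to a UTM CRS first so that `buffer(MIN_DIST)` is in metres.

```python
import geopandas as gpd
import pandas as pd

MIN_DIST = 100  # search radius in metres

# sample data from the question; both frames share the same timestamps
times = pd.to_datetime([
    "2021-04-28 00:00:00", "2021-04-28 00:02:00", "2021-04-28 00:06:00",
    "2021-04-28 00:08:00", "2021-04-28 00:10:00", "2021-04-28 00:12:00",
    "2021-04-28 00:16:00", "2021-04-28 00:18:00",
])
df1 = pd.DataFrame({"Lon": [4.44, 4.48, 4.51, 4.47, 4.44, 4.40, 4.51, 4.47],
                    "Lat": [61.41, 62.45, 61.48, 62.46, 61.41, 62.48, 61.44, 62.49],
                    "Timestamp": times})
df2 = pd.DataFrame({"Lon": [4.34, 4.38, 4.31, 4.17, 4.34, 4.30, 4.21, 4.47],
                    "Lat": [61.41, 62.45, 61.48, 62.46, 61.41, 62.48, 61.44, 62.49],
                    "Timestamp": times})

# points in lon/lat (keeping all columns), then projected to UTM so buffer() uses metres
gdf1 = gpd.GeoDataFrame(df1, geometry=gpd.points_from_xy(df1["Lon"], df1["Lat"]), crs="EPSG:4326")
gdf2 = gpd.GeoDataFrame(df2, geometry=gpd.points_from_xy(df2["Lon"], df2["Lat"]), crs="EPSG:4326")
utm = gdf1.estimate_utm_crs()
gdf1, gdf2 = gdf1.to_crs(utm), gdf2.to_crs(utm)

results = []
for ts, group1 in gdf1.groupby("Timestamp"):
    group2 = gdf2[gdf2["Timestamp"] == ts]  # df2 rows with the same Timestamp
    buf = group1.copy()
    buf["geometry"] = buf.geometry.buffer(MIN_DIST)
    # df2 points lying inside a buffer around a df1 point from the same minute
    joined = gpd.sjoin(group2, buf, how="inner", predicate="within")
    if not joined.empty:
        results.append(joined)

matched = pd.concat(results)  # raises ValueError if nothing matched at all
print(matched[["Lon_left", "Lat_left", "Timestamp_left", "index_right"]])
```

Because both frames keep their columns, `sjoin` suffixes the overlapping names with `_left`/`_right` (its default suffixes); `index_right` is the df1 row whose buffer contains the df2 point.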
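- The sample dataset doesn't have many points within 100 m of each other. Increasing the distance gives more matches from `sjoin()`.
- Use GeoPandas' CRS support together with `buffer()`. Distances must be measured in a metric CRS such as UTM, so the points are projected to UTM, buffered, then projected back to EPSG:4326.
- The output dataframe is shown below, along with a plotly mapbox figure that draws the points as markers and the buffers as a GeoJSON layer.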
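To see why the projection step matters, here is a minimal sketch that buffers one of the sample points by 100 m in UTM and checks the result: the buffer's area should be close to π·100² m², and after projecting back to EPSG:4326 it spans only a few thousandths of a degree. Calling `buffer(100)` directly on EPSG:4326 data would instead use 100 degrees as the radius (GeoPandas emits a warning about buffering in a geographic CRS). The variable names are illustrative.

```python
import math

import geopandas as gpd

# one of the sample points (row 7 in both df1 and df2), in lon/lat degrees
pt = gpd.GeoSeries(gpd.points_from_xy([4.47], [62.49]), crs="EPSG:4326")

utm = pt.estimate_utm_crs()                    # UTM zone picked from the data's location
circle_utm = pt.to_crs(utm).buffer(100)        # radius of 100 UTM units = 100 m
circle_wgs84 = circle_utm.to_crs("EPSG:4326")  # back to lon/lat for joins and plots

# buffer() returns a polygon approximating the circle, so this ratio is close to 1
ratio = circle_utm.area.iloc[0] / (math.pi * 100**2)
minx, miny, maxx, maxy = circle_wgs84.total_bounds
print(f"area ratio: {ratio:.4f}")
print(f"extent: {maxx - minx:.4f} deg lon x {maxy - miny:.4f} deg lat")
```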
import geopandas as gpd
import shapely, json
import pandas as pd
import plotly.express as px
df1 = pd.DataFrame(
{
"Lon": [4.44, 4.48, 4.51, 4.47, 4.44, 4.4, 4.51, 4.47],
"Lat": [61.41, 62.45, 61.48, 62.46, 61.41, 62.48, 61.44, 62.49],
"Timestamp": [
"2021-04-28 00:00:00",
"2021-04-28 00:02:00",
"2021-04-28 00:06:00",
"2021-04-28 00:08:00",
"2021-04-28 00:10:00",
"2021-04-28 00:12:00",
"2021-04-28 00:16:00",
"2021-04-28 00:18:00",
],
}
)
df2 = pd.DataFrame(
{
"Lon": [4.34, 4.38, 4.31, 4.17, 4.34, 4.3, 4.21, 4.47],
"Lat": [61.41, 62.45, 61.48, 62.46, 61.41, 62.48, 61.44, 62.49],
"Timestamp": [
"2021-04-28 00:00:00",
"2021-04-28 00:02:00",
"2021-04-28 00:06:00",
"2021-04-28 00:08:00",
"2021-04-28 00:10:00",
"2021-04-28 00:12:00",
"2021-04-28 00:16:00",
"2021-04-28 00:18:00",
],
}
)
MIN_DIST = 10**2  # buffer radius in metres (100 m)
# point geometries only; the default index lines up with df1 / df2
gdf1 = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(df1["Lon"], df1["Lat"]), crs="EPSG:4326"
)
gdf2 = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(df2["Lon"], df2["Lat"]), crs="EPSG:4326"
)
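# buffer each df1 point by MIN_DIST metres: distances only make sense in a
# metric CRS, so project to UTM, buffer, then project back to EPSG:4326.
# NB: gdf1 is now a GeoSeries of buffer polygons
gdf1 = (
    gdf1.to_crs(gdf1.estimate_utm_crs()).geometry.buffer(MIN_DIST).to_crs("EPSG:4326")
)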
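# keep the df2 rows that fall inside any df1 buffer: index_right is the gdf2
# (= df2) row number. Only location is compared here, not Timestamp.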
df2_in_df1 = df2.reset_index().merge(
gpd.sjoin(gpd.GeoDataFrame(geometry=gdf1), gdf2, how="inner"),
left_on="index",
right_on="index_right",
)
# plot it to see what's been found
fig = (
px.scatter_mapbox(df1, lat="Lat", lon="Lon")
.update_traces(marker={"color": "red", "opacity":.3})
.add_traces(px.scatter_mapbox(df2, lat="Lat", lon="Lon").update_traces(marker={"color":"red", "opacity":.3}).data)
.add_traces(px.scatter_mapbox(df2_in_df1, lat="Lat", lon="Lon").update_traces(marker={"color":"green", "size":10}).data)
)
fig.update_layout(
mapbox={
"style": "open-street-map",
"layers": [
{
"source": json.loads(gdf1.geometry.to_json()),
"below": "traces",
"type": "line",
"color": "purple",
"line": {"width": 1.5},
}
],
},
margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
fig.show()
| | index | Lon | Lat | Timestamp | index_right |
|---|---|---|---|---|---|
| 0 | 7 | 4.47 | 62.49 | 2021-04-28 00:18:00 | 7 |
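The merge in the answer compares location only: a df2 row is kept if it lies inside the buffer of *any* df1 row, whatever its Timestamp. In the sample the single match happens to share its timestamp. A minimal sketch that also enforces the per-minute condition from the question, assuming Timestamp parses with `pd.to_datetime` and that "same minute" means equal after flooring to the minute (names such as `pairs` and `same_minute` are illustrative): keep df1's columns on the buffers, join every df2 point to the buffers that contain it, then filter the pairs.

```python
import geopandas as gpd
import pandas as pd

MIN_DIST = 100  # search radius in metres

# sample data from the question; both frames share the same timestamps
times = pd.to_datetime([
    "2021-04-28 00:00:00", "2021-04-28 00:02:00", "2021-04-28 00:06:00",
    "2021-04-28 00:08:00", "2021-04-28 00:10:00", "2021-04-28 00:12:00",
    "2021-04-28 00:16:00", "2021-04-28 00:18:00",
])
df1 = pd.DataFrame({"Lon": [4.44, 4.48, 4.51, 4.47, 4.44, 4.40, 4.51, 4.47],
                    "Lat": [61.41, 62.45, 61.48, 62.46, 61.41, 62.48, 61.44, 62.49],
                    "Timestamp": times})
df2 = pd.DataFrame({"Lon": [4.34, 4.38, 4.31, 4.17, 4.34, 4.30, 4.21, 4.47],
                    "Lat": [61.41, 62.45, 61.48, 62.46, 61.41, 62.48, 61.44, 62.49],
                    "Timestamp": times})

gdf1 = gpd.GeoDataFrame(df1, geometry=gpd.points_from_xy(df1["Lon"], df1["Lat"]), crs="EPSG:4326")
gdf2 = gpd.GeoDataFrame(df2, geometry=gpd.points_from_xy(df2["Lon"], df2["Lat"]), crs="EPSG:4326")

# buffer df1 in metres (UTM), keeping its columns so Timestamp survives the join
utm = gdf1.estimate_utm_crs()
buf1 = gdf1.to_crs(utm)
buf1["geometry"] = buf1.geometry.buffer(MIN_DIST)

# all (df2 point, df1 buffer) pairs by location only, as in the answer
pairs = gpd.sjoin(gdf2.to_crs(utm), buf1, how="inner", predicate="within")

# then require both observations to fall in the same minute
same_minute = (
    pairs["Timestamp_left"].dt.floor("min") == pairs["Timestamp_right"].dt.floor("min")
)
matches = pairs[same_minute]
print(matches[["Lon_left", "Lat_left", "Timestamp_left", "index_right"]])
```

Joining once and filtering afterwards avoids the Python-level loop over timestamps; the trade-off is that the intermediate `pairs` frame can get large if MIN_DIST is big and many points overlap across different minutes.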