Combining two dataframes using buffer in geopandas

I have two dataframes (not the exact data, but similar). df1:

Lon Lat Timestamp
4.44 61.41 2021-04-28 00:00:00
4.48 62.45 2021-04-28 00:02:00
4.51 61.48 2021-04-28 00:06:00
4.47 62.46 2021-04-28 00:08:00
4.44 61.41 2021-04-28 00:10:00
4.40 62.48 2021-04-28 00:12:00
4.51 61.44 2021-04-28 00:16:00
4.47 62.49 2021-04-28 00:18:00

df2:

Lon Lat Timestamp
4.34 61.41 2021-04-28 00:00:00
4.38 62.45 2021-04-28 00:02:00
4.31 61.48 2021-04-28 00:06:00
4.17 62.46 2021-04-28 00:08:00
4.34 61.41 2021-04-28 00:10:00
4.30 62.48 2021-04-28 00:12:00
4.21 61.44 2021-04-28 00:16:00
4.47 62.49 2021-04-28 00:18:00

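There are other columns as well, but my question only concerns these. For every observation in df1, I want to merge the two dataframes per minute within a 100 m radius.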

I have done something similar with a single dataframe: for every observation in the dataframe, I joined all the observations within a 100 m radius.

for name, group in df.groupby(['timestamp']):
    buf = group.copy()
    buf['geometry'] = buf.geometry.buffer(100)
    # `predicate` replaces the deprecated `op` argument (geopandas >= 0.10)
    points_within = gpd.sjoin(group, buf, predicate='within')

I need to do something similar, but with two dataframes.

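  • Not much in the sample data lies within 100 m. Increasing the distance makes sjoin() return more matches.
  • GeoPandas CRS handling is used together with buffer(). Distances must be measured on UTM geometry, because EPSG:4326 units are degrees. So project to UTM, buffer, and project back to EPSG:4326.
  • The output dataframe is shown below, and the code also draws a plotly mapbox figure with the points as markers and the buffers as a GeoJSON layer.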
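As a rough illustration of the second point, here is a minimal sketch that buffers one of the question's points by 100 m through a UTM round trip. The variable names are chosen for this example only, and the radius check via the buffer's area is just a sanity check, not part of the approach itself.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# One point from the question's data, in lon/lat degrees.
pts = gpd.GeoSeries([Point(4.44, 61.41)], crs="EPSG:4326")

# Project to an estimated UTM zone so that buffer distances are in metres.
utm = pts.estimate_utm_crs()
buf_utm = pts.to_crs(utm).buffer(100)

# Recover the radius from the buffer's area as a sanity check.
radius_m = math.sqrt(buf_utm.area.iloc[0] / math.pi)

# Project back to lon/lat for joining or plotting.
buf_lonlat = buf_utm.to_crs("EPSG:4326")

print(f"approximate buffer radius: {radius_m:.1f} m")
```

Calling buffer(100) directly on the EPSG:4326 geometry would treat 100 as degrees rather than metres, which is why the round trip through UTM is needed.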
import geopandas as gpd
import shapely, json
import pandas as pd
import plotly.express as px

df1 = pd.DataFrame(
    {
        "Lon": [4.44, 4.48, 4.51, 4.47, 4.44, 4.4, 4.51, 4.47],
        "Lat": [61.41, 62.45, 61.48, 62.46, 61.41, 62.48, 61.44, 62.49],
        "Timestamp": [
            "2021-04-28 00:00:00",
            "2021-04-28 00:02:00",
            "2021-04-28 00:06:00",
            "2021-04-28 00:08:00",
            "2021-04-28 00:10:00",
            "2021-04-28 00:12:00",
            "2021-04-28 00:16:00",
            "2021-04-28 00:18:00",
        ],
    }
)

df2 = pd.DataFrame(
    {
        "Lon": [4.34, 4.38, 4.31, 4.17, 4.34, 4.3, 4.21, 4.47],
        "Lat": [61.41, 62.45, 61.48, 62.46, 61.41, 62.48, 61.44, 62.49],
        "Timestamp": [
            "2021-04-28 00:00:00",
            "2021-04-28 00:02:00",
            "2021-04-28 00:06:00",
            "2021-04-28 00:08:00",
            "2021-04-28 00:10:00",
            "2021-04-28 00:12:00",
            "2021-04-28 00:16:00",
            "2021-04-28 00:18:00",
        ],
    }
)

MIN_DIST = 10**2  # buffer radius in metres (100 m)

# build point geometries from the Lon/Lat columns
gdf1 = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(df1["Lon"], df1["Lat"]), crs="EPSG:4326"
)

gdf2 = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(df2["Lon"], df2["Lat"]), crs="EPSG:4326"
)

# buffer the df1 points by MIN_DIST metres: project to UTM so the distance is
# in metres, then project back to EPSG:4326 (gdf1 becomes a GeoSeries of polygons)
gdf1 = (
    gdf1.to_crs(gdf1.estimate_utm_crs()).geometry.buffer(MIN_DIST).to_crs("EPSG:4326")
)

# spatial join: find df2 points inside any df1 buffer, then merge back onto df2's columns
df2_in_df1 = df2.reset_index().merge(
    gpd.sjoin(gpd.GeoDataFrame(geometry=gdf1), gdf2, how="inner"),
    left_on="index",
    right_on="index_right",
)


# plot it to see what's been found
fig = (
    px.scatter_mapbox(df1, lat="Lat", lon="Lon")
    .update_traces(marker={"color": "red", "opacity": 0.3})
    .add_traces(
        px.scatter_mapbox(df2, lat="Lat", lon="Lon")
        .update_traces(marker={"color": "red", "opacity": 0.3})
        .data
    )
    .add_traces(
        px.scatter_mapbox(df2_in_df1, lat="Lat", lon="Lon")
        .update_traces(marker={"color": "green", "size": 10})
        .data
    )
)

fig.update_layout(
    mapbox={
        "style": "open-street-map",
        "layers": [
            {
                "source": json.loads(gdf1.geometry.to_json()),
                "below": "traces",
                "type": "line",
                "color": "purple",
                "line": {"width": 1.5},
            }
        ],
    },
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
Output (df2_in_df1):

   index   Lon    Lat            Timestamp  index_right
0      7  4.47  62.49  2021-04-28 00:18:00            7
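The join above is purely spatial and does not look at Timestamp, while the question asks for matches per minute. One possible way to add that condition is sketched below: keep the Timestamp column in both GeoDataFrames, run the spatial join, and then keep only pairs whose timestamps are equal. The helper name join_within_radius_same_time and its radius_m parameter are made up for this example, and the third df2 row is invented to show a spatial match at a different time being dropped.

```python
import geopandas as gpd
import pandas as pd


def join_within_radius_same_time(df1, df2, radius_m=100):
    """Hypothetical helper: pair rows of df1 and df2 that share a
    Timestamp and lie within radius_m metres of each other."""
    g1 = gpd.GeoDataFrame(
        df1.copy(), geometry=gpd.points_from_xy(df1["Lon"], df1["Lat"]), crs="EPSG:4326"
    )
    g2 = gpd.GeoDataFrame(
        df2.copy(), geometry=gpd.points_from_xy(df2["Lon"], df2["Lat"]), crs="EPSG:4326"
    )
    # Work in a metric CRS so the buffer radius is in metres.
    utm = g1.estimate_utm_crs()
    g1 = g1.to_crs(utm)
    g2 = g2.to_crs(utm)
    g1["geometry"] = g1.geometry.buffer(radius_m)
    # Spatial join first; shared column names get _left/_right suffixes.
    joined = gpd.sjoin(g1, g2, how="inner")
    # Then keep only pairs observed at the same timestamp.
    same_time = joined[joined["Timestamp_left"] == joined["Timestamp_right"]]
    return pd.DataFrame(same_time.drop(columns="geometry"))


# Illustrative rows only; the last row of b is not from the question.
a = pd.DataFrame(
    {
        "Lon": [4.44, 4.47],
        "Lat": [61.41, 62.49],
        "Timestamp": ["2021-04-28 00:00:00", "2021-04-28 00:18:00"],
    }
)
b = pd.DataFrame(
    {
        "Lon": [4.34, 4.47, 4.47],
        "Lat": [61.41, 62.49, 62.49],
        "Timestamp": [
            "2021-04-28 00:00:00",
            "2021-04-28 00:18:00",
            "2021-04-28 00:20:00",
        ],
    }
)
result = join_within_radius_same_time(a, b)
print(result)
```

If the real timestamps carry seconds, it may be necessary to round them to the minute first, for example with pd.to_datetime(...).dt.floor("min"), before comparing.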