How to vectorize a function that compares shapely objects from two different dataframes?

Question

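I have a pandas DataFrame and a geopandas GeoDataFrame. The pandas frame has a column, Points, that contains shapely.geometry Point objects. The geometry column of the geopandas frame contains Polygon objects. I want to take each Point in the pandas frame and test whether it is within any of the Polygon objects in the geopandas frame.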

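I want a new column in the pandas frame that works like this: if the Point is within a given Polygon (i.e. the within call returns True), the new column's value in that Point's row should be the value of a different column in that Polygon's row of the geopandas frame.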

I have a working solution, but it isn't vectorized. Is it possible to vectorize it?

Example:

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

# Create an example frame; the geometries are supposed to be mutually exclusive
# (each Polygon needs at least three vertices)
gdf = gpd.GeoDataFrame({'A': [1, 2], 'geometry': [Polygon([(10, 5), (5, 6), (2, 4)]), Polygon([(1, 2), (2, 5), (0, 4)])]})

# Create an example pandas frame
df = pd.DataFrame({'Foo': ['bar', 'Bar'], 'Points': [Point(4, 5), Point(1, 2)]})

# My non-vectorized solution
df['new'] = ''
for i in df.index:
    for j in gdf.index:
        if df.at[i, 'Points'].within(gdf.at[j, 'geometry']):
            df.at[i, 'new'] = gdf.at[j, 'A']

This works fine: when a point is inside a polygon, df['new'] contains the corresponding value from column gdf['A']. I'm hoping there is a way to vectorize this operation.
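For reference, geopandas provides a spatial join, gpd.sjoin, that does this point-in-polygon lookup in a single call instead of a double loop. The sketch below assumes geopandas 0.10 or later (older releases call the predicate parameter op rather than predicate) and uses hypothetical three-vertex polygons so the geometries are valid. Note that within is strict: a point lying exactly on a polygon's boundary, such as Point(1, 2) on a vertex here, does not match.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

# Hypothetical example data: three-vertex, non-overlapping polygons
gdf = gpd.GeoDataFrame({'A': [1, 2],
                        'geometry': [Polygon([(10, 5), (5, 6), (2, 4)]),
                                     Polygon([(1, 2), (2, 5), (0, 4)])]})
df = pd.DataFrame({'Foo': ['bar', 'Bar'], 'Points': [Point(4, 5), Point(1, 2)]})

# Use the Points column as the active geometry of a GeoDataFrame
points = gpd.GeoDataFrame(df, geometry='Points')

# One spatial join replaces the double loop; how='left' keeps points with no match
joined = gpd.sjoin(points, gdf[['A', 'geometry']], how='left', predicate='within')

# A point inside several polygons would appear once per match; keep only the first
joined = joined[~joined.index.duplicated(keep='first')]

# Index alignment puts the matching polygon's 'A' on each point's row (NaN if none)
df['new'] = joined['A']
```

Unmatched points get NaN here rather than the empty string the loop uses.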

Answer 1

I found a solution that works for my purposes. It's not the most elegant, but it's still much faster than the loop.

import numpy as np

def within_vectorized(array, point):
    # Boolean array: True where the point is within the polygon
    _array = np.array([point.within(p) for p in array])
    # np.where returns a tuple; its first element holds the positions of the True values
    hits = np.where(_array)[0]
    if hits.size != 0:
        # Position of the first matching polygon
        return hits[0]
    else:
        return -1

# Add a dummy row to the geopandas frame
# It has an empty Polygon in the geometry column and NaN everywhere else
# (this assumes geometry is the last column of gdf)
dummy_values = np.empty((1, gdf.shape[1]))
dummy_values[:] = np.nan
dummy_values = dummy_values.tolist()[0]
dummy_values[-1] = Polygon()
gdf.loc[-1] = dummy_values

# Use .loc with the row returned by within_vectorized for each point; -1 selects the dummy row.
# The returned values are positions, so this relies on gdf having a default 0..n-1 index
df['A'] = gdf.loc[df['Points'].apply(lambda x: within_vectorized(gdf['geometry'], x)), 'A'].to_list()
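within_vectorized above still calls within once per point/polygon pair from Python (through the list comprehension and apply). If shapely 2.0 or later is available (an assumption), its predicates such as shapely.within are NumPy ufuncs, so the whole comparison can be broadcast in one call and no dummy row is needed. A sketch with hypothetical example data; it builds an n_points by n_polygons boolean matrix, so memory grows with both sizes.

```python
import numpy as np
import pandas as pd
import geopandas as gpd
import shapely
from shapely.geometry import Point, Polygon

# Hypothetical example data: three-vertex, non-overlapping polygons
gdf = gpd.GeoDataFrame({'A': [1, 2],
                        'geometry': [Polygon([(10, 5), (5, 6), (2, 4)]),
                                     Polygon([(1, 2), (2, 5), (0, 4)])]})
df = pd.DataFrame({'Foo': ['bar', 'Bar'], 'Points': [Point(4, 5), Point(1, 2)]})

pts = df['Points'].to_numpy()                   # object array, shape (n_points,)
polys = np.asarray(gdf.geometry, dtype=object)  # object array, shape (n_polygons,)

# Broadcast the predicate: mask[i, j] is True when point i is within polygon j
mask = shapely.within(pts[:, None], polys[None, :])

has_match = mask.any(axis=1)  # does each point fall in any polygon?
first = mask.argmax(axis=1)   # position of the first True in each row (0 if none)
df['new'] = np.where(has_match, gdf['A'].to_numpy()[first], np.nan)
```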

Answer 2

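You can compute the Euclidean distances between the Points and all the vertices of the Polygon. Wherever a distance equals 0, that gives you an intersection point. My approach is below. Note that I leave it to you to extract all the points and polygon vertices from your data frames; a function like pandas.Series.tolist should help with that.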
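For the frames in the question, that extraction might look roughly like the sketch below (hypothetical example data). A shapely polygon's exterior.coords repeats the first vertex at the end to close the ring, so the sketch drops that closing coordinate.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

# Hypothetical example data
gdf = gpd.GeoDataFrame({'A': [1, 2],
                        'geometry': [Polygon([(10, 5), (5, 6), (2, 4)]),
                                     Polygon([(1, 2), (2, 5), (0, 4)])]})
df = pd.DataFrame({'Foo': ['bar', 'Bar'], 'Points': [Point(4, 5), Point(1, 2)]})

# [x, y] pairs for every point, via Series.tolist()
points = [[p.x, p.y] for p in df['Points'].tolist()]

# One vertex list per polygon, without the repeated closing coordinate
polygons = [[list(c) for c in poly.exterior.coords[:-1]]
            for poly in gdf.geometry.tolist()]
```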

import numpy as np
from scipy.spatial.distance import cdist

polygon = [[10,5],[5,6],[1,2],[2,5]]
points = [[4,5],[1,2]]

# return distances between all the items of the two arrays
distances = cdist(polygon,points) 

print(distances)

[[6.         9.48683298]
 [1.41421356 5.65685425]
 [4.24264069 0.        ]
 [2.         3.16227766]]

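Now all we need is the index of the 0 in the array. As you can see, the intersection is at row 3, column 2, i.e. the 3rd vertex of the polygon and the 2nd point.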


for i,dist in enumerate(distances.flatten()):
    if dist==0:
        intersect_index = np.unravel_index(i,shape=distances.shape)
        intersect_point = polygon[intersect_index[0]]
        print(intersect_point)

[1,2]
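The flatten/unravel_index loop above can also be replaced by np.argwhere, which returns the (row, column) index of every zero entry at once. A sketch on the same toy arrays; the == 0 comparison is exact here because cdist returns exactly 0.0 for identical coordinates.

```python
import numpy as np
from scipy.spatial.distance import cdist

polygon = [[10, 5], [5, 6], [1, 2], [2, 5]]
points = [[4, 5], [1, 2]]

distances = cdist(polygon, points)

# (polygon_row, point_column) pairs whose distance is exactly zero, no Python loop
matches = np.argwhere(distances == 0)

# Polygon vertices that coincide with one of the points
intersect_points = np.asarray(polygon)[matches[:, 0]]
```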

This should give you the vectorized form you need.
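One caveat when using the distance approach: a zero distance only means that a point coincides with one of the listed vertices. That is not the same as the point lying inside the polygon, which is what within tests. A small sketch contrasting the two on a hypothetical triangle:

```python
import numpy as np
from scipy.spatial.distance import cdist
from shapely.geometry import Point, Polygon

# Hypothetical triangle and two test points
vertices = [[10, 5], [5, 6], [2, 4]]
points = [[4, 5], [10, 5]]
triangle = Polygon(vertices)

# Distance-to-vertex test: True only where a point equals some vertex
zero_distance = (cdist(vertices, points) == 0).any(axis=0)

# Containment test from the question (boundary points are not "within")
inside = [Point(p).within(triangle) for p in points]

print(zero_distance.tolist())  # [False, True]
print(inside)                  # [True, False]
```

The interior point (4, 5) is missed by the distance test, while the vertex (10, 5) is flagged even though within returns False for it.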

Tags: python, vectorization, pandas, shapely, geopandas