Apply Function under Group == condition

Question

I have a DataFrame like this:

         position_latitude  position_longitude geohash
0                53.398940           10.069293      u1
1                53.408875           10.052669      u1
2                48.856350            9.171759      u0
3                48.856068            9.170798      u0
4                48.856350            9.171759      u0

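What I want is the node nearest to each position, using a different shapefile depending on the geohash.

So my idea is to load the graph from file once for each geohash group (e.g. u1), and then use that graph in a function to get the nearest node.

I could do it in a for loop, but I think there is a more efficient way.

I was thinking of something like this: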

df['nearestNode'] = geoSub.apply(lambda x: getDistanceToEdge(x.position_latitude, x.position_longitude, x.geohash), axis=1)

However, I don't know how to load the graph only once per group, since reading it from file takes some time.

What I have come up with so far:

groupHashed = geoSub.groupby('geohash')
geoSub['distance'] = np.nan

for name, group in groupHashed:
    G = osmnx.graph.graph_from_xml('geohash/'+name+'.osm', simplify=True, retain_all=False)
    geoSub['distance'] = geoSub.apply(lambda x: getDistanceToEdge(x.position_latitude, x.position_longitude, G) if x.geohash == name else x.distance, axis=1)

It does seem to work, but I feel the if condition slows it down considerably.

Update: I just changed:

geoSub['distance'] = geoSub.apply(lambda x: getDistanceToEdge(x.position_latitude, x.position_longitude, G) if x.geohash == name else x.distance, axis=1)

to:

geoSub.loc[geoSub['geohash'] == name, 'distance'] = geoSub[geoSub['geohash'] == name].apply(lambda x: getDistanceToEdge(x.position_latitude, x.position_longitude, G), axis=1)

Now it is much faster. Is there an even better way?

Answer 1

You can use transform.

I am stubbing G and getDistanceToEdge (as x + y + geohash[-1]) in order to show a working example.

import pandas as pd
from io import StringIO 
data = StringIO("""
,position_latitude,position_longitude,geohash
0,53.398940,10.069293,u1
1,53.408875,10.052669,u1
2,48.856350,9.171759,u0
3,48.856068,9.170798,u0
4,48.856350,9.171759,u0
""" )
df = pd.read_csv(data, index_col=0).fillna('')

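def getDistanceToEdge(x, y, G):
    # Stub: the real function would query the graph G
    return x + y + G

def fun(pos):
    # All tuples in a group share one geohash; the stub G is its last character
    G = int(pos.values[0][-1][-1])
    return pos.apply(lambda x: getDistanceToEdge(x[0], x[1], G))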

df['pos'] = list(zip(df['position_latitude'], df['position_longitude'], df['geohash']))
df['distance'] = df.groupby(['geohash'])['pos'].transform(fun)
df = df.drop(['pos'], axis=1)

print (df)

Output:

   position_latitude  position_longitude geohash   distance
0          53.398940           10.069293      u1  64.468233
1          53.408875           10.052669      u1  64.461544
2          48.856350            9.171759      u0  58.028109
3          48.856068            9.170798      u0  58.026866
4          48.856350            9.171759      u0  58.028109

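As you can see, inside fun you can get the group name with pos.values[0][-1]. This works because we took care to build the pos column as (lat, lon, geohash) tuples, and after the groupby every row in a group has the same geohash. So for a given group, the geohash is simply the last value of any row's pos tuple; pos.values[0][-1] is the last value of the first row's tuple. (The stub then takes the last character of that geohash as G.)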
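If packing the columns into tuples feels awkward, a plain loop over the groups may be enough. This is a minimal sketch rather than a definitive implementation: load_graph is a hypothetical stand-in for the expensive per-geohash graph load (it is not an osmnx function), and getDistanceToEdge is the same stub as above. Iterating over the groupby yields the group name directly, so the graph is loaded once per geohash, and assigning through .loc with the group's index writes only that group's rows.

```python
import pandas as pd

df = pd.DataFrame({
    "position_latitude": [53.398940, 53.408875, 48.856350, 48.856068, 48.856350],
    "position_longitude": [10.069293, 10.052669, 9.171759, 9.170798, 9.171759],
    "geohash": ["u1", "u1", "u0", "u0", "u0"],
})

load_calls = []  # records each (stubbed) graph load, to check it happens once per group

def load_graph(name):
    # Hypothetical stand-in for the expensive load from 'geohash/<name>.osm'
    load_calls.append(name)
    return int(name[-1])

def getDistanceToEdge(lat, lon, G):
    # Same stub as in the example above
    return lat + lon + G

df["distance"] = float("nan")
for name, group in df.groupby("geohash"):
    G = load_graph(name)  # loaded once per geohash
    # Write only the rows of this group; other groups' results are left untouched
    df.loc[group.index, "distance"] = [
        getDistanceToEdge(lat, lon, G)
        for lat, lon in zip(group["position_latitude"], group["position_longitude"])
    ]

print(df)
```

With the stubs, this should produce the same distance column as the transform version; the number of graph loads equals the number of distinct geohashes.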

Tags: python, apply, dataframe, pandas