Pairwise distance between objects (Xarray)
I have 3 cars driving in space (x, y) over 10 time steps.
For each time step, I want to calculate the pairwise Euclidean distance between the cars.
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
import xarray as xr
data = np.random.rand(3,2,10)
times = pd.date_range('2000-01-01', periods=10)
space = ['x','y']
cars = ['a','b','c']
foo = xr.DataArray(data, coords=[cars,space,times], dims = ['cars','space','time'])
The for loop below works fine: each input is a 3*2 array, and pdist happily computes the condensed distance matrix of all pairwise distances between the cars.
for label, group in foo.groupby('time'):
    print(group.shape, type(group), pdist(group))
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.45389929 0.96104589 0.51489773]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.87532985 0.49758256 0.4418555 ]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.44036486 0.17947479 0.39842543]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.52294711 0.26278261 0.78106623]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.30004324 0.62807379 0.40601505]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.48351623 0.38331324 0.30677522]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.83682031 0.38409803 0.455275 ]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.33614753 0.50814237 0.49033016]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.17365559 0.33567641 0.30382769]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.76981095 0.18099241 0.91187884]
But this simple call, which as I understand it should do the same thing, fails.
foo.groupby('time').apply(pdist)
AttributeError: 'numpy.ndarray' object has no attribute 'dims'
Is something wrong with the shape of the return value? Do I need a u_func here?
By the way, all of these calls work fine and return the various shapes I would expect:
foo.groupby('time').apply(np.mean)
foo.groupby('time').apply(np.mean,axis=0)
foo.groupby('time').apply(np.mean,axis=1)
Thanks in advance for any pointers...
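pdist returns a plain NumPy array whose size differs from the input, so xarray cannot find its coordinates (hence the missing 'dims' attribute in the error).
How about the following?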
In [12]: np.sqrt(((foo - foo.rename(cars='cars1'))**2).sum('space'))
Out[12]:
<xarray.DataArray (cars: 3, time: 10, cars1: 3)>
array([[[0. , 0.131342, 0.352521],
[0. , 0.329914, 0.859899],
[0. , 0.933117, 0.351842],
[0. , 0.802514, 0.426005],
[0. , 0.167081, 0.563704],
[0. , 0.9822 , 0.145496],
[0. , 0.894892, 0.457217],
[0. , 0.333222, 0.505805],
[0. , 0.377352, 0.604625],
[0. , 0.467771, 0.62544 ]],
[[0.131342, 0. , 0.243476],
[0.329914, 0. , 0.813076],
[0.933117, 0. , 0.847525],
[0.802514, 0. , 0.390665],
[0.167081, 0. , 0.562188],
[0.9822 , 0. , 0.957067],
[0.894892, 0. , 0.525863],
[0.333222, 0. , 0.835241],
[0.377352, 0. , 0.894856],
[0.467771, 0. , 0.594124]],
[[0.352521, 0.243476, 0. ],
[0.859899, 0.813076, 0. ],
[0.351842, 0.847525, 0. ],
[0.426005, 0.390665, 0. ],
[0.563704, 0.562188, 0. ],
[0.145496, 0.957067, 0. ],
[0.457217, 0.525863, 0. ],
[0.505805, 0.835241, 0. ],
[0.604625, 0.894856, 0. ],
[0.62544 , 0.594124, 0. ]]])
Coordinates:
* cars (cars) <U1 'a' 'b' 'c'
* time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-10
* cars1 (cars1) <U1 'a' 'b' 'c'
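As a sanity check on the broadcasting expression above, here is a minimal sketch that compares it against SciPy's full distance matrix for each time step. The seeded random generator and the names `rng`, `dist`, `expected` and `actual` are assumptions made for illustration; they are not part of the original post.

```python
import numpy as np
import pandas as pd
import xarray as xr
from scipy.spatial.distance import pdist, squareform

# Reproducible stand-in for the random data in the question (assumed seed).
rng = np.random.default_rng(0)
foo = xr.DataArray(
    rng.random((3, 2, 10)),
    coords=[['a', 'b', 'c'], ['x', 'y'], pd.date_range('2000-01-01', periods=10)],
    dims=['cars', 'space', 'time'],
)

# Renaming 'cars' gives the second operand a different dimension name, so the
# subtraction broadcasts over every (cars, cars1) combination.
dist = np.sqrt(((foo - foo.rename(cars='cars1')) ** 2).sum('space'))

# Compare each time slice with SciPy's square distance matrix.
for t in range(foo.sizes['time']):
    expected = squareform(pdist(foo.isel(time=t).values))
    actual = dist.isel(time=t).transpose('cars', 'cars1').values
    assert np.allclose(actual, expected)
```

The result keeps labelled `cars` and `cars1` coordinates, so a single pair can be selected with something like `dist.sel(cars='a', cars1='c')`.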
If you want output similar to that of pdist, you can use apply_ufunc:
In [21]: xr.apply_ufunc(pdist, foo, input_core_dims=[['cars', 'space']],
...: output_core_dims=[['cars_pair']], vectorize=True)
...:
Out[21]:
<xarray.DataArray (time: 10, cars_pair: 3)>
array([[0.131342, 0.352521, 0.243476],
[0.329914, 0.859899, 0.813076],
[0.933117, 0.351842, 0.847525],
[0.802514, 0.426005, 0.390665],
[0.167081, 0.563704, 0.562188],
[0.9822 , 0.145496, 0.957067],
[0.894892, 0.457217, 0.525863],
[0.333222, 0.505805, 0.835241],
[0.377352, 0.604625, 0.894856],
[0.467771, 0.62544 , 0.594124]])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-10
Dimensions without coordinates: cars_pair
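The `cars_pair` dimension above has no coordinates, so it is not obvious which column belongs to which pair. A possible follow-up, sketched here under the assumption that pdist orders pairs as (0,1), (0,2), (1,2), which matches `itertools.combinations`, is to attach pair labels. The label format `'a-b'` and the names `condensed` and `pair_labels` are hypothetical choices for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd
import xarray as xr
from scipy.spatial.distance import pdist

# Reproducible stand-in for the random data in the question (assumed seed).
rng = np.random.default_rng(0)
cars = ['a', 'b', 'c']
foo = xr.DataArray(
    rng.random((3, 2, 10)),
    coords=[cars, ['x', 'y'], pd.date_range('2000-01-01', periods=10)],
    dims=['cars', 'space', 'time'],
)

# Same call as in the answer: pdist consumes (cars, space) and produces a new
# 'cars_pair' dimension; vectorize=True loops over the remaining 'time' axis.
condensed = xr.apply_ufunc(
    pdist, foo,
    input_core_dims=[['cars', 'space']],
    output_core_dims=[['cars_pair']],
    vectorize=True,
)

# Label each condensed entry with the pair of cars it refers to.
pair_labels = ['-'.join(pair) for pair in combinations(cars, 2)]
condensed = condensed.assign_coords(cars_pair=pair_labels)
```

With the labels in place, `condensed.sel(cars_pair='a-c')` returns the distance between cars a and c for every time step.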