Mean of the last two entries of non-zero elements in a 3D array
I have an (n x i x j) 3D numpy array, a_3d_array, for example of shape (2 x 5 x 3):
array([[[1, 2, 3],
        [1, 1, 1],
        [2, 2, 2],
        [0, 3, 3],
        [0, 0, 4]],
       [[1, 2, 3],
        [2, 2, 2],
        [3, 3, 3],
        [0, 4, 4],
        [0, 0, 5]]])
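For every column j in each n, I want to extract the last 2 non-zero elements and compute their mean, then put the results into an (n x j) array. What I currently do is use a for loop: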
import numpy as np
a_3d_array = np.array([[[1, 2, 3],
                        [1, 1, 1],
                        [2, 2, 2],
                        [0, 3, 3],
                        [0, 0, 4]],
                       [[1, 2, 3],
                        [2, 2, 2],
                        [3, 3, 3],
                        [0, 4, 4],
                        [0, 0, 5]]])
aveCol = np.zeros([2,3])
for n in range(2):
    for j in range(3):
        temp = a_3d_array[n,:,j]
        nonzero_array = temp[np.nonzero(temp)]
        aveCol[n, j] = np.mean(nonzero_array[-2:])
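which gives the desired result:
print(aveCol)
[[1.5 2.5 3.5]
 [2.5 3.5 4.5]]
This works fine, but I wonder whether there is a better, more Pythonic way to do the same thing?
The question most similar to mine that I found is here, but I don't quite understand the answer, which is explained in a slightly different context.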
You can use the built-in filter function to filter out the 0s from the array.
Here is a list comprehension approach:
import numpy as np
a_3d_array = np.array([[[1, 2, 3],
                        [1, 1, 1],
                        [2, 2, 2],
                        [0, 3, 3],
                        [0, 0, 4]],
                       [[1, 2, 3],
                        [2, 2, 2],
                        [3, 3, 3],
                        [0, 4, 4],
                        [0, 0, 5]]])
aveCol = np.array([[np.mean(list(filter(None, a_3d_array[n,:,j]))[-2:]) for j in range(3)] for n in range(2)])
print(aveCol)
Output:
[[1.5 2.5 3.5]
 [2.5 3.5 4.5]]
A note from @gboffi: for better efficiency, use
aveCol = np.array([[sum([i for i in a_3d_array[n,:,j] if i][-2:])/2 for j in range(3)] for n in range(2)])
instead of
aveCol = np.array([[np.mean([i for i in a_3d_array[n,:,j] if i][-2:]) for j in range(3)] for n in range(2)])
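One caveat when swapping np.mean for sum(...)/2: the two agree only when a column has at least two non-zero entries, which holds for the question's data. A minimal sketch of the difference, using a made-up column (col is a hypothetical example, not taken from the question):

```python
import numpy as np

# Hypothetical column with a single non-zero entry (not from the question's data).
col = np.array([0, 0, 4])

nonzero = [v for v in col if v]   # keep only the truthy (non-zero) values
last_two = nonzero[-2:]           # only one value is available here

mean_np = np.mean(last_two)       # divides by the number of values found
mean_fixed = sum(last_two) / 2    # always divides by 2

print(mean_np, mean_fixed)
```

If columns with fewer than two non-zero values can occur, dividing by len(last_two) instead of 2 would keep the speed-up while matching np.mean.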
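TL;DR As far as I can tell from the timings below, the fastest variant is the filter-based one with np.mean replaced by sum(...)/2 (see Edit 4).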
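Each m is an i×j 2D array. We then take each row r of its transpose, i.e. a "column" of m, and perform the computation on it: we discard all the zeros, sum the last two non-zero elements and take their mean.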
In [17]: np.array([[sum(r[r!=0][-2:])/2 for r in m.T] for m in a])
Out[17]:
array([[1.5, 2.5, 3.5],
[2.5, 3.5, 4.5]])
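The same comprehension can be wrapped in a small function so that nothing depends on hard-coded sizes. This is only a sketch: the name mean_last_two_nonzero is made up for illustration, and it assumes every column has at least two non-zero entries.

```python
import numpy as np

def mean_last_two_nonzero(a):
    """Average the last two non-zero entries of every column of a 3D array."""
    # Each m has shape (i, j); iterating over m.T yields the j columns of m.
    # r[r != 0] drops the zeros, [-2:] keeps the last two survivors.
    return np.array([[sum(r[r != 0][-2:]) / 2 for r in m.T] for m in a])

a = np.array([[[1, 2, 3],
               [1, 1, 1],
               [2, 2, 2],
               [0, 3, 3],
               [0, 0, 4]],
              [[1, 2, 3],
               [2, 2, 2],
               [3, 3, 3],
               [0, 4, 4],
               [0, 0, 5]]])

result = mean_last_two_nonzero(a)
print(result)
# [[1.5 2.5 3.5]
#  [2.5 3.5 4.5]]
```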
Edit 1
It looks faster than your loop:
In [19]: %%timeit
    ...: avg = np.zeros([2,3])
    ...: for n in range(2):
    ...:     for j in range(3):
    ...:         temp = a[n,:,j]
    ...:         nz = temp[np.nonzero(temp)]
    ...:         avg[n, j] = np.mean(nz[-2:])
95.1 µs ± 596 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [20]: %timeit np.array([[sum(r[r!=0][-2:])/2 for r in m.T] for m in a])
45.5 µs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Edit 2
In [22]: %timeit np.array([[np.mean(list(filter(None, a[n,:,j]))[-2:]) for j in range(3)] for n in range(2)])
145 µs ± 689 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Edit 3
In [25]: %%timeit
    ...: i = np.indices(a.shape)
    ...: i[:, a == 0] = -1
    ...: i = np.sort(i, axis=2)
    ...: i = i[:, :, -2:, :]
    ...: a[tuple(i)].mean(axis=1)
64 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Edit 4, breaking news
The culprit in Ann's answer is np.mean!!
In [29]: %timeit np.array([[sum(list(filter(None, a[n,:,j]))[-2:])/2 for j in range(3)] for n in range(2)])
32.7 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
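The timings above come from IPython's %timeit magic. Outside IPython, a rough equivalent can be put together with the standard timeit module. The function names below are made up for this sketch, and the absolute numbers will differ between machines, so no expected timings are shown.

```python
import timeit
import numpy as np

a = np.array([[[1, 2, 3], [1, 1, 1], [2, 2, 2], [0, 3, 3], [0, 0, 4]],
              [[1, 2, 3], [2, 2, 2], [3, 3, 3], [0, 4, 4], [0, 0, 5]]])

def loop_version(a):
    # The question's double loop with np.nonzero and np.mean.
    avg = np.zeros([2, 3])
    for n in range(2):
        for j in range(3):
            temp = a[n, :, j]
            nz = temp[np.nonzero(temp)]
            avg[n, j] = np.mean(nz[-2:])
    return avg

def transpose_version(a):
    # Comprehension over the columns of each 2D slice.
    return np.array([[sum(r[r != 0][-2:]) / 2 for r in m.T] for m in a])

def filter_mean_version(a):
    # filter() plus np.mean, as in the first answer.
    return np.array([[np.mean(list(filter(None, a[n, :, j]))[-2:])
                      for j in range(3)] for n in range(2)])

def filter_sum_version(a):
    # filter() with np.mean replaced by sum(...)/2.
    return np.array([[sum(list(filter(None, a[n, :, j]))[-2:]) / 2
                      for j in range(3)] for n in range(2)])

candidates = [loop_version, transpose_version, filter_mean_version, filter_sum_version]
runs = 1000
for fn in candidates:
    seconds = timeit.timeit(lambda: fn(a), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```

With an array this small, per-call overhead dominates, so the ranking may change for larger inputs.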
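You can get the indices of the array a, mark the positions of the zero entries with a negative number, sort, keep only the last two, and then use the result as an index: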
i = np.indices(a.shape)
i[:, a == 0] = -1
i = np.sort(i, axis=2)
i = i[:, :, -2:, :]
a[tuple(i)].mean(axis=1)
# array([[1.5, 2.5, 3.5],
# [2.5, 3.5, 4.5]])
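Note that -1 is itself a valid index, so the index-sorting approach relies on every column having at least two non-zero entries; otherwise it silently picks up a[-1, -1, -1]. As a different technique (not the one used above), here is a sketch of a loop-free version based on a reversed cumulative count of non-zero entries. The function name mean_last_k_nonzero and the parameter k are assumptions made for illustration.

```python
import numpy as np

def mean_last_k_nonzero(a, k=2):
    """Mean of the last k non-zero entries along axis 1 of a 3D array."""
    mask = a != 0
    # For each position, count the non-zero entries from there to the end
    # of the column (cumulative sum taken on the reversed axis).
    rank_from_end = np.cumsum(mask[:, ::-1, :], axis=1)[:, ::-1, :]
    # Keep a non-zero entry only if it is among the last k non-zero ones.
    keep = mask & (rank_from_end <= k)
    counts = keep.sum(axis=1)
    totals = np.where(keep, a, 0).sum(axis=1)
    # Columns without any non-zero entry are reported as NaN.
    out = np.full(counts.shape, np.nan)
    return np.divide(totals, counts, out=out, where=counts > 0)

a = np.array([[[1, 2, 3], [1, 1, 1], [2, 2, 2], [0, 3, 3], [0, 0, 4]],
              [[1, 2, 3], [2, 2, 2], [3, 3, 3], [0, 4, 4], [0, 0, 5]]])

print(mean_last_k_nonzero(a))
# [[1.5 2.5 3.5]
#  [2.5 3.5 4.5]]
```

Unlike sum(...)/2, this divides by the number of entries actually found, so a column with a single non-zero value returns that value.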