Broadcast operation on array of smaller size
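I need to improve the performance of an operation performed on arrays of different shapes/sizes. The pos array has shape (2, 250000), a flattened 500 x 500 grid, and the xa, xb, ya, yb arrays have shape (30,).
The operation shown in the MVCE below combines each of the two dimensions of pos with xa, xb and ya, yb respectively.
Can this be done using numpy broadcasting?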
import numpy as np
# Some random data
N = 30
xa, xb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
ya, yb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
# Grid
M = 500
ext = [xa.min(), xa.max(), ya.min(), ya.max()]
x, y = np.mgrid[ext[0]:ext[1]:complex(0, M), ext[2]:ext[3]:complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])
# Apply broadcasting on the operation performed by this 'for' block?
vals = []
for p in zip(*pos):
    vals.append(np.sum(np.exp(-0.5 * (
        ((p[0] - xa) / xb)**2 + ((p[1] - ya) / yb)**2)) / (xb * yb)))
You can use np.tile and rewrite the for loop as follows:
xa_tiled = np.tile(xa, (pos.shape[1],1))
xb_tiled = np.tile(xb, (pos.shape[1],1))
ya_tiled = np.tile(ya, (pos.shape[1],1))
yb_tiled = np.tile(yb, (pos.shape[1],1))
vals_ = np.exp(-0.5 * (
((pos[0].reshape(pos.shape[1],1) - xa_tiled) / xb_tiled)**2 + ((pos[1].reshape(pos.shape[1],1) - ya_tiled) / yb_tiled)**2)) / (xb_tiled * yb_tiled)
vals_ = vals_.sum(axis=1)
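Explanation:
- In each iteration of the original loop, you take pos[0][i] and pos[1][i] and operate on them with xa, xb, ya, yb.
- np.tile replicates each of these four arrays 250000 times, that is, pos.shape[1] times, which is the number of loop iterations.
- We also need to reshape pos[0] and pos[1] into 2D column arrays of shape (pos.shape[1], 1) for the operation to be valid.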
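Since the question asks about broadcasting specifically, here is a minimal sketch of the same computation without np.tile. It relies on NumPy's broadcasting rules, under which a column of shape (P, 1) combined with an array of shape (N,) gives a (P, N) result without explicit copies. The function name gaussian_sum_broadcast is hypothetical, and the smaller grid (M = 20) is only an assumption to keep the check against the loop quick.

```python
import numpy as np


def gaussian_sum_broadcast(pos, xa, xb, ya, yb):
    """Hypothetical helper: same sum as the loop, using broadcasting only."""
    px = pos[0][:, None]  # shape (P, 1)
    py = pos[1][:, None]  # shape (P, 1)
    # (P, 1) against (N,) broadcasts to (P, N); xa, xb, ya, yb are not copied.
    terms = np.exp(-0.5 * (((px - xa) / xb)**2 + ((py - ya) / yb)**2)) / (xb * yb)
    return terms.sum(axis=1)  # shape (P,)


# Small example (M = 20 instead of 500, an assumption to keep it fast)
rng = np.random.default_rng(0)
N, M = 30, 20
xa, xb = rng.uniform(0., 1., N), rng.uniform(0., 1., N)
ya, yb = rng.uniform(0., 1., N), rng.uniform(0., 1., N)
x, y = np.mgrid[xa.min():xa.max():complex(0, M), ya.min():ya.max():complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])

vals_b = gaussian_sum_broadcast(pos, xa, xb, ya, yb)

# Reference values from the original per-point loop
vals_loop = np.array([
    np.sum(np.exp(-0.5 * (((p[0] - xa) / xb)**2 + ((p[1] - ya) / yb)**2)) / (xb * yb))
    for p in zip(*pos)
])
print(np.allclose(vals_b, vals_loop))
```

This still builds (P, N) intermediate arrays, so memory use grows with pos.shape[1] * N just as in the tiled version; it only skips the four explicit tiled copies.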
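Timing details:
On my machine, the vectorized code takes about 0.20 seconds, while the non-vectorized code takes about 3 seconds. Below is the code to reproduce: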
import numpy as np
import time
# Some random data
N = 30
xa, xb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
ya, yb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
# Grid
M = 500
ext = [xa.min(), xa.max(), ya.min(), ya.max()]
x, y = np.mgrid[ext[0]:ext[1]:complex(0, M), ext[2]:ext[3]:complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])
# Apply broadcasting on the operation performed by this 'for' block?
start = time.time()
for i in range(10):
    vals = []
    for p in zip(*pos):
        vals.append(np.sum(np.exp(-0.5 * (
            ((p[0] - xa) / xb)**2 + ((p[1] - ya) / yb)**2)) / (xb * yb)))
stop = time.time()
print((stop - start) / 10)
start = time.time()
for i in range(10):
    xa_tiled = np.tile(xa, (pos.shape[1], 1))
    xb_tiled = np.tile(xb, (pos.shape[1], 1))
    ya_tiled = np.tile(ya, (pos.shape[1], 1))
    yb_tiled = np.tile(yb, (pos.shape[1], 1))
    vals_ = np.exp(-0.5 * (
        ((pos[0].reshape(pos.shape[1], 1) - xa_tiled) / xb_tiled)**2
        + ((pos[1].reshape(pos.shape[1], 1) - ya_tiled) / yb_tiled)**2)) / (xb_tiled * yb_tiled)
    vals_ = vals_.sum(axis=1)
stop = time.time()
print((stop - start) / 10)
print(np.allclose(vals_, np.array(vals)))
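Timings taken with time.time() around a hand-written loop can be noisy. As an alternative, here is a hedged sketch that measures the same two approaches with timeit from the standard library and keeps the best of a few repeats. The function names loop_version and tiled_version are hypothetical, and the grid is reduced to M = 50 (an assumption) so it runs quickly; the absolute numbers will differ from the ones quoted above.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
N, M = 30, 50  # smaller grid than the original M = 500, to keep the run short
xa, xb = rng.uniform(0., 1., N), rng.uniform(0., 1., N)
ya, yb = rng.uniform(0., 1., N), rng.uniform(0., 1., N)
x, y = np.mgrid[xa.min():xa.max():complex(0, M), ya.min():ya.max():complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])


def loop_version():
    """Original per-point loop (hypothetical wrapper name)."""
    return np.array([
        np.sum(np.exp(-0.5 * (((px - xa) / xb)**2 + ((py - ya) / yb)**2)) / (xb * yb))
        for px, py in zip(*pos)
    ])


def tiled_version():
    """np.tile approach from the answer (hypothetical wrapper name)."""
    n = pos.shape[1]
    xa_t, xb_t = np.tile(xa, (n, 1)), np.tile(xb, (n, 1))
    ya_t, yb_t = np.tile(ya, (n, 1)), np.tile(yb, (n, 1))
    px = pos[0].reshape(n, 1)
    py = pos[1].reshape(n, 1)
    terms = np.exp(-0.5 * (((px - xa_t) / xb_t)**2 + ((py - ya_t) / yb_t)**2)) / (xb_t * yb_t)
    return terms.sum(axis=1)


# Best of 3 single runs for each approach
t_loop = min(timeit.repeat(loop_version, number=1, repeat=3))
t_tiled = min(timeit.repeat(tiled_version, number=1, repeat=3))
print(f"loop:  {t_loop:.4f} s")
print(f"tiled: {t_tiled:.4f} s")
print(np.allclose(loop_version(), tiled_version()))
```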