How and why is FFT convolution faster than direct convolution?

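I have read that convolution is faster when computed in the frequency domain because it is "just" a (2D) matrix multiplication, whereas in the time domain it is a lot of small matrix multiplications.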

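So I wrote the code below, and it shows FFT convolution having a higher complexity than "normal" convolution. Clearly something is wrong with my assumptions.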

What am I getting wrong?

from sympy import exp, log, symbols, init_printing, lambdify
init_printing(use_latex='matplotlib')

import numpy as np
import matplotlib.pyplot as plt

def _complex_mult(n):
  """Complexity of a MatMul of two matrices of size (n, n)"""
  # see https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
  return n**2.5

def _complex_fft(n):
  """Complexity of fft and ifft"""
  # see https://en.wikipedia.org/wiki/Fast_Fourier_transform
  return n*log(n)

def fft_mult_fft(n, m):
  """Complexity of a convolution in the freq space.
  fft -> mult between M and kernel -> ifft
  """
  return _complex_fft(n) * 2 + _complex_mult(n)

def conv(n, m):
  """Complexity of a convolution in the time space.
  for every n of M, we execute a MatMul of 2 (m, m) matrices
  """
  return n*_complex_mult(m)


n = symbols('n') # size of M = (n, n)
m = symbols('m') # size of kernel = (m, m)

M = np.linspace(1, 1e3+1, 10)  # num must be an int; a float such as 1e1 raises TypeError in current NumPy
kernel_size = np.linspace(2, 7, 7-2+1)**2

fft = fft_mult_fft(n, m)
discrete = conv(n, m)

f1 = lambdify(n, fft, 'numpy')
f2 = lambdify([n, m], discrete, 'numpy')

fig, ax = plt.subplots(1, len(kernel_size), figsize=(30, 10))

f1_computed = f1(M) # independent of m, so compute it only once

for i, size in enumerate(kernel_size):
  ax[i].plot(M, f1_computed, c='red', label='freq domain (fft)')
  ax[i].plot(M, f2(M, size), c='blue', label='time domain (normal)')
  ax[i].legend(loc='upper left')
  ax[i].set_title("kernel size = {}".format(size))
  ax[i].set_xlabel("Matrix size")
  ax[i].set_ylabel("Complexity")

Here is the output:

As @user545424 pointed out, the problem was that I was computing n*complexity(MatMul(kernel)) instead of n²*complexity(MatMul(kernel)) for the "normal" convolution.

This is what I finally arrived at (where n is the size of the input and m is the size of the kernel):
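Read off from the final code below, the two cost models being compared appear to be the following. The constants and the exponent 2.5 are the modeling choices made in that code, not exact operation counts, and the symbol names C_fft and C_direct are only labels introduced here:

```latex
\begin{aligned}
C_{\text{fft}}(n)       &= 2 \cdot 4\,n^{2}\log n + n^{2.5} \\
C_{\text{direct}}(n, m) &= n^{2} \cdot m^{2.5}
\end{aligned}
```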


Here is the final code and the new plot.

from sympy import exp, log, symbols, init_printing, lambdify
init_printing(use_latex='matplotlib')

import numpy as np
import matplotlib.pyplot as plt

def _complex_mult(n):
  """Complexity of a MatMul of two matrices of size (n, n)"""
  # see https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
  return n**2.5

def _complex_fft(n):
  """Complexity of fft and ifft"""
  # see 
  return 4*(n**2)*log(n)

def fft_mult_fft(n, m):
  """Complexity of a convolution in the freq space.
  fft -> mult between M and kernel -> ifft
  """
  return _complex_fft(n) * 2 + _complex_mult(n)

def conv(n, m):
  """Complexity of a convolution in the time space.
  for every n*n cell of M, we execute a MatMul of 2 (m, m) matrices
  """
  return n*n*_complex_mult(m)


n = symbols('n') # size of M = (n, n)
m = symbols('m') # size of kernel = (m, m)

M = np.linspace(1, 1e3+1, 10)  # num must be an int; a float such as 1e1 raises TypeError in current NumPy
kernel_size = np.linspace(2, 7, 7-2+1)**2

fft_symb = fft_mult_fft(n, m)
discrete_symb = conv(n, m)

fft_func = lambdify(n, fft_symb, 'numpy')
discrete_func = lambdify([n, m], discrete_symb, 'numpy')


fig, ax = plt.subplots(1, len(kernel_size), figsize=(30, 10))
fig.patch.set_facecolor('grey')

for i, size in enumerate(kernel_size):
  ax[i].plot(M, fft_func(M), c='red', label='freq domain (fft)')
  ax[i].plot(M, discrete_func(M, size), c='blue', label='time domain (normal)')
  ax[i].legend(loc='upper left')
  ax[i].set_title("kernel size = {}".format(size))
  ax[i].set_xlabel("Matrix size")
  ax[i].set_ylabel("Complexity")

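You are running into two well-known facts:

  • for small kernel sizes, the spatial method is faster,

  • for large kernel sizes, the frequency method can be faster.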
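To make these two regimes concrete, here is a rough operation-count sketch. It is not the model from the question: it assumes a k x k kernel slid over an n x n image costs about n²·k² multiply-adds, and that the FFT route needs three 2-D transforms (image, kernel, inverse) costing about c·n²·log2(n) each. The constant c and the function names are hypothetical, chosen only for illustration.

```python
import math


def direct_ops(n: int, k: int) -> float:
    """Approximate multiply-adds for sliding a k x k kernel over an n x n image."""
    return float(n * n * k * k)


def fft_ops(n: int, c: float = 3.0) -> float:
    """Rough cost of FFT-based convolution of an n x n image.

    Assumption: three 2-D transforms at about c * n^2 * log2(n) each
    (c is a made-up constant), plus n^2 element-wise products.
    """
    return 3 * c * n * n * math.log2(n) + n * n


def crossover_kernel(n: int) -> int:
    """Smallest kernel side k for which the FFT estimate is cheaper."""
    k = 1
    while direct_ops(n, k) <= fft_ops(n):
        k += 1
    return k


if __name__ == "__main__":
    for n in (64, 512, 4096):
        print(n, crossover_kernel(n))
```

Under these assumptions the FFT cost does not depend on k at all, while the direct cost grows as k², so the crossover kernel size grows only slowly (roughly like the square root of log n) with the image size. The exact crossover on real hardware depends on the implementation and is not predicted by this sketch.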

Your kernels and images are relatively small, too small to see the benefit of the FFT.
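For readers who want to try both methods on real arrays, the following is a minimal NumPy sketch (the function names are hypothetical). It also shows that the frequency-domain step is an element-wise product of the two spectra rather than a matrix product, which is why the cost after the transforms is only about n² multiplications.

```python
import numpy as np


def conv2d_direct(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Full 2-D linear convolution by explicit shift-and-add.

    The double loop runs once per kernel element and touches every
    image pixel each time, so the work scales like n^2 * k^2.
    """
    n0, n1 = image.shape
    k0, k1 = kernel.shape
    out = np.zeros((n0 + k0 - 1, n1 + k1 - 1))
    for i in range(k0):
        for j in range(k1):
            out[i:i + n0, j:j + n1] += kernel[i, j] * image
    return out


def conv2d_fft(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same full convolution via the convolution theorem.

    Both inputs are zero-padded to the full output shape so that the
    circular convolution computed by the FFT equals the linear one.
    """
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    # Element-wise product of the spectra, not a matrix product.
    spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(kernel, s=shape)
    return np.fft.irfft2(spectrum, s=shape)


rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))
kernel = rng.standard_normal((5, 5))

# The two methods should agree up to floating-point rounding.
print(np.allclose(conv2d_direct(image, kernel), conv2d_fft(image, kernel)))
```

Timing the two functions with growing kernel sizes (for example with `timeit`) is one way to look for the crossover on a given machine. The pure-Python loop in `conv2d_direct` is not a tuned implementation, so measured numbers will reflect that.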