Python: performing FFT on music file

Question

I'm trying to perform an FFT on a song (an audio file in wav format, about 3 minutes long). In case it's relevant, I created the file as follows:

ffmpeg -i "$1" -vn -ab 128k -ar 44100 -y -ac 1 "${1%.webm}.wav"

where $1 is the name of the webm file.

This is the code that should display the FFT of the given file:

import os

import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile

# presume file already converted to wav.
file = os.path.join(temp_folder, file_name)
rate, aud_data = scipy.io.wavfile.read(file)

# wav file is mono.
channel_1 = aud_data[:]

fourier = np.fft.fft(channel_1)

plt.figure(1)
plt.plot(fourier)
plt.xlabel('n')
plt.ylabel('amplitude')
plt.show()

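The problem is that it takes forever. It takes so long that I can't even show the output: I had enough time to research and write this post, and it still hasn't finished.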

I assume the file is too long, since

print(aud_data.shape)

outputs (9218368,). But this looks like a real-world problem, so I hope there is some way to obtain the FFT of an audio file.

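What am I doing wrong? Thank you.

Edit

A better formulation of the question would be: is the FFT of any use in music processing? For example, for measuring the similarity of two pieces.

As pointed out in the comments, my naive approach is far too slow.

Thank you.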

Answer 1

To speed up the FFT part of the analysis considerably, you can zero-pad the data to a power of 2:

import numpy as np
import matplotlib.pyplot as plt

# rate, aud_data = scipy.io.wavfile.read(file)
rate, aud_data = 44000, np.random.random((9218368,))

len_data = len(aud_data)

channel_1 = np.zeros(2**(int(np.ceil(np.log2(len_data)))))
channel_1[0:len_data] = aud_data

fourier = np.fft.fft(channel_1)
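
The same padding can also be written more compactly. The following is only a sketch, not part of the original answer: `np.fft.rfft` takes a length `n` and zero-pads the input itself, and because the audio samples are real it computes only the non-negative frequencies. The helper names `next_pow2` and `padded_rfft`, and the short random stand-in signal, are assumptions made for illustration.

```python
import numpy as np


def next_pow2(n):
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (int(n) - 1).bit_length()


def padded_rfft(samples):
    """FFT of a real 1-D signal, zero-padded to a power-of-two length."""
    samples = np.asarray(samples, dtype=float)
    n_fft = next_pow2(len(samples))
    # rfft pads with zeros up to n_fft and returns n_fft // 2 + 1 bins
    return np.fft.rfft(samples, n=n_fft), n_fft


# Stand-in for the audio data (assumption: mono, real-valued samples)
rng = np.random.default_rng(0)
aud_data = rng.random(1000)

spectrum, n_fft = padded_rfft(aud_data)
print(n_fft, spectrum.shape)
```

For the 9218368-sample file in the question, `next_pow2` gives 2**24 = 16777216, the same padded length the code above computes with `np.log2`.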

Below is an example that uses the method above to plot the real part of the Fourier transform of a sum of several sine waves:

import numpy as np
import matplotlib.pyplot as plt

# rate, aud_data = scipy.io.wavfile.read(file)
rate = 44000
ii = np.arange(0, 9218368)
t = ii / rate
aud_data = np.zeros(len(t))
for w in [1000, 5000, 10000, 15000]:
    aud_data += np.cos(2 * np.pi * w * t)

# From here down, everything else can be the same
len_data = len(aud_data)

channel_1 = np.zeros(2**(int(np.ceil(np.log2(len_data)))))
channel_1[0:len_data] = aud_data

fourier = np.fft.fft(channel_1)
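# Frequency (in Hz) of each FFT bin
w = np.linspace(0, rate, len(fourier), endpoint=False)

# The input is real, so the second half of the output mirrors the first
# (it holds the negative frequencies); keep only the first half
fourier_to_plot = fourier[0:len(fourier)//2]
w = w[0:len(fourier)//2]

plt.figure(1)

# The FFT output is complex; plot its real part explicitly
plt.plot(w, fourier_to_plot.real)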
plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.show()
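
The real part alone depends on the phase of each tone, so the magnitude is often easier to read. The following is a sketch under stated assumptions rather than part of the original answer: it uses a one-second test signal with the same four tones and the same nominal rate of 44000, takes the frequency axis from `np.fft.rfftfreq`, and picks out the peaks with `scipy.signal.find_peaks`. The 50% height threshold is an arbitrary choice for this test signal.

```python
import numpy as np
from scipy.signal import find_peaks

rate = 44000                      # nominal sample rate, as in the example above
t = np.arange(rate) / rate        # one second of samples (assumption)
tones = [1000, 5000, 10000, 15000]
signal = sum(np.cos(2 * np.pi * f * t) for f in tones)

# Zero-pad to the next power of two and transform the real signal
n_fft = 1 << (len(signal) - 1).bit_length()
spectrum = np.fft.rfft(signal, n=n_fft)
freqs = np.fft.rfftfreq(n_fft, d=1 / rate)   # bin frequencies in Hz
magnitude = np.abs(spectrum)

# Keep local maxima that reach at least half of the largest magnitude
peaks, _ = find_peaks(magnitude, height=0.5 * magnitude.max())
peak_freqs = freqs[peaks]
print(peak_freqs)
```

The detected peaks land within about one bin of the four input frequencies. Passing `freqs` and `magnitude` to `plt.plot` gives the corresponding plot.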

Tags: python, signal-processing, numpy, fft, scipy