梅尔频率函数:矩阵维度的错误
Mel-frequency function: error with matrix dimensions
我正在尝试按照此 link:http://www.ifp.illinois.edu/~minhdo/teaching/speaker_recognition/ 制作原型音频识别系统。它非常简单,所以几乎没有什么可担心的。但我的问题是梅尔频率功能。这是网站上提供的代码:
function m = melfb(p, n, fs)
% MELFB Determine matrix for a mel-spaced filterbank
%
% Inputs: p number of filters in filterbank
% n length of fft
% fs sample rate in Hz
%
% Outputs: x a (sparse) matrix containing the filterbank amplitudes
% size(x) = [p, 1+floor(n/2)]
%
% Usage: For example, to compute the mel-scale spectrum of a
% colum-vector signal s, with length n and sample rate fs:
%
% f = fft(s);
% m = melfb(p, n, fs);
% n2 = 1 + floor(n/2);
% z = m * abs(f(1:n2)).^2;
%
% z would contain p samples of the desired mel-scale spectrum
%
% To plot filterbanks e.g.:
%
% plot(linspace(0, (12500/2), 129), melfb(20, 256, 12500)'),
% title('Mel-spaced filterbank'), xlabel('Frequency (Hz)');
f0 = 700 / fs;
fn2 = floor(n/2);
lr = log(1 + 0.5/f0) / (p+1);
% convert to fft bin numbers with 0 for DC term
bl = n * (f0 * (exp([0 1 p p+1] * lr) - 1));
b1 = floor(bl(1)) + 1;
b2 = ceil(bl(2));
b3 = floor(bl(3));
b4 = min(fn2, ceil(bl(4))) - 1;
pf = log(1 + (b1:b4)/n/f0) / lr;
fp = floor(pf);
pm = pf - fp;
r = [fp(b2:b4) 1+fp(1:b3)];
c = [b2:b4 1:b3] + 1;
v = 2 * [1-pm(b2:b4) pm(1:b3)];
m = sparse(r, c, v, p, 1+fn2);
但是它给了我一个错误:
Error using * Inner matrix dimensions must agree.
Error in MFFC (line 17) z = m * abs(f(1:n2)).^2;
当我在第 17 行之前添加这两行时:
size(m)
size(abs(f(1:n2)).^2)
它给了我:
ans =
20 65
ans =
1 65
那么我应该转置第二个矩阵吗?或者我应该将其解释为逐行乘法并修改代码?
编辑:这里是主要函数(我只是运行 MFCC()):
function result = MFFC()
[y Fs] = audioread('s1.wav');
% sound(y,Fs)
Frames = Frame_Blocking(y,128);
Windowed = Windowing(Frames);
spectrum = FFT_After_Windowing(Windowed);
%imagesc(mag2db(abs(spectrum)))
p = 20;
S = size(spectrum);
n = S(2);
f = spectrum;
m = melfb(p, n, Fs);
n2 = 1 + floor(n/2);
size(m)
size(abs(f(1:n2)).^2)
z = m * abs(f(1:n2)).^2;
result = z;
辅助函数如下:
function f = Frame_Blocking(y,N)
% Parameters: M = 100, N = 256
% Default : M = 100; N = 256;
M = fix(N/3);
Frames = [];
first = 1; last = N;
len = length(y);
while last <= len
Frames = [Frames; y(first:last)'];
first = first + M;
last = last + M;
end;
if last < len
first = first + M;
Frames = [Frames; y(first : len)];
end
f = Frames;
function f = Windowing(Frames)
S = size(Frames);
N = S(2);
M = S(1);
Windowed = zeros(M,N);
nn = 1:N;
wn = 0.54 - 0.46*cos(2*pi/(N-1)*(nn-1));
for ii = 1:M
Windowed(ii,:) = Frames(ii,:).*wn;
end;
f = Windowed;
function f = FFT_After_Windowing(Windowed)
spectrum = fft(Windowed);
f = spectrum;
转置 s
或转置结果 f
(这只是一个约定问题)。
您正在使用的 melfb
函数没有任何问题,只是在您尝试 运行 的示例中信号的维度有问题(在注释行 14-17 中)。
% f = fft(s);
% m = melfb(p, n, fs);
% n2 = 1 + floor(n/2);
% z = m * abs(f(1:n2)).^2;
该示例假定您使用的是 "colum-vector signal s"。从你的傅里叶变换的大小 f
(通过 fft
完成,它尊重输入信号的维度)你的输入信号 s
是一个 row-vector 信号。
给你错误的部分是实际的过滤操作,需要将 p x n2
矩阵与 n2 x 1
column-vector 相乘(即,每个过滤器的响应逐点乘以输入信号的傅立叶)。由于您的输入 s
是 1 x n
,因此您的 f
将是 1 x n
并且 z
的向量乘法的最终矩阵将给出错误。
多亏了gevang的回答,我才得以找出我的错误。这是我修改代码的方式:
function result = MFFC()
[y Fs] = audioread('s2.wav');
% sound(y,Fs)
Frames = Frame_Blocking(y,128);
Windowed = Windowing(Frames);
%spectrum = FFT_After_Windowing(Windowed');
%imagesc(mag2db(abs(spectrum)))
p = 20;
%S = size(spectrum);
%n = S(2);
%f = spectrum;
S1 = size(Windowed);
n = S1(2);
n2 = 1 + floor(n/2);
%z = zeros(S1(1),n2);
z = zeros(20,S1(1));
for ii=1: S1(1)
s = (FFT_After_Windowing(Windowed(ii,:)'));
f = fft(s);
m = melfb(p,n,Fs);
% n2 = 1 + floor(n/2);
z(:,ii) = m * abs(f(1:n2)).^2;
end;
%f = FFT_After_Windowing(Windowed');
%S = size(f);
%n = S(2);
%size(f)
%m = melfb(p, n, Fs);
%n2 = 1 + floor(n/2);
%size(m)
%size(abs(f(1:n2)).^2)
%z = m * abs(f(1:n2)).^2;
result = z;
如您所见,我天真地假设该函数处理 row-wise 矩阵,但实际上它处理列向量(可能还有 column-wise 矩阵)。所以我遍历输入矩阵的每一列,然后组合结果。
但我不认为这是高效的矢量化代码。此外,我仍然无法弄清楚如何对输入矩阵进行 column-wise 操作(窗口化 - 在窗口化步骤之后),而不是使用循环。
我正在尝试按照此 link:http://www.ifp.illinois.edu/~minhdo/teaching/speaker_recognition/ 制作原型音频识别系统。它非常简单,所以几乎没有什么可担心的。但我的问题是梅尔频率功能。这是网站上提供的代码:
function m = melfb(p, n, fs)
% MELFB Determine matrix for a mel-spaced filterbank
%
% Inputs: p number of filters in filterbank
% n length of fft
% fs sample rate in Hz
%
% Outputs: x a (sparse) matrix containing the filterbank amplitudes
% size(x) = [p, 1+floor(n/2)]
%
% Usage: For example, to compute the mel-scale spectrum of a
% colum-vector signal s, with length n and sample rate fs:
%
% f = fft(s);
% m = melfb(p, n, fs);
% n2 = 1 + floor(n/2);
% z = m * abs(f(1:n2)).^2;
%
% z would contain p samples of the desired mel-scale spectrum
%
% To plot filterbanks e.g.:
%
% plot(linspace(0, (12500/2), 129), melfb(20, 256, 12500)'),
% title('Mel-spaced filterbank'), xlabel('Frequency (Hz)');
f0 = 700 / fs;
fn2 = floor(n/2);
lr = log(1 + 0.5/f0) / (p+1);
% convert to fft bin numbers with 0 for DC term
bl = n * (f0 * (exp([0 1 p p+1] * lr) - 1));
b1 = floor(bl(1)) + 1;
b2 = ceil(bl(2));
b3 = floor(bl(3));
b4 = min(fn2, ceil(bl(4))) - 1;
pf = log(1 + (b1:b4)/n/f0) / lr;
fp = floor(pf);
pm = pf - fp;
r = [fp(b2:b4) 1+fp(1:b3)];
c = [b2:b4 1:b3] + 1;
v = 2 * [1-pm(b2:b4) pm(1:b3)];
m = sparse(r, c, v, p, 1+fn2);
但是它给了我一个错误:
Error using * Inner matrix dimensions must agree.
Error in MFFC (line 17) z = m * abs(f(1:n2)).^2;
当我在第 17 行之前添加这两行时:
size(m)
size(abs(f(1:n2)).^2)
它给了我:
ans =
20 65
ans =
1 65
那么我应该转置第二个矩阵吗?或者我应该将其解释为逐行乘法并修改代码?
编辑:这里是主要函数(我只是运行 MFCC()):
function result = MFFC()
[y Fs] = audioread('s1.wav');
% sound(y,Fs)
Frames = Frame_Blocking(y,128);
Windowed = Windowing(Frames);
spectrum = FFT_After_Windowing(Windowed);
%imagesc(mag2db(abs(spectrum)))
p = 20;
S = size(spectrum);
n = S(2);
f = spectrum;
m = melfb(p, n, Fs);
n2 = 1 + floor(n/2);
size(m)
size(abs(f(1:n2)).^2)
z = m * abs(f(1:n2)).^2;
result = z;
辅助函数如下:
function f = Frame_Blocking(y,N)
% Parameters: M = 100, N = 256
% Default : M = 100; N = 256;
M = fix(N/3);
Frames = [];
first = 1; last = N;
len = length(y);
while last <= len
Frames = [Frames; y(first:last)'];
first = first + M;
last = last + M;
end;
if last < len
first = first + M;
Frames = [Frames; y(first : len)];
end
f = Frames;
function f = Windowing(Frames)
S = size(Frames);
N = S(2);
M = S(1);
Windowed = zeros(M,N);
nn = 1:N;
wn = 0.54 - 0.46*cos(2*pi/(N-1)*(nn-1));
for ii = 1:M
Windowed(ii,:) = Frames(ii,:).*wn;
end;
f = Windowed;
function f = FFT_After_Windowing(Windowed)
spectrum = fft(Windowed);
f = spectrum;
转置 s
或转置结果 f
(这只是一个约定问题)。
您正在使用的 melfb
函数没有任何问题,只是在您尝试 运行 的示例中信号的维度有问题(在注释行 14-17 中)。
% f = fft(s);
% m = melfb(p, n, fs);
% n2 = 1 + floor(n/2);
% z = m * abs(f(1:n2)).^2;
该示例假定您使用的是 "colum-vector signal s"。从你的傅里叶变换的大小 f
(通过 fft
完成,它尊重输入信号的维度)你的输入信号 s
是一个 row-vector 信号。
给你错误的部分是实际的过滤操作,需要将 p x n2
矩阵与 n2 x 1
column-vector 相乘(即,每个过滤器的响应逐点乘以输入信号的傅立叶)。由于您的输入 s
是 1 x n
,因此您的 f
将是 1 x n
并且 z
的向量乘法的最终矩阵将给出错误。
多亏了gevang的回答,我才得以找出我的错误。这是我修改代码的方式:
function result = MFFC()
[y Fs] = audioread('s2.wav');
% sound(y,Fs)
Frames = Frame_Blocking(y,128);
Windowed = Windowing(Frames);
%spectrum = FFT_After_Windowing(Windowed');
%imagesc(mag2db(abs(spectrum)))
p = 20;
%S = size(spectrum);
%n = S(2);
%f = spectrum;
S1 = size(Windowed);
n = S1(2);
n2 = 1 + floor(n/2);
%z = zeros(S1(1),n2);
z = zeros(20,S1(1));
for ii=1: S1(1)
s = (FFT_After_Windowing(Windowed(ii,:)'));
f = fft(s);
m = melfb(p,n,Fs);
% n2 = 1 + floor(n/2);
z(:,ii) = m * abs(f(1:n2)).^2;
end;
%f = FFT_After_Windowing(Windowed');
%S = size(f);
%n = S(2);
%size(f)
%m = melfb(p, n, Fs);
%n2 = 1 + floor(n/2);
%size(m)
%size(abs(f(1:n2)).^2)
%z = m * abs(f(1:n2)).^2;
result = z;
如您所见,我天真地假设该函数处理 row-wise 矩阵,但实际上它处理列向量(可能还有 column-wise 矩阵)。所以我遍历输入矩阵的每一列,然后组合结果。 但我不认为这是高效的矢量化代码。此外,我仍然无法弄清楚如何对输入矩阵进行 column-wise 操作(窗口化 - 在窗口化步骤之后),而不是使用循环。