如何在 Matlab 中拟合多峰对数正态分布?
How to fit a multimodal log-normal distribution in Matlab?
我需要拟合表示粒度测量值的多峰分布。例如,这些测量值可能如下所示:
现在我想拟合这些曲线。在 的帮助下,我能够为单峰分布函数获得相当不错的结果:
fun = @(p,x)(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2)));
通过像这样缩放结果参数:
[yM_in, pp_in] = max(DPF_in_mean);
xM_in = x_data(pp_in);
[yM_out, pp_out] = max(DPF_out_mean);
xM_out = x_data(pp_out);
xR_in = x_data / xM_in;
yR_in = DPF_in_mean / yM_in;
xR_out = x_data / xM_out;
yR_out = DPF_out_mean / yM_out;
opts = optimoptions('lsqcurvefit','TolX',1e-4,'TolFun',1e-8);
p0 = [1,1,1];
p_in = lsqcurvefit(fun,p0,xR_in,yR_in,[],[],opts);
p_out = lsqcurvefit(fun,p0,xR_out,yR_out,[],[],opts);
p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];
p_out_scaled = [ yM_out * p_out(1) * xM_out, p_out(2) + log(xM_out), p_out(3) ];
但是,如果我绘制拟合结果,很明显单峰分布不足以拟合测量结果:
在关于 multimodal distribution 的维基百科文章中,我似乎可以 混合 第二个分布,如下所示:
fun = @(p,x)(p(7)*(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2))) + (1 - p(7))*(p(4)./x .* 1./(p(5)*sqrt(2*pi)).*exp(-(log(x)-p(6)).^2./(2*p(5)^2))));
但是我不知道我需要如何在缩放中整合额外的参数
p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];
因为我并没有真正理解这个缩放步骤中发生了什么。
如何使用多峰分布来拟合我的测量值?
编辑
使用的数据如下:
x_data = [4.87000000000000e-09 5.62000000000000e-09 6.49000000000000e-09 7.50000000000000e-09 8.66000000000000e-09 ...
1.00000000000000e-08 1.15500000000000e-08 1.33400000000000e-08 1.54000000000000e-08 1.77800000000000e-08 ...
2.05400000000000e-08 2.37100000000000e-08 2.73800000000000e-08 3.16200000000000e-08 3.65200000000000e-08 ...
4.21700000000000e-08 4.87000000000000e-08 5.62300000000000e-08 6.49400000000000e-08 7.49900000000000e-08 ...
8.66000000000000e-08 1.00000000000000e-07 1.15480000000000e-07 1.33350000000000e-07 1.53990000000000e-07 ...
1.77830000000000e-07 2.05350000000000e-07 2.37140000000000e-07 2.73840000000000e-07 3.16230000000000e-07 ...
3.65170000000000e-07 4.21700000000000e-07 4.86970000000000e-07 5.62340000000000e-07 6.49380000000000e-07 ...
7.49890000000000e-07 8.65960000000000e-07 1.00000000000000e-06];
DPF_in_mean = [188318640795.745 360952462222.222 750859638450.704 2226776878843.93 4845941940346.82 7979258430057.80 ...
11010887350289.0 13462058712138.7 15090350247398.8 15991756383815.0 16680978441618.5 17862081914450.9 ...
20071390890173.4 23460963364161.9 27630428508670.5 31777265780346.8 35520451433526.0 38587652184971.1 ...
40516972485549.1 41326812092485.6 41127130682080.9 40038712485549.1 37976259664739.9 34725415132948.0 ...
30177578265896.0 24546703179190.8 18400851109826.6 12500471611560.7 7540309609248.56 3912091102658.96 ...
1632974141040.46 458500289086.705 126012891030.303 0 0 0 7276263267.44526 11203995842.0392];
DPF_out_mean = [444898373533.333 1032357396444.44 1675044380444.44 2316141430222.22 2852971589555.56 3151959865111.11 ...
3134892475777.78 2828026308000.00 2325761940666.67 1745907627777.78 1192912799111.11 742253282222.222 ...
430349362888.889 255820144555.556 188235813444.444 181970493622.222 204829338533.333 233009821977.778 ...
243007623333.333 230736732777.778 202426609488.889 169758857200.000 140604138622.222 116482776222.222 ...
95076737155.5556 74172071777.7778 53672033733.3333 35251323911.1111 20813708255.5556 11102006362.8889 ...
5497173092.96089 2625918349.76536 1471042995.80373 1012939492.96541 751738952.194595 589422111.731818 ...
479373451.936508 378359645.767442];
这是一种可能会有用的可能性。由于数据中的大峰掩盖了较小的峰,减去较大的峰数据会留下小峰数据以供分析。如果您知道大峰的形式,则可以将数据与其匹配,然后在每个数据集中仅保留两个双峰峰中的一个用于分析。找到第二个峰的形式后,您可以重新开始拟合两个峰的总和,使用之前的分析拟合值作为最终分析的初始参数值。
我对两个数据集都进行了方程式搜索,为每组中的主峰找到合适的峰方程,这是我的结果。没有进行任何数据转换或预处理,我使用发布的原始数据。
对于DPF_in_mean我的主峰为:
def Peak_LogNormalAShifted_model(x_in): # from zunzun.com using DPF_in_mean
# coefficients
a = 4.2518873952455000E+13
b = -1.6088042345890798E+01
c = 6.0084502637701809E-01
d = -2.3091703726006359E-08
return a * numpy.exp(-0.5 * numpy.power((numpy.log(x_in-d)-b) / c, 2.0))
对于 DPF_out_mean 我的主峰为:
def Peak_LogNormalA_model(x_in): # from zunzun.com using DPF_out_mean
# coefficients
a = 3.1863877879345913E+12
b = -1.8334716040160675E+01
c = 4.4913908739937525E-01
return a * numpy.exp(-0.5 * numpy.power((numpy.log(x_in)-b) / c, 2.0))
我需要拟合表示粒度测量值的多峰分布。例如,这些测量值可能如下所示:
现在我想拟合这些曲线。在
fun = @(p,x)(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2)));
通过像这样缩放结果参数:
[yM_in, pp_in] = max(DPF_in_mean);
xM_in = x_data(pp_in);
[yM_out, pp_out] = max(DPF_out_mean);
xM_out = x_data(pp_out);
xR_in = x_data / xM_in;
yR_in = DPF_in_mean / yM_in;
xR_out = x_data / xM_out;
yR_out = DPF_out_mean / yM_out;
opts = optimoptions('lsqcurvefit','TolX',1e-4,'TolFun',1e-8);
p0 = [1,1,1];
p_in = lsqcurvefit(fun,p0,xR_in,yR_in,[],[],opts);
p_out = lsqcurvefit(fun,p0,xR_out,yR_out,[],[],opts);
p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];
p_out_scaled = [ yM_out * p_out(1) * xM_out, p_out(2) + log(xM_out), p_out(3) ];
但是,如果我绘制拟合结果,很明显单峰分布不足以拟合测量结果:
在关于 multimodal distribution 的维基百科文章中,我似乎可以 混合 第二个分布,如下所示:
fun = @(p,x)(p(7)*(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2))) + (1 - p(7))*(p(4)./x .* 1./(p(5)*sqrt(2*pi)).*exp(-(log(x)-p(6)).^2./(2*p(5)^2))));
但是我不知道我需要如何在缩放中整合额外的参数
p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];
因为我并没有真正理解这个缩放步骤中发生了什么。
如何使用多峰分布来拟合我的测量值?
编辑
使用的数据如下:
x_data = [4.87000000000000e-09 5.62000000000000e-09 6.49000000000000e-09 7.50000000000000e-09 8.66000000000000e-09 ...
1.00000000000000e-08 1.15500000000000e-08 1.33400000000000e-08 1.54000000000000e-08 1.77800000000000e-08 ...
2.05400000000000e-08 2.37100000000000e-08 2.73800000000000e-08 3.16200000000000e-08 3.65200000000000e-08 ...
4.21700000000000e-08 4.87000000000000e-08 5.62300000000000e-08 6.49400000000000e-08 7.49900000000000e-08 ...
8.66000000000000e-08 1.00000000000000e-07 1.15480000000000e-07 1.33350000000000e-07 1.53990000000000e-07 ...
1.77830000000000e-07 2.05350000000000e-07 2.37140000000000e-07 2.73840000000000e-07 3.16230000000000e-07 ...
3.65170000000000e-07 4.21700000000000e-07 4.86970000000000e-07 5.62340000000000e-07 6.49380000000000e-07 ...
7.49890000000000e-07 8.65960000000000e-07 1.00000000000000e-06];
DPF_in_mean = [188318640795.745 360952462222.222 750859638450.704 2226776878843.93 4845941940346.82 7979258430057.80 ...
11010887350289.0 13462058712138.7 15090350247398.8 15991756383815.0 16680978441618.5 17862081914450.9 ...
20071390890173.4 23460963364161.9 27630428508670.5 31777265780346.8 35520451433526.0 38587652184971.1 ...
40516972485549.1 41326812092485.6 41127130682080.9 40038712485549.1 37976259664739.9 34725415132948.0 ...
30177578265896.0 24546703179190.8 18400851109826.6 12500471611560.7 7540309609248.56 3912091102658.96 ...
1632974141040.46 458500289086.705 126012891030.303 0 0 0 7276263267.44526 11203995842.0392];
DPF_out_mean = [444898373533.333 1032357396444.44 1675044380444.44 2316141430222.22 2852971589555.56 3151959865111.11 ...
3134892475777.78 2828026308000.00 2325761940666.67 1745907627777.78 1192912799111.11 742253282222.222 ...
430349362888.889 255820144555.556 188235813444.444 181970493622.222 204829338533.333 233009821977.778 ...
243007623333.333 230736732777.778 202426609488.889 169758857200.000 140604138622.222 116482776222.222 ...
95076737155.5556 74172071777.7778 53672033733.3333 35251323911.1111 20813708255.5556 11102006362.8889 ...
5497173092.96089 2625918349.76536 1471042995.80373 1012939492.96541 751738952.194595 589422111.731818 ...
479373451.936508 378359645.767442];
这是一种可能会有用的可能性。由于数据中的大峰掩盖了较小的峰,减去较大的峰数据会留下小峰数据以供分析。如果您知道大峰的形式,则可以将数据与其匹配,然后在每个数据集中仅保留两个双峰峰中的一个用于分析。找到第二个峰的形式后,您可以重新开始拟合两个峰的总和,使用之前的分析拟合值作为最终分析的初始参数值。
我对两个数据集都进行了方程式搜索,为每组中的主峰找到合适的峰方程,这是我的结果。没有进行任何数据转换或预处理,我使用发布的原始数据。
对于DPF_in_mean我的主峰为:
def Peak_LogNormalAShifted_model(x_in): # from zunzun.com using DPF_in_mean
# coefficients
a = 4.2518873952455000E+13
b = -1.6088042345890798E+01
c = 6.0084502637701809E-01
d = -2.3091703726006359E-08
return a * numpy.exp(-0.5 * numpy.power((numpy.log(x_in-d)-b) / c, 2.0))
对于 DPF_out_mean 我的主峰为:
def Peak_LogNormalA_model(x_in): # from zunzun.com using DPF_out_mean
# coefficients
a = 3.1863877879345913E+12
b = -1.8334716040160675E+01
c = 4.4913908739937525E-01
return a * numpy.exp(-0.5 * numpy.power((numpy.log(x_in)-b) / c, 2.0))