Estimating the Poisson distribution
I have a graph, and I computed its degree distribution and vertex degrees as follows:
library(igraph) # for these two functions
dd <- degree_distribution(graph)
d <- degree(graph)
From this, I fitted a power law to see whether my distribution follows one:
degree = 1:max(d)
probability = dd[-1]
nonzero.position = which(probability != 0)
probability = probability[nonzero.position]
degree = degree[nonzero.position]
reg = lm(log(probability) ~ log(degree))
cozf = coef(reg)
power.law.fit = function(x) exp(cozf[[1]] + cozf[[2]] * log(x))
I then plotted the points and the fitted power law with ggplot2. The plot was produced with:
library(ggplot2) # for ggplot(), geom_point(), stat_function()
df <- data.frame(x = degree, y = probability)
print(
ggplot(df, aes(x,y,colour="Distribuição"))+
geom_point(shape = 4) +
stat_function(fun = power.law.fit, geom = "line", aes(colour="Power Law"))+
labs(title = "Grafo", subtitle = "Distribuição dos Graus",
x="K", y="P(k)", colour="Legenda")+
scale_color_brewer(palette="Dark2")
)
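As you can see, my distribution does not follow a power law! I would like to estimate a Poisson distribution and draw it on the same plot.
Although I'm not sure whether my distribution follows a Poisson distribution, I would like to show it alongside the power law. I don't know how to estimate this (Poisson) distribution from the data, or how to compute the mean degree.
Can anyone help me?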
The graph used to compute the distribution and the degrees is very large (700 thousand vertices), so I have not included its data. An answer can be based on any graph.
From ?dpois:
The Poisson distribution has density
p(x) = λ^x exp(-λ)/x!
for x = 0, 1, 2, … . The mean and variance are E(X) = Var(X) = λ.
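As a quick sanity check on that definition, the formula can be typed in by hand and compared with dpois(). This is only a sketch: lambda = 2 and the ranges 0:5 and 0:100 are arbitrary illustrative choices of mine, not values from the question.

```r
# Compare the density formula from ?dpois with R's built-in dpois().
lambda <- 2        # arbitrary illustrative rate (assumption)
k <- 0:5           # the Poisson distribution lives on non-negative integers

manual  <- lambda^k * exp(-lambda) / factorial(k)
builtin <- dpois(k, lambda)
print(all.equal(manual, builtin))

# Mean and variance, computed by summing over a support long enough
# that the remaining tail mass is negligible.
support   <- 0:100
p         <- dpois(support, lambda)
pois_mean <- sum(support * p)
pois_var  <- sum((support - pois_mean)^2 * p)
print(c(mean = pois_mean, var = pois_var))
```

Both the mean and the variance come out equal to lambda, which is what makes the sample mean and the sample variance two candidate estimates of it.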
So I'll generate some dummy data with a secret lambda:
mysecret <- ####
x <- data.frame(xes = rpois(50, mysecret))
> x$xes
[1] 0 2 2 1 1 4 1 1 0 2 2 2 1 0 0 1 2 3 2 4 2 1 0 3 2 1 3 1 2 1 5 0 2 3 2 1 0 1 2 3 0 1 2 2 0 3 2 2 2 3
> mean(x$xes)
[1] 1.66
> var(x$xes)
[1] 1.371837
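Since E(X) = Var(X) = λ, both sample moments are estimates of λ, and their ratio gives a rough feel for how Poisson-like the data are. A minimal sketch, with the 50 draws printed above typed in so it runs on its own; the name dispersion is mine:

```r
# The 50 draws printed above, copied in so the example is self-contained.
xes <- c(0, 2, 2, 1, 1, 4, 1, 1, 0, 2,
         2, 2, 1, 0, 0, 1, 2, 3, 2, 4,
         2, 1, 0, 3, 2, 1, 3, 1, 2, 1,
         5, 0, 2, 3, 2, 1, 0, 1, 2, 3,
         0, 1, 2, 2, 0, 3, 2, 2, 2, 3)

m <- mean(xes)   # estimate of lambda based on E(X)
v <- var(xes)    # estimate of lambda based on Var(X)

# Variance-to-mean ratio: exactly 1 for a Poisson distribution,
# so values far from 1 hint that a Poisson model is a poor fit.
dispersion <- v / m
print(c(mean = m, var = v, dispersion = dispersion))
```

With only 50 draws, a ratio somewhat below or above 1 is not surprising.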
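Because the mean and the variance of a Poisson distribution both equal λ, two good guesses for my secret lambda are 1.66 and 1.37. Let's try them: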
library(ggplot2)
ggplot(x, aes(xes)) +
geom_histogram(aes(y = ..density.., color = "Raw data"),
fill = "white", binwidth = 1, center = 0, size = 1.5) +
stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on E(X)"),
fun.args = list(lambda = 1.66), geom = "line", size = 1.5) +
stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on Var(X)"),
fun.args = list(lambda = 1.37), geom = "line", size = 1.5)
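Both look pretty good. You can't really use the built-in stat_function or geom_density to generate them, because the Poisson distribution is defined only for integers. The histogram and the summary functions work well because they evaluate only at the data points themselves rather than interpolating between them.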
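To make the integer-only point concrete, here is a small sketch of what dpois() does off the integer grid. The value 1.5 is just an arbitrary non-integer I picked for illustration:

```r
lambda <- 1.66   # the estimate based on E(X) from above

# At a non-integer x, dpois() issues a warning and returns 0,
# which is why a curve evaluated on a fine continuous grid looks wrong.
off_grid <- suppressWarnings(dpois(1.5, lambda))
print(off_grid)

# Evaluating only at the integers gives the actual probabilities.
k    <- 0:5
dens <- dpois(k, lambda)
print(round(dens, 4))
```

So any overlay has to be evaluated at whole numbers, for example on a grid like 0:max(x$xes).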
If you want more detail, you can use the MASS package:
MASS::fitdistr(x$xes, dpois, start = list(lambda = 1))
lambda
1.6601563
(0.1822258)
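Those numbers agree with the closed-form results for a Poisson sample: the maximum-likelihood estimate of λ is the sample mean, and its standard error is sqrt(λ/n). A minimal check that plugs in the sample mean (1.66) and sample size (50) from above; the normal-approximation interval is an extra I added, not something fitdistr reports:

```r
n          <- 50     # number of draws in the dummy data
lambda_hat <- 1.66   # sample mean, which is the Poisson MLE of lambda

# Standard error of the MLE: sqrt(lambda / n), about 0.182 here.
se <- sqrt(lambda_hat / n)
print(round(se, 4))

# Approximate 95% interval using the normal approximation (assumption:
# n is large enough for that approximation to be reasonable).
ci <- lambda_hat + c(-1, 1) * qnorm(0.975) * se
print(round(ci, 2))
```

The small difference from the fitdistr standard error comes from fitdistr optimising numerically rather than using the closed form.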
So let's try building on that:
library(dplyr)
df <- tibble(xes = seq.int(max(x$xes)+1)-1, # tibble() replaces the deprecated data_frame()
dens.m = dpois(xes, 1.66),
dens.u = dpois(xes, 1.66+0.18),
dens.l = dpois(xes, 1.66-0.18))
> df
# A tibble: 6 x 4
xes dens.m dens.u dens.l
<dbl> <dbl> <dbl> <dbl>
1 0 0.19013898 0.15881743 0.22763769
2 1 0.31563071 0.29222406 0.33690378
3 2 0.26197349 0.26884614 0.24930880
4 3 0.14495866 0.16489230 0.12299234
5 4 0.06015785 0.07585046 0.04550717
6 5 0.01997240 0.02791297 0.01347012
ggplot(x, aes(xes)) +
geom_histogram(aes(y = ..density..), color = "black",
fill = "white", binwidth = 1, center = 0, size = 1.5) +
geom_ribbon(data = df, aes(xes, ymin = dens.l, ymax = dens.u), fill = "grey50", alpha = 0.5) +
geom_line(data = df, aes(xes, dens.m, color = "Based on E(X)\n+/-1 SD of lambda"), size = 1.5)
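Based on these two approaches and a visual reading of the plots, you should be comfortable saying λ = 1.66 +/- 0.18.
For reference, my secret initial value was 1.5.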
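Carried over to a degree distribution, the same idea might look like the sketch below. Since the original graph isn't available, I'm assuming a random graph from igraph's sample_gnp() as a stand-in; the size (1000 vertices), edge probability (0.005), seed, and all variable names are arbitrary choices of mine.

```r
library(igraph)
library(ggplot2)

set.seed(1)                      # only so the sketch is repeatable
g <- sample_gnp(1000, 0.005)     # stand-in graph (assumption, not the real data)

d  <- degree(g)                  # degree of each vertex
dd <- degree_distribution(g)     # relative frequency of degrees 0, 1, ..., max(d)

# The Poisson estimate of lambda is simply the mean degree.
lambda_hat <- mean(d)

# Evaluate the fitted Poisson only at integer degrees, next to the data.
df <- data.frame(k         = 0:max(d),
                 empirical = dd,
                 poisson   = dpois(0:max(d), lambda_hat))

p <- ggplot(df, aes(k)) +
  geom_point(aes(y = empirical, colour = "Empirical"), shape = 4) +
  geom_line(aes(y = poisson, colour = "Poisson fit")) +
  labs(x = "k", y = "P(k)", colour = "Legend")
print(p)
```

The mean degree also equals 2 * ecount(g) / vcount(g), which is cheap to compute even for a graph with 700 thousand vertices. A power-law curve fitted as in the question could be added to the same plot as one more layer.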