R中的SAS单变量
SAS univariate in R
我是统计和 R 的新手,希望在 SAS for R 中找到过程单变量的对数正态分布的等价物。代码是这样的,
Proc univariate data = dat;
histogram kilo / lognormal (theta=est zeta=est sigma=est noprint)
Midpoints 1 to 55477 by 20
Outhistogram=this;
Run;
此处输入数据为 dat,为概率分布选择的变量为 kilo。值 55477 是 kilo 变量的最大值。
theta、zeta 和 sigma 的选项表示最大估计似然
我在 运行 代码后得到以下内容。 A table,以下列1到55477乘以20(2774条记录)(来自sas网站的列解释):
EXPPCT - estimated percent of population in histogram interval determined from optional fitted distribution (here it is lognormal)
OBSPCT - percent of variable values in histogram interval
VAR - variable name (here it is kilo)
MIDPT - midpoint of histogram interval
我正在使用 exppct、midpt 值进行进一步分析。
你可以尝试这样的事情。
## Sample data
set.seed(0)
dat <- rlnorm(1000, 7)
## MLE estimates
library(fitdistrplus)
pars <- coef(fitdist(dat, "lnorm"))
## table variables
breaks <- seq(1, max(dat)+100, 100) # histogram breaks
mids <- diff(breaks)/2 + head(breaks, -1) # midpoints
probs <- diff(plnorm(breaks, pars[[1]], pars[[2]])) # expected probs for each bin
obs <- table(cut(dat, breaks)) / length(dat) # observed
res <- data.frame(MIDPT=mids,
OBSPCT=as.numeric(obs)*100,
EXPPCT=probs*100,
INTERVAL=names(obs))
head(res)
# MIDPT OBSPCT EXPPCT INTERVAL
# 1 51 0.5 0.8775098 (1,101]
# 2 151 3.5 3.7212573 (101,201]
# 3 251 5.9 5.4240329 (201,301]
# 4 351 6.4 6.0203732 (301,401]
# 5 451 6.8 6.0371393 (401,501]
# 6 551 5.5 5.7785383 (501,601]
## Plot
hist(dat, breaks=breaks, freq=F, col="steelblue")
points((ps <- seq(1, max(dat)+100, len=1000)),
dlnorm(ps, pars[[1]], pars[[2]]), type="l", col="salmon", lwd=3)
legend("topright", "Expected", col="salmon", lwd=3)
我是统计和 R 的新手,希望在 SAS for R 中找到过程单变量的对数正态分布的等价物。代码是这样的,
Proc univariate data = dat;
histogram kilo / lognormal (theta=est zeta=est sigma=est noprint)
Midpoints 1 to 55477 by 20
Outhistogram=this;
Run;
此处输入数据为 dat,为概率分布选择的变量为 kilo。值 55477 是 kilo 变量的最大值。
theta、zeta 和 sigma 的选项表示最大估计似然
我在 运行 代码后得到以下内容。 A table,以下列1到55477乘以20(2774条记录)(来自sas网站的列解释):
EXPPCT - estimated percent of population in histogram interval determined from optional fitted distribution (here it is lognormal)
OBSPCT - percent of variable values in histogram interval
VAR - variable name (here it is kilo)
MIDPT - midpoint of histogram interval
我正在使用 exppct、midpt 值进行进一步分析。
你可以尝试这样的事情。
## Sample data
set.seed(0)
dat <- rlnorm(1000, 7)
## MLE estimates
library(fitdistrplus)
pars <- coef(fitdist(dat, "lnorm"))
## table variables
breaks <- seq(1, max(dat)+100, 100) # histogram breaks
mids <- diff(breaks)/2 + head(breaks, -1) # midpoints
probs <- diff(plnorm(breaks, pars[[1]], pars[[2]])) # expected probs for each bin
obs <- table(cut(dat, breaks)) / length(dat) # observed
res <- data.frame(MIDPT=mids,
OBSPCT=as.numeric(obs)*100,
EXPPCT=probs*100,
INTERVAL=names(obs))
head(res)
# MIDPT OBSPCT EXPPCT INTERVAL
# 1 51 0.5 0.8775098 (1,101]
# 2 151 3.5 3.7212573 (101,201]
# 3 251 5.9 5.4240329 (201,301]
# 4 351 6.4 6.0203732 (301,401]
# 5 451 6.8 6.0371393 (401,501]
# 6 551 5.5 5.7785383 (501,601]
## Plot
hist(dat, breaks=breaks, freq=F, col="steelblue")
points((ps <- seq(1, max(dat)+100, len=1000)),
dlnorm(ps, pars[[1]], pars[[2]]), type="l", col="salmon", lwd=3)
legend("topright", "Expected", col="salmon", lwd=3)