R中的SAS单变量

SAS univariate in R

我是统计和 R 的新手,希望在 SAS for R 中找到过程单变量的对数正态分布的等价物。代码是这样的,

Proc univariate data = dat;
histogram kilo / lognormal (theta=est zeta=est sigma=est noprint) 

      Midpoints 1 to 55477 by 20
Outhistogram=this;
Run;

此处输入数据为 dat,为概率分布选择的变量为 kilo。值 55477 是 kilo 变量的最大值。

theta、zeta 和 sigma 的选项表示最大估计似然

我在 运行 代码后得到以下内容。 A table,以下列1到55477乘以20(2774条记录)(来自sas网站的列解释):

EXPPCT - estimated percent of population in histogram interval determined from optional fitted distribution (here it is lognormal)

OBSPCT - percent of variable values in histogram interval

VAR - variable name (here it is kilo)

MIDPT - midpoint of histogram interval

我正在使用 exppct、midpt 值进行进一步分析。

你可以尝试这样的事情。

## Sample data
set.seed(0)
dat <- rlnorm(1000, 7)

## MLE estimates
library(fitdistrplus)
pars <- coef(fitdist(dat, "lnorm"))

## table variables
breaks <- seq(1, max(dat)+100, 100)                  # histogram breaks
mids <- diff(breaks)/2 + head(breaks, -1)            # midpoints
probs <- diff(plnorm(breaks, pars[[1]], pars[[2]]))  # expected probs for each bin
obs <- table(cut(dat, breaks)) / length(dat)         # observed 

res <- data.frame(MIDPT=mids,
                  OBSPCT=as.numeric(obs)*100,
                  EXPPCT=probs*100,
                  INTERVAL=names(obs))
head(res)
#   MIDPT OBSPCT    EXPPCT  INTERVAL
# 1    51    0.5 0.8775098   (1,101]
# 2   151    3.5 3.7212573 (101,201]
# 3   251    5.9 5.4240329 (201,301]
# 4   351    6.4 6.0203732 (301,401]
# 5   451    6.8 6.0371393 (401,501]
# 6   551    5.5 5.7785383 (501,601]

## Plot
hist(dat, breaks=breaks, freq=F, col="steelblue")
points((ps <- seq(1, max(dat)+100, len=1000)),
       dlnorm(ps, pars[[1]], pars[[2]]), type="l", col="salmon", lwd=3)
legend("topright", "Expected", col="salmon", lwd=3)