在 Stan/RStan 中实现指数一般线性模型

Question

我有以下型号：

我想学习如何实现这个模型。

研究与尝试

我以下面的泊松 GLM 为例：

```{stan output.var="PoissonGLMQR"}
  data{
    int<lower=1> n;     // number of observations
    int<lower=1> p;     // number of predictors
    matrix[n, p] x;     // design matrix
    int<lower=0> y[n];  // responses
  }
  
  transformed data{
    matrix [n, p] Q_ast;
    matrix [p, p] R_ast;
    matrix [p, p] R_ast_inverse;
    Q_ast = qr_Q(x)[,1:p] * sqrt(n - 1.0);
    R_ast = qr_R(x)[1:p,] / sqrt(n - 1.0);
    R_ast_inverse = inverse(R_ast);
  }
  
  parameters{
    vector[p] theta;  // regression coefficients for predictors
  }
  
  transformed parameters{
    vector[p] beta;
    vector[n] mu;
    mu = exp(Q_ast*theta);
    beta = R_ast_inverse * theta;
  }
  
  model{
     y  ~ poisson(mu);
  }
```

我知道这个例子使用了 QR 重新参数化（参见 Stan 参考手册的第 9.2 节）。然而，由于我是 Stan 的新手，我觉得这很吓人，而且我不确定如何以同样的方式实现指数 GLM。对于指数情况，甚至需要使用 QR 重新参数化吗？

无论如何，我对指数 GLM 的幼稚尝试是对泊松 GLM 代码的以下改编：

```{stan output.var="ExponentialGLM"}
  data{
    int<lower=1> n;     // number of observations
    int<lower=1> p;     // number of predictors
    matrix[n, p] x;     // design matrix
    real<lower=0> y[n];  // responses
  }
  
  transformed data{
    matrix [n, p] Q_ast;
    matrix [p, p] R_ast;
    matrix [p, p] R_ast_inverse;
    Q_ast = qr_Q(x)[,1:p] * sqrt(n - 1.0);
    R_ast = qr_R(x)[1:p,] / sqrt(n - 1.0);
    R_ast_inverse = inverse(R_ast);
  }
  
  parameters{
    vector[p] theta;  // regression coefficients for predictors
  }
  
  transformed parameters{
    vector[p] beta;
    vector[n] mu;
    mu = exp(Q_ast*theta);
    beta = (R_ast_inverse * theta);
  }
  
  model{
     y  ~ exponential(mu);
  }
```

但是，我完全不确定这是否是在 Stan 中编写此类模型的正确方法。我所做的只是尝试使泊松 GLM 示例适应上述指数 GLM。

我正在寻求指数 GLM 的演示和对我上述误解的澄清。

示例数据

    conc  time
 1   6.1   0.8
 2   4.2   3.5
 3   0.5  12.4
 4   8.8   1.1
 5   1.5   8.9
 6   9.2   2.4
 7   8.5   0.1
 8   8.7   0.4
 9   6.7   3.5
10   6.5   8.3
11   6.3   2.6
12   6.7   1.5
13   0.2  16.6
14   8.7   0.1
15   7.5   1.3

Answer 1

这样的怎么样？

我们从读取示例数据开始。

df <- read.table(text =
    "    conc  time
 1   6.1   0.8
 2   4.2   3.5
 3   0.5  12.4
 4   8.8   1.1
 5   1.5   8.9
 6   9.2   2.4
 7   8.5   0.1
 8   8.7   0.4
 9   6.7   3.5
10   6.5   8.3
11   6.3   2.6
12   6.7   1.5
13   0.2  16.6
14   8.7   0.1
15   7.5   1.3", header = T);

我们将 Stan 模型定义为具有 log-link.

的简单 Gamma（指数）模型

model <- "
data {
    int N;                                       // Number of observations
    int K;                                       // Number of parameters
    real y[N];                                   // Response
    matrix[N,K] X;                               // Model matrix
}

parameters {
    vector[K] beta;                              // Model parameters
}

transformed parameters {
    vector[N] lambda;
    lambda = exp(-X * beta);                     // log-link
}

model {
    // Priors
    beta[1] ~ cauchy(0, 10);
    for (i in 2:K)
        beta[i] ~ cauchy(0, 2.5);

    y ~ exponential(lambda);
}
"

这里我假设 beta 模型参数的（标准）信息量少的 Cauchy 先验。

我们现在使模型适合数据。

library(rstan);
options(mc.cores = parallel::detectCores());
rstan_options(auto_write = TRUE);

X <- model.matrix(time ~ conc, data = df);
model.stan <- stan(
    model_code = model,
    data = list(
        N = nrow(df),
        K = ncol(X),
        y = df$time,
        X = X),
    iter = 4000);

为了比较点估计，我们还使用 glm.
将 Gamma GLM 拟合到数据中
```
model.glm <- glm(time ~ conc, data = df, family = Gamma(link = "log"));
```

Stan 模型参数估计为

summary(model.stan, "beta")$summary;
#              mean     se_mean         sd       2.5%        25%        50%
#beta[1]  2.9371457 0.016460000 0.62017934  1.8671652  2.5000356  2.8871936
#beta[2] -0.3099799 0.002420744 0.09252423 -0.5045937 -0.3708111 -0.3046333
#               75%      97.5%    n_eff     Rhat
#beta[1]  3.3193334  4.3078478 1419.629 1.002440
#beta[2] -0.2461567 -0.1412095 1460.876 1.002236

GLM 模型参数估计为

summary(model.glm)$coef;
#              Estimate Std. Error   t value     Pr(>|t|)
#(Intercept)  2.8211487 0.54432115  5.182875 0.0001762685
#conc        -0.3013355 0.08137986 -3.702827 0.0026556952

我们看到均值的 Stan 点估计与 Gamma-GLM 参数估计之间的一致性。

我们绘制了 Stan 和 glm 模型的参数估计值。

library(tidyverse);
as.data.frame(summary(model.stan, "beta")$summary) %>%
    rownames_to_column("Parameter") %>%
    ggplot(aes(x = `50%`, y = Parameter)) +
    geom_point(size = 3) +
    geom_segment(aes(x = `2.5%`, xend = `97.5%`, yend = Parameter), size = 1) +
    geom_segment(aes(x = `25%`, xend = `75%`, yend = Parameter), size = 2) +
    labs(x = "Median (plus/minus 95% and 50% CIs)") +
    geom_point(
        data = data.frame(summary(model.glm)$coef, row.names = c("beta[1]", "beta[2]")) %>%
            rownames_to_column("Parameter"),
        aes(x = Estimate, y = Parameter),
        colour = "red", size = 4, shape = 1)

glm 估计值以红色显示。

更新（使用 QR 重新参数化来拟合 Stan 模型）

具有 QR 重新参数化的 Stan 模型。

model.QR <- "
data {
    int N;                                       // Number of observations
    int K;                                       // Number of parameters
    real y[N];                                   // Response
    matrix[N,K] X;                               // Model matrix
}

transformed data {
    matrix[N, K] Q;
    matrix[K, K] R;
    matrix[K, K] R_inverse;
    Q = qr_Q(X)[, 1:K];
    R = qr_R(X)[1:K, ];
    R_inverse = inverse(R);
}

parameters {
    vector[K] theta;
}

transformed parameters {
    vector[N] lambda;
    lambda = exp(-Q * theta);                     // log-link
}

model {
    y ~ exponential(lambda);
}

generated quantities {
    vector[K] beta;                              // Model parameters
    beta = R_inverse * theta;
}
"

在 QR 分解中，我们有 lambda = exp(- X * beta) = exp(- Q * R * beta) = exp(- Q * theta)，其中 theta = R * beta 因此 beta = R^-1 * theta。

请注意，我们没有在 theta 上指定任何先验；在 Stan 中，这默认为平坦（即统一）先验。

将 Stan 模型拟合到数据。

library(rstan);
options(mc.cores = parallel::detectCores());
rstan_options(auto_write = TRUE);

X <- model.matrix(time ~ conc, data = df);
model.stan.QR <- stan(
    model_code = model.QR,
    data = list(
        N = nrow(df),
        K = ncol(X),
        y = df$time,
        X = X),
    iter = 4000);

参数估计
```
summary(model.stan.QR, "beta")$summary;
#              mean     se_mean         sd       2.5%        25%        50%
#beta[1]  2.9637547 0.009129250 0.64383609  1.8396681  2.5174800  2.9194682
#beta[2] -0.3134252 0.001321584 0.09477156 -0.5126905 -0.3740475 -0.3093937
#               75%      97.5%    n_eff      Rhat
#beta[1]  3.3498585  4.3593912 4973.710 0.9998545
#beta[2] -0.2478029 -0.1395686 5142.408 1.0003236
```
您可以看到两个 Stan 模型（有和没有 QR 重新参数化）之间的参数估计非常吻合。

如果你问我 QR 重新参数化是否会产生 (big/any) 差异，我会说 "probably not in this case"； Andrew Gelman 和其他人经常强调，即使使用信息量非常微弱的先验也有助于收敛，应该优先于平坦（均匀）的先验；我总是会尝试在所有参数上使用弱信息先验，并从没有 QR 重新参数化的模型开始；如果收敛性差，我会在下一步尝试优化我的模型。

在 Stan/RStan 中实现指数一般线性模型

Implementing exponential General Linear Model in Stan/RStan

statistics

regression

r

stan

rstan

更新（使用 QR 重新参数化来拟合 Stan 模型）