具有稳健线性模型、分位数回归和机器学习方法的逆回归程序
Inverse regression procedures with robust linear models, quantile regression, and machine learning methods
上下文
通常在剂量反应模型中,我们会根据反应变量回归一定范围的剂量,但我们真正感兴趣的是确定引发特定反应所需的剂量。通常这是通过逆回归技术完成的(即 after-fitting / reparameterisation)。编辑:澄清一下——当您需要估计杀死 50% 或 99.99% 所需的剂量时,通常会这样做。为了得出这些估计值,人们采用逆回归技术 - 上面 link 更仔细地讨论了这一点(见第 9 页)。
问题
如何使用稳健线性模型、分位数回归或机器学习模型(即神经网络或支持向量机)等方法执行这些逆向回归过程?编辑:为了澄清,我想要一个编程解决方案,以解决当我安装的模型是上述模型之一时如何估计引起 99.99 响应所需的剂量。为此,我在下面安装了示例模型。
我的数据是这样的:
df <- structure(list(Response = c(100, 91.1242603550296, 86.9822485207101,
100, 0, 0, 90.5325443786982, 95.8579881656805, 88.7573964497041,
96.4497041420118, 82.2485207100592, 99.4082840236686, 99.4082840236686,
98.8165680473373, 91.7159763313609, 59.1715976331361, 44.9704142011834,
0, 100, 95.2662721893491, 100, 82.8402366863905, 7.69230769230769,
81.6568047337278, 62.7218934911243, 97.6331360946746, 73.9644970414201,
8.87573964497041, 0, 98.8165680473373, 78.1065088757396, 98.2248520710059,
52.6627218934911, 96.4497041420118, 52.0710059171598, 0, 62.043795620438,
84.6715328467153, 97.8102189781022, 4.37956204379562, 89.051094890511,
99.2700729927007, 99.2700729927007, 97.0802919708029, 81.7518248175183,
80.2919708029197, 90.5109489051095, 99.2700729927007, 96.3503649635037,
0, 0, 94.8905109489051, 79.5620437956204, 67.8832116788321, 73.7226277372263,
100, 97.0802919708029, 93.4306569343066, 86.8613138686131, 33.5766423357664,
32.1167883211679, 46.7153284671533, 98.5401459854015, 95.6204379562044,
86.1313868613139, 14.5985401459854, 92.7007299270073, 86.1313868613139,
0, 77.3722627737226, 89.051094890511, 80.2919708029197, 98.1818181818182,
96.3636363636364, 30.9090909090909, 0, 60.9090909090909, 100,
0, 83.6363636363636, 88.1818181818182, 97.2727272727273, 0, 0,
99.0909090909091, 100, 100, 91.8181818181818, 88.1818181818182,
46.3636363636364, 50.9090909090909, 99.0909090909091, 97.2727272727273,
100, 0, 92.7272727272727, 60.9090909090909, 90.9090909090909,
57.2727272727273, 76.3636363636364, 94.5454545454545, 50, 98.1818181818182,
16.3636363636364, 87.2727272727273, 92.7272727272727, 87.2727272727273,
88.1818181818182, 10.7438016528926, 91.7355371900827, 98.3471074380165,
60.3305785123967, 95.8677685950413, 0, 63.6363636363636, 71.900826446281,
0, 74.3801652892562, 76.8595041322314, 0, 61.9834710743802, 0,
0, 0, 84.297520661157, 47.1074380165289, 69.4214876033058, 97.5206611570248,
100, 61.1570247933884, 90.0826446280992, 78.5123966942149, 10.7438016528926,
100, 98.3471074380165, 100, 98.3471074380165, 93.3884297520661,
90.9090909090909, 57.8512396694215, 57.8512396694215, 92.5619834710744,
77.6859504132231, 69.4214876033058), Covariate = c(20, 14, 14,
20, 0, 0, 14, 14, 14, 16, 10, 20, 20, 20, 16, 10, 10, 0, 16,
16, 16, 10, 0, 12, 10, 12, 12, 0, 0, 20, 12, 16, 10, 12, 12,
0, 14, 14, 16, 0, 14, 20, 16, 20, 14, 12, 12, 20, 20, 0, 0, 14,
12, 10, 10, 20, 16, 16, 14, 10, 10, 10, 20, 16, 10, 0, 12, 12,
0, 12, 16, 14, 16, 14, 0, 0, 12, 20, 0, 12, 14, 14, 0, 0, 20,
20, 20, 14, 14, 10, 10, 20, 16, 16, 0, 12, 10, 10, 10, 16, 16,
12, 20, 10, 12, 12, 16, 14, 0, 16, 20, 12, 14, 10, 10, 0, 0,
12, 12, 10, 10, 0, 0, 0, 14, 12, 12, 20, 20, 14, 14, 14, 12,
20, 20, 20, 16, 16, 14, 10, 10, 16, 16, 16)), row.names = 433:576, class = "data.frame")
我的公式通常是这样的:
响应 ~ 协变量 + I(协变量^2)
以下是我安装的模型示例:
#Robust linear model
MASS::rlm(Response ~ Covariate + I(Covariate^2), data = df)
#Quantile regression
quantreg::rq(Response ~ Covariate + I(Covariate^2), data = df, tau = c(0.5, 0.95)) # In this case I want to predict the specified quantiles for the dose required to elicit a given response, although I realised this code doesn't do that...
#Machine learning algorithms were trained with caret
TRControl <- trainControl(method = "cv")
#Neural Network
caret::train(Response ~ Covariate, data = df, method = "neuralnet", trControl = TRControl)
#Support Vector Machine
caret::train(Response ~ Covariate, data = df, method = "polySVM", trControl = TRControl)
根据我上面的评论,您的数据与典型的剂量反应测量结果并不十分相似
library(ggplot2)
ggplot(df, aes(Covariate, log10(Response))) +
geom_point()
这里我假设Covariate
就是dose/concentration。
每个 Covariate
的不同测量是否与不同的 experiments/groups 相关?您是否计划对不同的组拟合多个剂量反应曲线以进行比较?
一种可能的分析策略
这里有些东西可能会给你一些想法。我在这里使用 drc
,因为它允许我将 "sensible" 剂量反应曲线拟合到您的数据。合理的剂量反应模型具有 dose → 0
和 dose → ∞
.
的水平渐近线
在这个特定示例中,我们为您的数据拟合了一个四参数 Weibull 函数。
library(drc)
model <- drm(Response ~ Covariate, data = df, fct = W2.4())
让我们绘制原始数据和模型预测(包括置信区间)
library(tidyverse)
df.pred <- data.frame(Covariate = 1.1 * seq(min(df$Covariate), max(df$Covariate), length.out = 20)) %>%
bind_cols(as.data.frame(predict(model, data.frame(Covariate = Covariate), interval = "confidence"))) %>%
rename(Response = Prediction)
ggplot(df, aes(Covariate, Response)) +
geom_point() +
geom_line(data = df.pred, aes(Covariate, Response), color = "blue") +
geom_ribbon(data = df.pred, aes(x = Covariate, ymin = Lower, ymax = Upper), fill = "blue", alpha = 0.2)
我们现在可以使用 uniroot
来确定特定的 LDx 值,这些值定义为将最大反应降低 x
/ 100 所需的剂量。
getLDx <- function(model, x = 0.5) {
maxResponse <- max(predict(model, data.frame(x = c(0, Inf))))
uniroot(
function(Covariate) predict(model, newdata = data.frame(Covariate = Covariate)) - x * maxResponse,
interval = range(Covariate))$root
}
这基本上是模型的反转,所以也许这就是您 link 的论文作者在您的原始 post 中所指的 "inverse regression techniques"。
让我们计算 LD50 值(即减少 50% 反应所需的剂量)
getLDx(model, x = 0.5)
#[1] 9.465188
通过检查绘图,您可以看到该值确实对应于响应为最大响应值的 50% 的剂量。
上下文
通常在剂量反应模型中,我们会根据反应变量回归一定范围的剂量,但我们真正感兴趣的是确定引发特定反应所需的剂量。通常这是通过逆回归技术完成的(即 after-fitting / reparameterisation)。编辑:澄清一下——当您需要估计杀死 50% 或 99.99% 所需的剂量时,通常会这样做。为了得出这些估计值,人们采用逆回归技术 - 上面 link 更仔细地讨论了这一点(见第 9 页)。
问题
如何使用稳健线性模型、分位数回归或机器学习模型(即神经网络或支持向量机)等方法执行这些逆向回归过程?编辑:为了澄清,我想要一个编程解决方案,以解决当我安装的模型是上述模型之一时如何估计引起 99.99 响应所需的剂量。为此,我在下面安装了示例模型。
我的数据是这样的:
df <- structure(list(Response = c(100, 91.1242603550296, 86.9822485207101,
100, 0, 0, 90.5325443786982, 95.8579881656805, 88.7573964497041,
96.4497041420118, 82.2485207100592, 99.4082840236686, 99.4082840236686,
98.8165680473373, 91.7159763313609, 59.1715976331361, 44.9704142011834,
0, 100, 95.2662721893491, 100, 82.8402366863905, 7.69230769230769,
81.6568047337278, 62.7218934911243, 97.6331360946746, 73.9644970414201,
8.87573964497041, 0, 98.8165680473373, 78.1065088757396, 98.2248520710059,
52.6627218934911, 96.4497041420118, 52.0710059171598, 0, 62.043795620438,
84.6715328467153, 97.8102189781022, 4.37956204379562, 89.051094890511,
99.2700729927007, 99.2700729927007, 97.0802919708029, 81.7518248175183,
80.2919708029197, 90.5109489051095, 99.2700729927007, 96.3503649635037,
0, 0, 94.8905109489051, 79.5620437956204, 67.8832116788321, 73.7226277372263,
100, 97.0802919708029, 93.4306569343066, 86.8613138686131, 33.5766423357664,
32.1167883211679, 46.7153284671533, 98.5401459854015, 95.6204379562044,
86.1313868613139, 14.5985401459854, 92.7007299270073, 86.1313868613139,
0, 77.3722627737226, 89.051094890511, 80.2919708029197, 98.1818181818182,
96.3636363636364, 30.9090909090909, 0, 60.9090909090909, 100,
0, 83.6363636363636, 88.1818181818182, 97.2727272727273, 0, 0,
99.0909090909091, 100, 100, 91.8181818181818, 88.1818181818182,
46.3636363636364, 50.9090909090909, 99.0909090909091, 97.2727272727273,
100, 0, 92.7272727272727, 60.9090909090909, 90.9090909090909,
57.2727272727273, 76.3636363636364, 94.5454545454545, 50, 98.1818181818182,
16.3636363636364, 87.2727272727273, 92.7272727272727, 87.2727272727273,
88.1818181818182, 10.7438016528926, 91.7355371900827, 98.3471074380165,
60.3305785123967, 95.8677685950413, 0, 63.6363636363636, 71.900826446281,
0, 74.3801652892562, 76.8595041322314, 0, 61.9834710743802, 0,
0, 0, 84.297520661157, 47.1074380165289, 69.4214876033058, 97.5206611570248,
100, 61.1570247933884, 90.0826446280992, 78.5123966942149, 10.7438016528926,
100, 98.3471074380165, 100, 98.3471074380165, 93.3884297520661,
90.9090909090909, 57.8512396694215, 57.8512396694215, 92.5619834710744,
77.6859504132231, 69.4214876033058), Covariate = c(20, 14, 14,
20, 0, 0, 14, 14, 14, 16, 10, 20, 20, 20, 16, 10, 10, 0, 16,
16, 16, 10, 0, 12, 10, 12, 12, 0, 0, 20, 12, 16, 10, 12, 12,
0, 14, 14, 16, 0, 14, 20, 16, 20, 14, 12, 12, 20, 20, 0, 0, 14,
12, 10, 10, 20, 16, 16, 14, 10, 10, 10, 20, 16, 10, 0, 12, 12,
0, 12, 16, 14, 16, 14, 0, 0, 12, 20, 0, 12, 14, 14, 0, 0, 20,
20, 20, 14, 14, 10, 10, 20, 16, 16, 0, 12, 10, 10, 10, 16, 16,
12, 20, 10, 12, 12, 16, 14, 0, 16, 20, 12, 14, 10, 10, 0, 0,
12, 12, 10, 10, 0, 0, 0, 14, 12, 12, 20, 20, 14, 14, 14, 12,
20, 20, 20, 16, 16, 14, 10, 10, 16, 16, 16)), row.names = 433:576, class = "data.frame")
我的公式通常是这样的:
响应 ~ 协变量 + I(协变量^2)
以下是我安装的模型示例:
#Robust linear model
MASS::rlm(Response ~ Covariate + I(Covariate^2), data = df)
#Quantile regression
quantreg::rq(Response ~ Covariate + I(Covariate^2), data = df, tau = c(0.5, 0.95)) # In this case I want to predict the specified quantiles for the dose required to elicit a given response, although I realised this code doesn't do that...
#Machine learning algorithms were trained with caret
TRControl <- trainControl(method = "cv")
#Neural Network
caret::train(Response ~ Covariate, data = df, method = "neuralnet", trControl = TRControl)
#Support Vector Machine
caret::train(Response ~ Covariate, data = df, method = "polySVM", trControl = TRControl)
根据我上面的评论,您的数据与典型的剂量反应测量结果并不十分相似
library(ggplot2)
ggplot(df, aes(Covariate, log10(Response))) +
geom_point()
这里我假设Covariate
就是dose/concentration。
每个 Covariate
的不同测量是否与不同的 experiments/groups 相关?您是否计划对不同的组拟合多个剂量反应曲线以进行比较?
一种可能的分析策略
这里有些东西可能会给你一些想法。我在这里使用 drc
,因为它允许我将 "sensible" 剂量反应曲线拟合到您的数据。合理的剂量反应模型具有 dose → 0
和 dose → ∞
.
在这个特定示例中,我们为您的数据拟合了一个四参数 Weibull 函数。
library(drc) model <- drm(Response ~ Covariate, data = df, fct = W2.4())
让我们绘制原始数据和模型预测(包括置信区间)
library(tidyverse) df.pred <- data.frame(Covariate = 1.1 * seq(min(df$Covariate), max(df$Covariate), length.out = 20)) %>% bind_cols(as.data.frame(predict(model, data.frame(Covariate = Covariate), interval = "confidence"))) %>% rename(Response = Prediction) ggplot(df, aes(Covariate, Response)) + geom_point() + geom_line(data = df.pred, aes(Covariate, Response), color = "blue") + geom_ribbon(data = df.pred, aes(x = Covariate, ymin = Lower, ymax = Upper), fill = "blue", alpha = 0.2)
我们现在可以使用
uniroot
来确定特定的 LDx 值,这些值定义为将最大反应降低x
/ 100 所需的剂量。getLDx <- function(model, x = 0.5) { maxResponse <- max(predict(model, data.frame(x = c(0, Inf)))) uniroot( function(Covariate) predict(model, newdata = data.frame(Covariate = Covariate)) - x * maxResponse, interval = range(Covariate))$root }
这基本上是模型的反转,所以也许这就是您 link 的论文作者在您的原始 post 中所指的 "inverse regression techniques"。
让我们计算 LD50 值(即减少 50% 反应所需的剂量)
getLDx(model, x = 0.5) #[1] 9.465188
通过检查绘图,您可以看到该值确实对应于响应为最大响应值的 50% 的剂量。