How to solve for highest value (regression peak) in multivariate polynomial regression using R?
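My SQL Server 2017 database has a table containing the following partial data:
My goal is to create a multivariate polynomial regression for each of the 19 columns, where LikingOrder is my dependent variable and each of the 19 column values for a given RespID is an independent variable.
The final result should be the highest regression value for each column from C1 to C19, per RespID. The final result should look like this:
I have read about polym and tried to use it in the following script: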
ALTER PROCEDURE [dbo].[spRegressionPeak]
    @StudyID int
AS
BEGIN
    Declare @sStudyID VARCHAR(50)
    Set @sStudyID = CONVERT(VARCHAR(50), @StudyID)
    --We use IsNull values to pass zeroes where an average wasn't calculated so
    --that the polynomial regression can be calculated.
    DECLARE @inquery AS NVARCHAR(MAX) = '
    Select
        c.StudyID, c.RespID, c.LikingOrder,
        avg(C1) as C1, avg(C2) as C2, avg(C3) as C3, avg(C4) as C4, avg(C5) as C5,
        avg(C6) as C6, avg(C7) as C7, avg(C8) as C8, avg(C9) as C9, avg(C10) as C10,
        avg(C11) as C11, avg(C12) as C12, avg(C13) as C13, avg(C14) as C14, avg(C15) as C15,
        avg(C16) as C16, avg(C17) as C17, avg(isnull(C18,0)) as C18, avg(C19) as C19
    from ClosedStudyResponses c
    where c.StudyID = @StudyID
    group by StudyID, RespID, LikingOrder
    order by RespID
    '
    --We are setting @inquery aka InputDataSet to be our initial dataset.
    --R Services requires that a data.frame be passed to any calculations being
    --generated. As such, df is simply data framing the @inquery data.
    --The res object holds the polynomial regression results by RespondentID and
    --LikingOrder for each of the averages in the @inquery resultset.
    EXEC sp_execute_external_script @language = N'R'
    , @script = N'
studymeans <- InputDataSet
df <- data.frame(studymeans)
res1 <- lm(df$LikingOrder ~ polym(df$c1, df$c2, df$c3, df$c4, df$c5, df$c6, df$c7, df$c8, df$c9,
df$c10, df$c11, df$c12, df$c13, df$c14, df$c15, df$c16, df$c17, df$c18, df$c19, degree = 1, raw = TRUE))
res <- data.frame(res1)
'
    , @input_data_1 = @inquery
    , @output_data_1_name = N'res'
    , @params = N'@StudyID int'
    , @StudyID = @StudyID
    --- Edit this line to handle the output data frame.
    WITH RESULT SETS ((RespID int, res varchar(max)));
END;
The stored procedure above produces the following error when given a valid StudyID:
Error in model.frame.default(formula = df$LikingOrder ~ polym(df$c1, df$c2,
:
variable lengths differ (found for 'polym(df$c1, df$c2, df$c3, df$c4, df$c5,
df$c6, df$c7, df$c8, df$c9, df$c10, df$c11, df$c12, df$c13, df$c14, df$c15,
df$c16, df$c17, df$c18, df$c19, degree = 1, raw = TRUE)')
Calls: source ... lm -> eval -> eval -> <Anonymous> -> model.frame.default
In addition: There were 19 warnings (use warnings() to see them)
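Is this the correct use of polym? If not, how can I achieve my goal of computing 19 separate regressions? Finally, how can I programmatically determine the highest value for each regression?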
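One likely cause of the error, judging only from the code shown: R names are case-sensitive, and the query aliases the columns as C1 to C19 while the script refers to df$c1 to df$c19. A reference to a column that does not exist returns NULL, so polym() receives no data and its length cannot match LikingOrder. The sketch below uses a small made-up data frame (two predictors instead of 19, values invented for illustration) to show the NULL and a fit written with bare column names and data = df. It also builds res from the coefficients, because an lm object cannot be converted by data.frame() directly; the column names term and estimate are illustrative and do not come from the question.

```r
# Made-up stand-in for InputDataSet; only C1 and C2 to keep it short.
set.seed(1)
df <- data.frame(
  LikingOrder = sample(1:9, 30, replace = TRUE),
  C1 = runif(30, 0, 9),
  C2 = runif(30, 0, 9)
)

# Column lookup is case-sensitive: the lowercase name does not exist.
lowercase_is_null <- is.null(df$c1)
print(lowercase_is_null)

# Refer to columns by their real names and let lm() find them via `data`.
fit <- lm(LikingOrder ~ polym(C1, C2, degree = 1, raw = TRUE), data = df)

# An lm object is not a data frame; extract the coefficients into one.
res <- data.frame(
  term = names(coef(fit)),
  estimate = unname(coef(fit)),
  stringsAsFactors = FALSE
)
print(res)
```

If something like this were used inside the procedure, the WITH RESULT SETS clause would have to describe whatever columns res ends up with.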
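Based on the question and the discussion in the comments, the assumptions made are:
RespID: a categorical parameter, not used in model fitting
StudyID: ignored in the sample data
LinkingOrder: the dependent variable, i.e. the response (non-categorical); this is the question's LikingOrder
C1 to C19: independent variables, numeric
Objective: determine the regression coefficients of a linear fit to the variables C1 to C19
Note: a polynomial fit is not added, because the final table requested in the question does not appear to list interaction terms.
Resource: Chapters 3 and 5 of ISLR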
Create a sample data frame
StudyID <- rep(10001, 100)
RespID <- c(rep(117,25), rep(119,25), rep(120,25), rep(121,25))
LinkingOrder <- floor(runif(100, 1, 9))
df <- data.frame(StudyID, RespID, LinkingOrder)
# Create columns C1 to C19 (random values, so results differ from run to run)
for (i in 1:19){
  vari <- paste0("C", i)
  df[[vari]] <- floor(runif(100, 0, 9))
}
# Convert RespID to categorical variable
df$RespID <- as.factor(df$RespID)
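The sample frame has 25 rows per RespID, more than the 20 coefficients (intercept plus C1 to C19) that each fit estimates. If a respondent has fewer rows than coefficients, lm() cannot estimate all of them and reports NA for the rest. That may apply to the question's grouped query if it returns one row per RespID and LikingOrder, though this is an assumption because the real data is not shown. A minimal illustration with invented numbers, 5 rows and 6 predictors:

```r
# Invented data: 5 observations but 7 coefficients (intercept + 6 predictors).
set.seed(1)
small <- data.frame(y = runif(5, 1, 9))
for (i in 1:6) {
  small[[paste0("C", i)]] <- runif(5, 0, 9)
}

fit_small <- lm(y ~ ., data = small)

# Coefficients that could not be estimated come back as NA.
n_missing <- sum(is.na(coef(fit_small)))
print(coef(fit_small))
print(n_missing)
```

With only 5 rows the design matrix has rank at most 5, so at least two of the seven coefficients are NA.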
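Fit lm() and store the coefficients in a table
Note: the intercept term is included in the table
# Fit lm() and store coefficients in a table
final_table <- data.frame()
for (respid in unique(df$RespID)){
  # Keep this respondent's rows and drop the ID columns so that `.` means C1 to C19
  data <- df[df$RespID == respid, ]
  data <- subset(data, select = -c(StudyID, RespID))
  lm.fit <- lm(LinkingOrder ~ ., data = data)
  # Save to table: one row of coefficients per RespID, labelled with the RespID
  final_table <- rbind(final_table, data.frame(RespID = respid, t(coef(lm.fit))))
}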
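The fits above are linear in each column, and a linear term has no interior maximum, so they do not by themselves give a "peak". If the intended peak is the maximum of a fitted curve, one option is a separate quadratic fit per column, y = b0 + b1*x + b2*x^2, whose maximum lies at x = -b1 / (2*b2) when b2 < 0. This is a sketch under that assumption and is not something the question states; quadratic_peak is a hypothetical helper name, and the toy data is constructed so that the peak is known in advance.

```r
# Hypothetical helper: location and height of the maximum of a
# one-variable quadratic fit y ~ x + x^2. Returns NA when the fitted
# curve has no interior maximum (quadratic coefficient missing or >= 0).
quadratic_peak <- function(x, y) {
  fit <- lm(y ~ x + I(x^2))
  b <- unname(coef(fit))  # b[1] intercept, b[2] linear, b[3] quadratic
  if (is.na(b[3]) || b[3] >= 0) {
    return(c(x = NA_real_, y = NA_real_))
  }
  x_peak <- -b[2] / (2 * b[3])
  c(x = x_peak, y = b[1] + b[2] * x_peak + b[3] * x_peak^2)
}

# Toy data lying exactly on y = 10 - (x - 3)^2, so the peak is at x = 3, y = 10.
x <- 0:8
y <- 10 - (x - 3)^2
pk <- quadratic_peak(x, y)
print(pk)
```

On the sample data frame the helper could be called once per RespID and per column inside the loop, for example quadratic_peak(data$C1, data$LinkingOrder). With the random sample values the fitted curvature may well be non-negative, in which case it returns NA.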