How to solve for highest value (regression peak) in multivariate polynomial regression using R?

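I have a table in my SQL Server 2017 database that contains the following partial data:

My goal is to create a multivariate polynomial regression for each of the 19 columns, where LikingOrder is my dependent variable and each of the 19 column values for a given RespID is an independent variable.

The end result should be the highest regression value for each column from C1 to C19, per RespID. The final result should look like this:

I have read about polym and tried to use it in the following script: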

ALTER PROCEDURE [dbo].[spRegressionPeak]   
@StudyID int
AS
BEGIN
Declare @sStudyID VARCHAR(50)
Set @sStudyID = CONVERT(VARCHAR(50),@StudyID)

--We use IsNull values to pass zeroes where an average wasn't calculated so
--that the polynomial regression can be calculated.
DECLARE @inquery  AS NVARCHAR(MAX) = '
    Select
c.StudyID, c.RespID, c.LikingOrder, avg(C1) as C1, avg(C2) as C2, avg(C3) as 
C3, avg(C4) as C4, avg(C5) as C5, avg(C6) as C6, avg(C7) as C7, avg(C8) as 
C8, avg(C9) as C9, avg(C10) as C10,
avg(C11) as C11, avg(C12) as C12, avg(C13) as C13, avg(C14) as C14, avg(C15) 
as C15, avg(C16) as C16, avg(C17) as C17, avg(isnull(C18,0)) as C18, avg(C19) 
as C19
from ClosedStudyResponses c
where c.StudyID = @StudyID
group by StudyID, RespID, LikingOrder
order by RespID'

--We are setting @inquery aka InputDataSet to be our initial dataset.
--R Services requires that a data.frame be passed to any calculations being
--generated.  As such, df is simply data framing the @inquery data.
--The res object holds the polynomial regression results by RespondentID and
--LikingOrder for each of the averages in the @inquery resultset.
EXEC sp_execute_external_script @language = N'R'
, @script = N'
    studymeans <- InputDataSet

    df <- data.frame(studymeans) 

    res1 <- lm(df$LikingOrder ~ polym(df$c1, df$c2, df$c3, df$c4, df$c5, df$c6, df$c7, df$c8, df$c9, 
    df$c10, df$c11, df$c12, df$c13, df$c14, df$c15, df$c16, df$c17, df$c18, df$c19, degree = 1, raw = TRUE)) 
    res <- data.frame(res1)

'
, @input_data_1 = @inquery
, @output_data_1_name = N'res'
, @params = N'@StudyID int'
,@StudyID = @StudyID 
--- Edit this line to handle the output data frame.
WITH RESULT SETS ((RespID int, res varchar(max)));
END;

When given a valid StudyID, the stored procedure above produces the following error:

Error in model.frame.default(formula = df$LikingOrder ~ polym(df$c1, df$c2,  
: 
variable lengths differ (found for 'polym(df$c1, df$c2, df$c3, df$c4, df$c5, 
df$c6, df$c7, df$c8, df$c9, df$c10, df$c11, df$c12, df$c13, df$c14, df$c15, 
df$c16, df$c17, df$c18, df$c19, degree = 1, raw = TRUE)')
Calls: source ... lm -> eval -> eval -> <Anonymous> -> model.frame.default
In addition: There were 19 warnings (use warnings() to see them)

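Is this the correct use of polym? If not, how can I achieve my goal of computing 19 separate regressions? Finally, how can I programmatically determine the highest value of each regression?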

Based on the question and the discussion in the comments, the following assumptions are made:

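  • RespID: a categorical parameter; not used in the model fit
  • StudyID: ignored in the sample data
  • LinkingOrder: the dependent variable, i.e. the response (non-categorical); named LikingOrder in the question
  • C1 to C19: the independent variables, numeric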

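  • Objective: determine the regression coefficients of a linear fit to the variables C1 to C19 for each RespID
  • Note: no polynomial fit was added, because the final table requested in the question does not appear to list interaction terms.
  • Resource: ISLR, Chapters 3 and 5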
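Although no polynomial fit is used here, the following is a minimal sketch of how polym is usually called inside an lm() formula, in case polynomial terms are wanted later. The toy data frame and the names toy, x1, x2 and y are hypothetical. The sketch passes bare column names together with data =, instead of df$... terms. One possible cause of the "variable lengths differ" error in the question is that the script refers to df$c1 while the query aliases the columns as C1; R names are case-sensitive, so df$c1 would be NULL. That diagnosis is an assumption and has not been checked against the actual table.

```r
# Toy data; the names toy, x1, x2 and y are hypothetical.
set.seed(1)
toy <- data.frame(x1 = runif(50, 0, 9), x2 = runif(50, 0, 9))
toy$y <- 2 + 0.5 * toy$x1 - 0.3 * toy$x2 + rnorm(50, sd = 0.1)

# Bare column names plus data = toy, so every term is looked up in the
# same data frame and has nrow(toy) values.
fit_linear <- lm(y ~ polym(x1, x2, degree = 1, raw = TRUE), data = toy)

# degree = 2 adds the squared terms and the x1 * x2 cross term.
fit_quad <- lm(y ~ polym(x1, x2, degree = 2, raw = TRUE), data = toy)

n_linear <- length(coef(fit_linear))  # intercept + 2 terms
n_quad   <- length(coef(fit_quad))    # intercept + 5 terms
```

Note that with 19 predictors, degree = 2 would already produce 209 polynomial terms, far more than the handful of rows available per RespID, so a joint polynomial model over all 19 columns is unlikely to be estimable from this data.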

Create a sample data frame

# Sample data: 4 respondents with 25 rows each, all in one study
StudyID <- rep(10001, 100)
RespID <- c(rep(117, 25), rep(119, 25), rep(120, 25), rep(121, 25))
LinkingOrder <- floor(runif(100, 1, 9))  # random integer responses, 1 to 8
df <- data.frame(StudyID, RespID, LinkingOrder)
# Create columns C1 to C19 with random integer values, 0 to 8
for (i in 1:19) {
  vari <- paste0("C", i)
  df[[vari]] <- floor(runif(100, 0, 9))
}

# Convert RespID to categorical variable
df$RespID <- as.factor(df$RespID)

Fit lm() and store the coefficients in table format

Note: the intercept term is included in the table.

# Fit lm() and store coefficients in a table
final_table <- data.frame()
for (respid in unique(df$RespID)) {
  # Keep this respondent's rows; drop the ID columns so that "." in the
  # formula expands to C1..C19 only
  data <- df[df$RespID == respid, ]
  data <- subset(data, select = -c(StudyID, RespID))

  lm.fit <- lm(LinkingOrder ~ ., data = data)

  # Save to table: one row per RespID, intercept first, then C1..C19
  final_table <- rbind(final_table, data.frame(t(coef(lm.fit))))
}
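
The linear fit above has no peak, so it does not by itself answer the "highest value" part of the question. One possible reading of "regression peak" is the maximum of a separate quadratic fit of the response on each column. The following is a rough sketch under that assumption; peak_of_quadratic, df_demo and peaks are hypothetical names, and the random demo data is illustrative only.

```r
# Hypothetical helper: fit y ~ x + x^2 and return the location (x) and
# height (y) of the fitted parabola's maximum. Returns NA values when the
# quadratic coefficient is missing or non-negative (no maximum exists).
peak_of_quadratic <- function(x, y) {
  fit <- lm(y ~ poly(x, degree = 2, raw = TRUE))
  b <- unname(coef(fit))  # b[1] intercept, b[2] linear, b[3] quadratic
  if (is.na(b[3]) || b[3] >= 0) {
    return(c(x = NA_real_, y = NA_real_))
  }
  x_peak <- -b[2] / (2 * b[3])  # vertex of b[1] + b[2]*x + b[3]*x^2
  c(x = x_peak, y = b[1] + b[2] * x_peak + b[3] * x_peak^2)
}

# Noise-free check: y = 10 - (x - 4)^2 peaks at x = 4 with height 10.
x_check <- 0:8
y_check <- 10 - (x_check - 4)^2
check <- peak_of_quadratic(x_check, y_check)

# Applied column by column to one respondent's rows (illustrative data).
set.seed(1)
df_demo <- data.frame(LinkingOrder = floor(runif(25, 1, 9)))
for (i in 1:19) df_demo[[paste0("C", i)]] <- floor(runif(25, 0, 9))
peaks <- sapply(paste0("C", 1:19), function(col) {
  peak_of_quadratic(df_demo[[col]], df_demo$LinkingOrder)
})
```

Here peaks is a 2 x 19 matrix with rows x and y and one column per predictor. A fitted vertex can fall outside the observed range of a column, so it may be worth comparing it with the range of the data before reporting it as a peak.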