R-squared by groups in linear regression
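I have computed a linear regression using all the elements of my dataset (24), and the resulting model is IP2. Now I want to know how well that single model fits each country in the dataset (R-squared; I am not interested in the slope and intercept). The ugly way of doing it is the following (I would need to do this 200 times):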
Country <- c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B")
IP <- c(55,56,59,63,67,69,69,73,74,74,79,87,0,22,24,26,26,31,37,41,43,46,46,47)
IP2 <- c(46,47,49,50,53,55,53,57,60,57,58,63,0,19,20,21,22,25,26,28,29,30,31,31)
summary(lm(IP[Country=="A"] ~ IP2[Country=="A"]))
summary(lm(IP[Country=="B"] ~ IP2[Country=="B"]))
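Is there a way to compute both R-squared values at once? I tried "Linear Regression and group by in R" as well as some other posts ("Fitting several regression models with dplyr"), but it did not work: I get the same coefficients for all four groups I am working with.
Any ideas about what I am doing wrong, or how to solve the problem?
Thanks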
A few base R options:
sapply(unique(Country), function(cn)
summary(lm(IP[Country == cn] ~ IP2[Country == cn]))$r.sq)
# A B
# 0.9451881 0.9496636
and
c(by(data.frame(IP, IP2), Country, function(x) summary(lm(x))$r.sq))
# A B
# 0.9451881 0.9496636
or
sapply(split(data.frame(IP, IP2), Country), function(x) summary(lm(x))$r.sq)
# A B
# 0.9451881 0.9496636
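The by and split options above pass the two-column data frame straight to lm, which R reads as "first column ~ remaining columns". A minimal sketch of the same idea with the formula written out, in case the data frame has more columns or a different column order. The names dat, r2_by_country and r2_table are made up for this illustration and are not part of the original answer:

```r
Country <- rep(c("A", "B"), each = 12)
IP  <- c(55,56,59,63,67,69,69,73,74,74,79,87,0,22,24,26,26,31,37,41,43,46,46,47)
IP2 <- c(46,47,49,50,53,55,53,57,60,57,58,63,0,19,20,21,22,25,26,28,29,30,31,31)

# Hypothetical data frame holding all three vectors
dat <- data.frame(Country, IP, IP2)

# One lm() per country, with the response and predictor named explicitly;
# vapply() checks that each fit returns exactly one number
r2_by_country <- vapply(
  split(dat, dat$Country),
  function(d) summary(lm(IP ~ IP2, data = d))$r.squared,
  numeric(1)
)

# Optional: collect the result as a data frame, one row per country
r2_table <- data.frame(Country = names(r2_by_country),
                       r2 = unname(r2_by_country))
r2_table
```

With 200 countries this still yields a single named vector (or a 200-row table), so nothing needs to be repeated by hand.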
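You can do this with the split function followed by mapply. split takes a vector and turns it into a list with k elements, where k is (in this case) the number of distinct levels of Country. mapply lets us loop over multiple inputs in parallel. getR2 is a simple function that takes two inputs, fits the model, and then extracts the R^2 value.
Code example below: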
Country <- c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B")
IP <- c(55,56,59,63,67,69,69,73,74,74,79,87,0,22,24,26,26,31,37,41,43,46,46,47)
IP2 <- c(46,47,49,50,53,55,53,57,60,57,58,63,0,19,20,21,22,25,26,28,29,30,31,31)
ip_split = split(IP,Country)
ip2_split = split(IP2,Country)
getR2 = function(ip,ip2){
model = lm(ip~ip2)
return(summary(model)$r.squared)
}
r2.values = mapply(getR2,ip_split,ip2_split)
r2.values
#> A B
#> 0.9451881 0.9496636