Correlation matrix for linear model regression coefficients
Using cor(mtcars, method='pearson') produces a matrix showing the Pearson correlation of every variable in mtcars with every other variable in mtcars. For example:
head(cor(mtcars, method='pearson'))
            mpg        cyl       disp         hp       drat         wt        qsec         vs         am       gear
mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719 -0.8676594  0.41868403  0.6640389  0.5998324  0.4802848
cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.6999381  0.7824958 -0.59124207 -0.8108118 -0.5226070 -0.4926866
disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.7102139  0.8879799 -0.43369788 -0.7104159 -0.5912270 -0.5555692
hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.4487591  0.6587479 -0.70822339 -0.7230967 -0.2432043 -0.1257043
drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.0000000 -0.7124406  0.09120476  0.4402785  0.7127111  0.6996101
wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.7124406  1.0000000 -0.17471588 -0.5549157 -0.6924953 -0.5832870
           carb
mpg  -0.5509251
cyl   0.5269883
disp  0.3949769
hp    0.7498125
drat -0.0907898
wt    0.4276059
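How can I get the same matrix as above, except that each value is the r.squared from a linear model rather than the Pearson correlation between each pair of variables? For example, the first column, second row would equal summary(lm(mtcars$mpg ~ mtcars$cyl))$r.squared. Thanks.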
library(tidyverse)
# keep the variable names of the dataset
names = names(mtcars)
expand.grid(names, names, stringsAsFactors = FALSE) %>% # create all pairs of names
filter(Var1 != Var2) %>% # drop self-pairs (regressing a variable on itself triggers warnings)
rowwise() %>% # for each row
mutate(r = summary(lm(paste(Var1, "~" ,Var2), data = mtcars))$r.squared) %>% # get the r squared
spread(Var2, r) # reshape
# # A tibble: 11 x 12
# Var1 am carb cyl disp drat gear hp mpg
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 am NA 0.00331 0.273 0.350 0.508 0.631 0.0591 0.360
# 2 carb 0.00331 NA 0.278 0.156 0.00824 0.0751 0.562 0.304
# 3 cyl 0.273 0.278 NA 0.814 0.490 0.243 0.693 0.726
# 4 disp 0.350 0.156 0.814 NA 0.504 0.309 0.626 0.718
# 5 drat 0.508 0.00824 0.490 0.504 NA 0.489 0.201 0.464
# 6 gear 0.631 0.0751 0.243 0.309 0.489 NA 0.0158 0.231
# 7 hp 0.0591 0.562 0.693 0.626 0.201 0.0158 NA 0.602
# 8 mpg 0.360 0.304 0.726 0.718 0.464 0.231 0.602 NA
# 9 qsec 0.0528 0.431 0.350 0.188 0.00832 0.0452 0.502 0.175
# 10 vs 0.0283 0.324 0.657 0.505 0.194 0.0424 0.523 0.441
# 11 wt 0.480 0.183 0.612 0.789 0.508 0.340 0.434 0.753
# # ... with 3 more variables: qsec <dbl>, vs <dbl>, wt <dbl>
If you want row names instead of the first column (Var1), you can add the following to the end of the pipeline above:
... %>%
data.frame() %>%
column_to_rownames("Var1")
This will be closer to the output you get from cor(mtcars, method='pearson').
I created a corlm function that fills in the entries with a for loop:
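corlm <- function(df) {
  p <- ncol(df)
  # p x p matrix of NAs, labelled with the column names on both axes
  mat <- matrix(NA, p, p, dimnames = list(colnames(df), colnames(df)))
  # fitting a column on itself (i == j) triggers a perfect-fit warning, so silence warnings
  suppressWarnings(for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      mat[i, j] <- summary(lm(df[, j] ~ df[, i]))$r.squared
    }
  })
  diag(mat) <- NA  # self-fits are not meaningful
  mat
}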
round(corlm(mtcars),3)
       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
mpg     NA 0.726 0.718 0.602 0.464 0.753 0.175 0.441 0.360 0.231 0.304
cyl  0.726    NA 0.814 0.693 0.490 0.612 0.350 0.657 0.273 0.243 0.278
disp 0.718 0.814    NA 0.626 0.504 0.789 0.188 0.505 0.350 0.309 0.156
hp   0.602 0.693 0.626    NA 0.201 0.434 0.502 0.523 0.059 0.016 0.562
drat 0.464 0.490 0.504 0.201    NA 0.508 0.008 0.194 0.508 0.489 0.008
wt   0.753 0.612 0.789 0.434 0.508    NA 0.031 0.308 0.480 0.340 0.183
qsec 0.175 0.350 0.188 0.502 0.008 0.031    NA 0.554 0.053 0.045 0.431
vs   0.441 0.657 0.505 0.523 0.194 0.308 0.554    NA 0.028 0.042 0.324
am   0.360 0.273 0.350 0.059 0.508 0.480 0.053 0.028    NA 0.631 0.003
gear 0.231 0.243 0.309 0.016 0.489 0.340 0.045 0.042 0.631    NA 0.075
carb 0.304 0.278 0.156 0.562 0.008 0.183 0.431 0.324 0.003 0.075    NA
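A possible shortcut, offered as a cross-check rather than a replacement for the approaches above: for a simple regression with a single predictor and an intercept, r.squared equals the squared Pearson correlation between the two variables. Under that assumption the whole matrix can be obtained by squaring the output of cor(), with no model fitting. The names r2_cor and r2_lm below are illustrative and do not appear in the original posts.

```r
# Squared Pearson correlations for every pair of columns.
r2_cor <- cor(mtcars, method = "pearson")^2
diag(r2_cor) <- NA  # blank the diagonal, as in the matrices above

# Spot-check one entry against the r.squared reported by lm().
r2_lm <- summary(lm(mpg ~ cyl, data = mtcars))$r.squared
all.equal(r2_cor["cyl", "mpg"], r2_lm)

round(r2_cor, 3)
```

This identity only holds for one-predictor models with an intercept; with several predictors, or a model fitted without an intercept, the lm-based approaches are still needed.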