Merge two correlation tables into a symmetrical named matrix

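Good morning,

I am looking for a way to programmatically convert two different data frames (containing correlation coefficients) into one symmetrical named matrix, with the values of one data frame in the upper triangle and those of the other in the lower triangle.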

Take two correlation tables:

t1 <- structure(list(
    var1 = c("SE", "SE", "CN", "GN", "CN", "CN"),
    var2 = c("VN", "GN", "SE", "VN", "VN", "GN"),
    cor  = c("-0.42***", "0.16***", "-0.21***", "0.1**", "0.35***", "0.07*")
), class = "data.frame", row.names = c(NA, -6L))

t2 <- structure(list(
    var1 = c("SE", "SE", "SE", "SE", "VN", "VN", "VN", "GN", "GN", "CN"),
    var2 = c("VN", "GN", "CN", "IS", "GN", "CN", "IS", "CN", "IS", "IS"),
    cor  = c("-0.41***", "0.14***", "-0.02", "0.28***", "0.1**",
             "0.28***", "-0.02", "0.03", "-0.06†", "0.53***")
), class = "data.frame", row.names = c(NA, -10L))

t1 is the correlation table for X = 4 variables:

  var1 var2      cor
1   SE   VN -0.42***
2   SE   GN  0.16***
3   CN   SE -0.21***
4   GN   VN    0.1**
5   CN   VN  0.35***
6   CN   GN    0.07*

t2 is the correlation table for X + 1 variables (the same names as in X, plus one more):

   var1 var2      cor
1    SE   VN -0.41***
2    SE   GN  0.14***
3    SE   CN    -0.02
4    SE   IS  0.28***
5    VN   GN    0.1**
6    VN   CN  0.28***
7    VN   IS    -0.02
8    GN   CN     0.03
9    GN   IS   -0.06†
10   CN   IS  0.53***

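I would like to automate this. I know I can do it by hand, but I have many datasets to process, with matrices (and names) of varying sizes, and tables like the ones above need to be reported.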

My manual solution is as follows:

mat <- matrix(NA, 5,4, dimnames = list(c("SE", "VN", "GN", "CN", "IS"),
                                       c("SE", "VN", "GN", "CN")))
mat[lower.tri(mat)] <- t2$cor
mat[upper.tri(mat)] <- t1$cor
diag(mat) <- "-"
mat
   SE         VN         GN         CN       
SE "-"        "-0.42***" "0.16***"  "0.1**"  
VN "-0.41***" "-"        "-0.21***" "0.35***"
GN "0.14***"  "0.1**"    "-"        "0.07*"  
CN "-0.02"    "0.28***"  "0.03"     "-"      
IS "0.28***"  "-0.02"    "-0.06†"   "0.53***"

It looks like the only part you need to make dynamic is the names. That can be done as follows:

unique(c(t(t1[1:2])))
#[1] "SE" "VN" "GN" "CN"
unique(c(t(t2[1:2])))
#[1] "SE" "VN" "GN" "CN" "IS"
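
The transpose is what keeps the names in order of appearance across rows: `c()` flattens a matrix column by column, so transposing first makes the values come out as var1, var2, var1, var2, and so on, and `unique()` keeps the first occurrence of each. A small sketch of the difference, using a made-up data frame `d` that is not part of the question:

```r
# Hypothetical toy data, only to illustrate the ordering
d <- data.frame(var1 = c("A", "C"),
                var2 = c("B", "A"),
                stringsAsFactors = FALSE)

# Transposed: reads the pairs row by row (A, B, C, A)
by_row <- unique(c(t(d[1:2])))

# Not transposed: reads all of var1 first, then var2 (A, C, B, A)
by_col <- unique(unlist(d[1:2], use.names = FALSE))
```

Here `by_row` is "A" "B" "C" while `by_col` is "A" "C" "B", so the choice affects the order of the row and column names in the final matrix.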

So, to turn this into a general function:

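f1 <- function(df1, df2) {
    # Column names from the smaller table, row names from the larger one
    col <- unique(c(t(df1[1:2])))
    rn <- unique(c(t(df2[1:2])))
    mat <- matrix(NA, length(rn), length(col), dimnames = list(rn, col))
    # Both triangles are filled by position, column by column, so the rows
    # of df1 and df2 must already be in that order
    mat[lower.tri(mat)] <- df2$cor
    mat[upper.tri(mat)] <- df1$cor
    diag(mat) <- "-"
    return(mat)
}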

f1(t1, t2)
#   SE         VN         GN         CN       
#SE "-"        "-0.42***" "0.16***"  "0.1**"  
#VN "-0.41***" "-"        "-0.21***" "0.35***"
#GN "0.14***"  "0.1**"    "-"        "0.07*"  
#CN "-0.02"    "0.28***"  "0.03"     "-"      
#IS "0.28***"  "-0.02"    "-0.06†"   "0.53***"
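
One caveat: `f1` fills the triangles by position, so it relies on the rows of each data frame already being in column-by-column triangle order. In `t1` above that is not quite the case. Rows 3 and 4 (CN/SE and GN/VN) arrive in the opposite order, so "-0.21***" lands in the VN/GN cell and "0.1**" in the SE/CN cell. If the row order cannot be guaranteed, a variant that places each value by its pair of names may be safer. The following is only a sketch under some assumptions: `f2` is a name made up here, the columns are called `var1`, `var2` and `cor` as in the question, and `df2` contains the variables of `df1` plus one more.

```r
f2 <- function(df1, df2) {
    # df1's variables come first so that the columns are a prefix of the rows
    col <- unique(c(t(df1[1:2])))
    rn  <- union(col, c(t(df2[1:2])))
    mat <- matrix(NA_character_, length(rn), length(col),
                  dimnames = list(rn, col))

    # Look up each pair's positions in rn; the smaller and larger index
    # decide which triangle the value goes into
    place <- function(df, upper) {
        i  <- match(df$var1, rn)
        j  <- match(df$var2, rn)
        lo <- pmin(i, j)
        hi <- pmax(i, j)
        if (upper) cbind(lo, hi) else cbind(hi, lo)
    }

    mat[place(df2, upper = FALSE)] <- as.character(df2$cor)  # lower triangle
    mat[place(df1, upper = TRUE)]  <- as.character(df1$cor)  # upper triangle
    diag(mat) <- "-"
    mat
}
```

Called as `f2(t1, t2)`, this returns the same matrix as `f1(t1, t2)` except for those two cells: SE/CN becomes "-0.21***" and VN/GN becomes "0.1**".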