Using R to do tapply-like aggregation row-wise over a matrix

Question

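I'm having trouble with a matrix calculation and would appreciate an explanation. Thank you very much!

I have a data frame genderLocation and a matrix test whose rows correspond to each other by index.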

genderLocation[,1:6]

           scanner_gender cmall_gender wechat_gender scanner_location cmall_location wechat_location
    156043              3            2             2        Guangzhou       Shenzhen        Shenzhen
    156044              2           NA            NA         Shenzhen           <NA>
    156045              2           NA             2         Shenzhen           <NA>        Hongkong
    156046              2           NA             2         Shenzhen           <NA>        Shenzhen

test

        [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]  0.8  0.7  0.6  0.6  0.7  0.7
    [2,]  0.8  1.0  1.0  0.6  0.7  0.7
    [3,]  0.8  1.0  0.6  0.6  0.7  0.7
    [4,]  0.8  1.0  0.6  0.6  0.7  0.7

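Now I want to group each row of test by the values in the matching row of genderLocation and compute the mean of the numbers in each group. Taking row 156043 as an example, the result should be

            2         3 Guangzhou  Shenzhen
         0.65      0.80      0.60      0.70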

I don't know how to do this with the apply family (since for loops are discouraged in R). It seems to be something like

    > apply(test,1,function(tst,genderLoc) print(tapply(tst,as.character(genderLoc),mean)),genderLocation)

But I can't make sense of the result. If I restrict it to the first two rows, it seems easier to follow.

    > apply(test[1:2,],1,function(tst,genderLoc) print(tapply(tst,as.character(genderLoc),mean)),genderLocation[1:2,])
    c("2", NA) c("3", "2") c("Guangzhou", "Shenzhen") c("Shenzhen", "") c("Shenzhen", NA)
          0.65        0.80                       0.60             0.70             0.70
    c("2", NA) c("3", "2") c("Guangzhou", "Shenzhen") c("Shenzhen", "") c("Shenzhen", NA)
           1.0         0.8                        0.6              0.7              0.7
                               [,1] [,2]
    c("2", NA)                 0.65  1.0
    c("3", "2")                0.80  0.8
    c("Guangzhou", "Shenzhen") 0.60  0.6
    c("Shenzhen", "")          0.70  0.7
    c("Shenzhen", NA)          0.70  0.7

##### For reference

    test=matrix(c(0.8,0.8,0.8,0.8, 0.7,1,1,1, 0.6,1,0.6,0.6, 0.6,0.6,0.6,0.6, 0.7,0.7,0.7,0.7, 0.7,0.7,0.7,0.7),nrow=4,ncol=6,byrow=F)
    genderLocation<- data.frame(scanner_gender=c(3,2,2,2),cmall_gender=c(2,NA,NA,NA),wechat_gender=c(2,NA,2,2),
                                 scanner_location=c("Guangzhou","Shenzhen","Shenzhen","Shenzhen"),
                                 cmall_location=c("Shenzhen",NA,NA,NA),
                                 wechat_location=c("Shenzhen","","Hongkong","Shenzhen"))
    genderLocation1<-cbind(genderLocation,test)  # combined, since some apply functions accept only one input

Answer 1

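The following works for your sample data, but I don't know how robust it will be on your full data. Problems may arise if some rows of df share no common values with the other rows. However, if you are happy to keep the output as a list, this should not be an issue (i.e. skip the Reduce step). With that in mind...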

--Your data--

    test <- matrix(c(0.8,0.8,0.8,0.8,0.7,1,1,1,0.6,1,0.6,0.6,0.6,0.6,0.6,0.6,rep(0.7,8)), nrow=4)

    df <- data.frame(scanner_gender=c(3,2,2,2),
                 cmall_gender=c(2,NA,NA,NA),
                 wechat_gender=c(2,NA,2,2),   # was a second wechat_location, which duplicated a column name
                 scanner_location=c("Guanzhou","Shenzhen","Shenzhen","Shenzhen"),
                 cmall_location=c("Shenzhen",NA,NA,NA),
                 wechat_location=c("Shenzhen",NA,"Hongkong","Shenzhen"),
                 stringsAsFactors=F)
    rownames(df) <- c(156043,156044,156045,156046)

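--Computation--

I combine map from purrr with other tidyverse verbs to 1) create a 2-column data frame with the df row entries in the first column (A) and the test row entries in the second column (B), 2) filter out the entries where A is NA, 3) summarise the mean of B by group, and 4) spread to wide form using A (the key) as the column names.

    library(tidyverse)  # purrr::map, the dplyr verbs, tidyr::spread and %>%

    # For each row: pair labels (A) with values (B), drop NA labels, average per label, go wide.
    L <- map(1:nrow(df),~data.frame(A=unlist(df[.x,]),B=unlist(test[.x,])) %>%
                  filter(!is.na(A)) %>%
                  group_by(A) %>%
                  summarise(B=mean(B)) %>%
                  spread(A,B) )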

Then I reduce this list to a single data frame using Reduce with full_join

    newdf <- Reduce("full_join", L)
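
A side note on the caveat mentioned at the top: full_join without a by argument joins on every column the two inputs share, so two rows that happen to agree on all shared columns would be merged into one, and inputs with no shared column cannot be joined this way at all. A minimal sketch of one possible alternative, assuming dplyr is installed, is bind_rows, which stacks by column name and fills the gaps with NA. The two one-row data frames a and b below are hypothetical stand-ins for elements of L, not part of the answer's data.

```r
library(dplyr)

# Hypothetical per-row summaries that share no column at all.
a <- data.frame(`2` = 0.65, Shenzhen = 0.70, check.names = FALSE)
b <- data.frame(Hongkong = 0.70)

# bind_rows() matches columns by name and fills missing ones with NA,
# so it neither needs a shared column nor merges look-alike rows.
combined <- bind_rows(list(a, b))
combined
```

For the sample data, bind_rows(L) should give the same rows as the Reduce call, since no two rows agree on all of their shared columns. The output below is from the Reduce call.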

--Output--

        `2`   `3` Guanzhou Shenzhen Hongkong
    1  0.65   0.8      0.6     0.70       NA
    2  0.80    NA       NA     0.60       NA
    3  0.70    NA       NA     0.65       NA
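
If you would rather stay in base R and keep the result as a list, here is a minimal sketch with lapply and tapply. It is not the code from the answer above, and row_means is a hypothetical name. The apply call in the question is hard to read because it passes the whole genderLocation data frame to every row, and as.character() on a data frame deparses entire columns, which is where labels such as c("2", NA) come from. Indexing both objects by the same row number avoids that.

```r
# Sample data as in the answer above (spelling "Guanzhou" kept as written there).
test <- matrix(c(0.8, 0.8, 0.8, 0.8, 0.7, 1, 1, 1, 0.6, 1, 0.6, 0.6,
                 0.6, 0.6, 0.6, 0.6, rep(0.7, 8)), nrow = 4)
df <- data.frame(scanner_gender   = c(3, 2, 2, 2),
                 cmall_gender     = c(2, NA, NA, NA),
                 wechat_gender    = c(2, NA, 2, 2),
                 scanner_location = c("Guanzhou", "Shenzhen", "Shenzhen", "Shenzhen"),
                 cmall_location   = c("Shenzhen", NA, NA, NA),
                 wechat_location  = c("Shenzhen", NA, "Hongkong", "Shenzhen"),
                 stringsAsFactors = FALSE)
rownames(df) <- c(156043, 156044, 156045, 156046)

# row_means (hypothetical name): one named vector of group means per row.
row_means <- lapply(seq_len(nrow(df)), function(i) {
  groups <- unlist(df[i, ], use.names = FALSE)  # row i's labels, coerced to character
  tapply(test[i, ], groups, mean)               # entries with an NA label are dropped
})
names(row_means) <- rownames(df)

row_means[["156043"]]
```

For row 156043 this should reproduce the means asked for in the question (0.65 for 2, 0.80 for 3, 0.60 for the first city and 0.70 for Shenzhen). Note that an empty string, as in the question's original data, is not NA and would form its own group.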
