在矩阵的每个元素上测试条件
Testing a conditional over every element of a matrix
我需要评估矩阵中每个元素的条件,其中测试涉及每个索引的 dimnames
。具体来说,如果一个元素的 rowname
和 colname
共享一个特定的特征,我想将该元素设置为零。
m <- matrix(rnorm(81),nrow=9)
colnames(m) <- paste(c(rep("A",3),rep("B",3),rep("C",3)),1:ncol(m),sep = "_")
rownames(m) <- paste(c(rep("A",3),rep("B",3),rep("C",3)),1:nrow(m),sep = "_")
m
A_1 A_2 A_3 B_4 B_5 B_6 C_7 C_8 C_9
A_1 -0.03201198 -2.2241923 -0.65584334 -0.346745371 -0.9263060 -1.99181830 0.9138187 -2.4959751 -0.96723090
A_2 -1.44319826 -0.2225057 -1.35327091 -0.009194619 0.5798469 2.42753826 -1.4574564 -0.8858597 -1.41595891
A_3 -0.05863965 -0.2177708 -0.39131739 0.729532751 1.4106448 0.15899085 -1.7521345 0.5398222 -0.05073061
B_4 1.11006840 1.0315201 0.10434758 -0.508430234 -1.7095192 0.90913528 1.7367210 -0.9006098 -1.41698688
B_5 0.21405173 0.4735690 0.42655214 -0.367748304 0.9820261 -0.77933908 1.1326391 0.5316226 2.24951820
B_6 0.27153476 -0.3506076 0.16943749 -0.666135969 0.2962018 1.12236640 -1.3103133 -1.9494454 -0.57526358
C_7 1.69732641 -1.1439368 -0.02734925 -0.814635435 0.6658583 0.68069434 0.3330596 -1.2564933 0.15807742
C_8 0.35194835 -0.7075880 -0.45814046 0.773997223 -0.6530986 0.01295098 0.2557955 1.4658751 -3.33651509
C_9 0.58610083 0.7908394 -1.38909037 -0.742739398 -0.3745243 2.80990368 0.2172529 -0.3672324 0.56309688
我想要的结果可以通过 for 循环轻松实现,但在处理大型矩阵时,这当然会慢得令人望而却步。
for (i in 1:nrow(m)) {
for (j in 1:ncol(m)) {
m[i,j] <- ifelse(sub("_.*", "\1", rownames(m)[i])==sub("_.*", "\1", colnames(m)[j]),0,m[i,j])
}
}
m
A_1 A_2 A_3 B_4 B_5 B_6 C_7 C_8 C_9
A_1 0.0000000 0.0000000 0.00000000 -0.346745371 -0.9263060 -1.99181830 0.9138187 -2.4959751 -0.96723090
A_2 0.0000000 0.0000000 0.00000000 -0.009194619 0.5798469 2.42753826 -1.4574564 -0.8858597 -1.41595891
A_3 0.0000000 0.0000000 0.00000000 0.729532751 1.4106448 0.15899085 -1.7521345 0.5398222 -0.05073061
B_4 1.1100684 1.0315201 0.10434758 0.000000000 0.0000000 0.00000000 1.7367210 -0.9006098 -1.41698688
B_5 0.2140517 0.4735690 0.42655214 0.000000000 0.0000000 0.00000000 1.1326391 0.5316226 2.24951820
B_6 0.2715348 -0.3506076 0.16943749 0.000000000 0.0000000 0.00000000 -1.3103133 -1.9494454 -0.57526358
C_7 1.6973264 -1.1439368 -0.02734925 -0.814635435 0.6658583 0.68069434 0.0000000 0.0000000 0.00000000
C_8 0.3519484 -0.7075880 -0.45814046 0.773997223 -0.6530986 0.01295098 0.0000000 0.0000000 0.00000000
C_9 0.5861008 0.7908394 -1.38909037 -0.742739398 -0.3745243 2.80990368 0.0000000 0.0000000 0.00000000
概念上类似于 ,但我不知道如何在 apply
系列中评估 dimnames
的条件。有什么建议么?谢谢!
sub
已矢量化。唯一需要更改的是 rep
licate name 属性,使其与 matrix
的元素数量相同,根据逻辑向量提取这些元素并将其分配给 0
rnm <- sub("_.*", "\1", rownames(m))
cnm <- sub("_.*", "\1", colnames(m))
m[rnm[row(m)] == cnm[col(m)]] <- 0
-输出
m
A_1 A_2 A_3 B_4 B_5 B_6 C_7 C_8 C_9
A_1 0.0000000 0.00000000 0.00000000 -2.84612856 0.1611173 0.29004403 -0.8844186 0.8363131 -0.57395543
A_2 0.0000000 0.00000000 0.00000000 1.92166045 0.2671856 -0.50424582 0.8672366 -2.2496354 0.04046654
A_3 0.0000000 0.00000000 0.00000000 -0.01515657 0.6775447 -0.03862803 1.7764642 0.7146040 0.33652933
B_4 1.0842025 0.32981334 -0.61961179 0.00000000 0.0000000 0.00000000 0.3366096 0.5196087 1.67367867
B_5 -1.2897726 0.04082185 1.62661008 0.00000000 0.0000000 0.00000000 -1.6750488 0.5464289 0.98246881
B_6 -0.4213025 -0.70439232 0.09091241 0.00000000 0.0000000 0.00000000 0.7050487 -0.4151445 -0.04604658
C_7 1.0594241 -1.43071282 -0.75394573 1.08360040 1.3646551 -0.88687658 0.0000000 0.0000000 0.00000000
C_8 1.3908082 1.12093371 1.73690687 1.05202987 0.6152715 0.45601621 0.0000000 0.0000000 0.00000000
C_9 -0.3026523 1.14507291 -1.04714611 -2.38087279 -0.6976168 -0.96394113 0.0000000 0.0000000 0.00000000
或者另一种选择是重塑为 'long' 格式,根据 substr
条件将 'Freq' 列分配给 0,然后使用 xtabs
xtabs(Freq ~ Var1 + Var2, transform(as.data.frame.table(m),
Freq = Freq * (substr(Var1, 1, 1) != substr(Var2, 1, 1))))
另一个基本 R 选项使用 outer
> m * outer(gsub("_.*", "", row.names(m)), gsub("_.*", "", colnames(m)), "!=")
A_1 A_2 A_3 B_4 B_5 B_6
A_1 0.00000000 0.00000000 0.0000000 -0.9406041 -0.636132786 -1.6351854
A_2 0.00000000 0.00000000 0.0000000 2.5036185 1.097232977 1.0347660
A_3 0.00000000 0.00000000 0.0000000 0.9006587 0.923166200 0.3491107
B_4 1.02027130 0.83789740 0.7906466 0.0000000 0.000000000 0.0000000
B_5 0.07069471 -0.86514897 -1.2658111 0.0000000 0.000000000 0.0000000
B_6 1.19529695 -1.21846895 -0.1679168 0.0000000 0.000000000 0.0000000
C_7 0.14903965 0.05210262 -0.2998190 -0.6323805 -0.631320984 -0.5492681
C_8 -0.40173446 -0.88410976 -0.1568388 -0.1936692 -0.387362790 1.1529557
C_9 -1.01364291 -0.63857820 0.4172131 -1.1776165 -0.009321079 0.4942712
C_7 C_8 C_9
A_1 0.2739315 -1.5326245 1.0576526
A_2 -0.7770083 0.8391607 0.9051447
A_3 -0.7469622 0.3777556 -1.6092045
B_4 -0.7780711 -0.7269628 0.1827205
B_5 0.6065771 0.4299673 1.5414200
B_6 1.9553316 0.5131711 0.6540521
C_7 0.0000000 0.0000000 0.0000000
C_8 0.0000000 0.0000000 0.0000000
C_9 0.0000000 0.0000000 0.0000000
我需要评估矩阵中每个元素的条件,其中测试涉及每个索引的 dimnames
。具体来说,如果一个元素的 rowname
和 colname
共享一个特定的特征,我想将该元素设置为零。
m <- matrix(rnorm(81),nrow=9)
colnames(m) <- paste(c(rep("A",3),rep("B",3),rep("C",3)),1:ncol(m),sep = "_")
rownames(m) <- paste(c(rep("A",3),rep("B",3),rep("C",3)),1:nrow(m),sep = "_")
m
A_1 A_2 A_3 B_4 B_5 B_6 C_7 C_8 C_9
A_1 -0.03201198 -2.2241923 -0.65584334 -0.346745371 -0.9263060 -1.99181830 0.9138187 -2.4959751 -0.96723090
A_2 -1.44319826 -0.2225057 -1.35327091 -0.009194619 0.5798469 2.42753826 -1.4574564 -0.8858597 -1.41595891
A_3 -0.05863965 -0.2177708 -0.39131739 0.729532751 1.4106448 0.15899085 -1.7521345 0.5398222 -0.05073061
B_4 1.11006840 1.0315201 0.10434758 -0.508430234 -1.7095192 0.90913528 1.7367210 -0.9006098 -1.41698688
B_5 0.21405173 0.4735690 0.42655214 -0.367748304 0.9820261 -0.77933908 1.1326391 0.5316226 2.24951820
B_6 0.27153476 -0.3506076 0.16943749 -0.666135969 0.2962018 1.12236640 -1.3103133 -1.9494454 -0.57526358
C_7 1.69732641 -1.1439368 -0.02734925 -0.814635435 0.6658583 0.68069434 0.3330596 -1.2564933 0.15807742
C_8 0.35194835 -0.7075880 -0.45814046 0.773997223 -0.6530986 0.01295098 0.2557955 1.4658751 -3.33651509
C_9 0.58610083 0.7908394 -1.38909037 -0.742739398 -0.3745243 2.80990368 0.2172529 -0.3672324 0.56309688
我想要的结果可以通过 for 循环轻松实现,但在处理大型矩阵时,这当然会慢得令人望而却步。
for (i in 1:nrow(m)) {
for (j in 1:ncol(m)) {
m[i,j] <- ifelse(sub("_.*", "\1", rownames(m)[i])==sub("_.*", "\1", colnames(m)[j]),0,m[i,j])
}
}
m
A_1 A_2 A_3 B_4 B_5 B_6 C_7 C_8 C_9
A_1 0.0000000 0.0000000 0.00000000 -0.346745371 -0.9263060 -1.99181830 0.9138187 -2.4959751 -0.96723090
A_2 0.0000000 0.0000000 0.00000000 -0.009194619 0.5798469 2.42753826 -1.4574564 -0.8858597 -1.41595891
A_3 0.0000000 0.0000000 0.00000000 0.729532751 1.4106448 0.15899085 -1.7521345 0.5398222 -0.05073061
B_4 1.1100684 1.0315201 0.10434758 0.000000000 0.0000000 0.00000000 1.7367210 -0.9006098 -1.41698688
B_5 0.2140517 0.4735690 0.42655214 0.000000000 0.0000000 0.00000000 1.1326391 0.5316226 2.24951820
B_6 0.2715348 -0.3506076 0.16943749 0.000000000 0.0000000 0.00000000 -1.3103133 -1.9494454 -0.57526358
C_7 1.6973264 -1.1439368 -0.02734925 -0.814635435 0.6658583 0.68069434 0.0000000 0.0000000 0.00000000
C_8 0.3519484 -0.7075880 -0.45814046 0.773997223 -0.6530986 0.01295098 0.0000000 0.0000000 0.00000000
C_9 0.5861008 0.7908394 -1.38909037 -0.742739398 -0.3745243 2.80990368 0.0000000 0.0000000 0.00000000
概念上类似于 apply
系列中评估 dimnames
的条件。有什么建议么?谢谢!
sub
已矢量化。唯一需要更改的是 rep
licate name 属性,使其与 matrix
的元素数量相同,根据逻辑向量提取这些元素并将其分配给 0
rnm <- sub("_.*", "\1", rownames(m))
cnm <- sub("_.*", "\1", colnames(m))
m[rnm[row(m)] == cnm[col(m)]] <- 0
-输出
m
A_1 A_2 A_3 B_4 B_5 B_6 C_7 C_8 C_9
A_1 0.0000000 0.00000000 0.00000000 -2.84612856 0.1611173 0.29004403 -0.8844186 0.8363131 -0.57395543
A_2 0.0000000 0.00000000 0.00000000 1.92166045 0.2671856 -0.50424582 0.8672366 -2.2496354 0.04046654
A_3 0.0000000 0.00000000 0.00000000 -0.01515657 0.6775447 -0.03862803 1.7764642 0.7146040 0.33652933
B_4 1.0842025 0.32981334 -0.61961179 0.00000000 0.0000000 0.00000000 0.3366096 0.5196087 1.67367867
B_5 -1.2897726 0.04082185 1.62661008 0.00000000 0.0000000 0.00000000 -1.6750488 0.5464289 0.98246881
B_6 -0.4213025 -0.70439232 0.09091241 0.00000000 0.0000000 0.00000000 0.7050487 -0.4151445 -0.04604658
C_7 1.0594241 -1.43071282 -0.75394573 1.08360040 1.3646551 -0.88687658 0.0000000 0.0000000 0.00000000
C_8 1.3908082 1.12093371 1.73690687 1.05202987 0.6152715 0.45601621 0.0000000 0.0000000 0.00000000
C_9 -0.3026523 1.14507291 -1.04714611 -2.38087279 -0.6976168 -0.96394113 0.0000000 0.0000000 0.00000000
或者另一种选择是重塑为 'long' 格式,根据 substr
条件将 'Freq' 列分配给 0,然后使用 xtabs
xtabs(Freq ~ Var1 + Var2, transform(as.data.frame.table(m),
Freq = Freq * (substr(Var1, 1, 1) != substr(Var2, 1, 1))))
另一个基本 R 选项使用 outer
> m * outer(gsub("_.*", "", row.names(m)), gsub("_.*", "", colnames(m)), "!=")
A_1 A_2 A_3 B_4 B_5 B_6
A_1 0.00000000 0.00000000 0.0000000 -0.9406041 -0.636132786 -1.6351854
A_2 0.00000000 0.00000000 0.0000000 2.5036185 1.097232977 1.0347660
A_3 0.00000000 0.00000000 0.0000000 0.9006587 0.923166200 0.3491107
B_4 1.02027130 0.83789740 0.7906466 0.0000000 0.000000000 0.0000000
B_5 0.07069471 -0.86514897 -1.2658111 0.0000000 0.000000000 0.0000000
B_6 1.19529695 -1.21846895 -0.1679168 0.0000000 0.000000000 0.0000000
C_7 0.14903965 0.05210262 -0.2998190 -0.6323805 -0.631320984 -0.5492681
C_8 -0.40173446 -0.88410976 -0.1568388 -0.1936692 -0.387362790 1.1529557
C_9 -1.01364291 -0.63857820 0.4172131 -1.1776165 -0.009321079 0.4942712
C_7 C_8 C_9
A_1 0.2739315 -1.5326245 1.0576526
A_2 -0.7770083 0.8391607 0.9051447
A_3 -0.7469622 0.3777556 -1.6092045
B_4 -0.7780711 -0.7269628 0.1827205
B_5 0.6065771 0.4299673 1.5414200
B_6 1.9553316 0.5131711 0.6540521
C_7 0.0000000 0.0000000 0.0000000
C_8 0.0000000 0.0000000 0.0000000
C_9 0.0000000 0.0000000 0.0000000