Replace for loop with apply for faster execution

I have a dataset like the following, for example:

Node  Descr  Node1  Descr1  Node2  Descr2
A1    AAA1   B1     BBB1    C1     CCC1
A2    AAA2   B2     BBB2    C2     CCC2
A3    AAA3                  C3     CCC3
A4    AAA4   B4     BBB4    C4     CCC4

Wherever a Node and its Descr are blank, I want them filled with the previous Node and Descr from the same row, so the expected result is:

Node  Descr  Node1  Descr1  Node2  Descr2
A1    AAA1   B1     BBB1    C1     CCC1
A2    AAA2   B2     BBB2    C2     CCC2
A3    AAA3   A3     AAA3    C3     CCC3
A4    AAA4   B4     BBB4    C4     CCC4

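# j runs over the Node/Descr columns (8 to 20 in the full data)
for (j in 8:20) {
  for (i in seq_len(nrow(old_data))) {
    # fill a blank cell only if the cells two columns to either side are present
    if (is.na(old_data[i, j]) && !is.na(old_data[i, j + 2]) && !is.na(old_data[i, j - 2])) {
      # copy the previous pair: column j - 2 into j, column j - 1 into j + 1
      old_data[i, j] <- old_data[i, j - 2]
      old_data[i, j + 1] <- old_data[i, j - 1]
    }
  }
}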

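I can currently do this with the for loop above, but my data is large, so scanning the data frame and fixing it takes a long time. Is there a faster, leaner way to do it with the apply family, or any other suggestion?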
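The question's nested loop can be tried on the sample table by wrapping it in a function. This is a sketch under assumptions: the wrapper name `fill_gaps_loop` and its `cols` argument are hypothetical, the blank cells are taken to be `NA` (as the `is.na()` checks imply), and the column names follow the sample. In the six-column sample, `cols = 3:4` plays the role of `8:20`, because every `j` needs a column two places to each side.

```r
# Hypothetical wrapper around the question's nested loop; `cols` replaces the
# hard-coded 8:20 and must leave room for columns j - 2 and j + 2
fill_gaps_loop <- function(old_data, cols) {
  for (j in cols) {
    for (i in seq_len(nrow(old_data))) {
      if (is.na(old_data[i, j]) && !is.na(old_data[i, j + 2]) &&
          !is.na(old_data[i, j - 2])) {
        old_data[i, j] <- old_data[i, j - 2]      # previous Node
        old_data[i, j + 1] <- old_data[i, j - 1]  # previous Descr
      }
    }
  }
  old_data
}

# The sample table, with the blank cells in row 3 represented as NA
sample_data <- data.frame(
  Node = c("A1", "A2", "A3", "A4"),
  Descr = c("AAA1", "AAA2", "AAA3", "AAA4"),
  Node1 = c("B1", "B2", NA, "B4"),
  Descr1 = c("BBB1", "BBB2", NA, "BBB4"),
  Node2 = c("C1", "C2", "C3", "C4"),
  Descr2 = c("CCC1", "CCC2", "CCC3", "CCC4"),
  stringsAsFactors = FALSE
)

res <- fill_gaps_loop(sample_data, cols = 3:4)
res
```

Row 3 should come back with `A3`/`AAA3` in `Node1`/`Descr1`, matching the expected table. Every cell visited here goes through data-frame indexing, which is what makes this version slow on large inputs.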

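A loop from the apply family is not necessarily faster than a for loop; in some cases it is even slower. However, you can drop the inner (row) loop and vectorize over whole columns instead: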

df <- data.frame(
    Node = c("A1", "A2", "A3", "A4"), 
    Descr = c("AAA1", "AAA2", "AAA3", "AAA4"), 
    Node1 = c("B1", "B2", NA, "B4"), 
    Descr1 = c("BBB1", "BBB2", NA, "BBB4"), 
    Node2 = c("C1", "C2", "C3", "C4"),
    Descr2 = c("CCC1", "CCC2", "CCC3", "CCC4"),
    Node3 = c(NA, "D2", "D3", "D4"),
    Descr3 = c(NA, "DDD2", "DDD3", "DDD4"),
    Node4 = c(NA, "E2", "E3", "E4"),
    Descr4 = c(NA, "EEE2", "EEE3", "EEE4"),
    stringsAsFactors = FALSE  # keep text as character (already the default since R 4.0.0)
)

for(i in seq(from = 3, to = ncol(df), by = 2)){
    # if the Descr column is not necessarily NA when its complementary Node 
    # column is NA, then you'll need to split this into two if-statements
    if(any(is.na(df[,i]))){
        df[,i][which(is.na(df[,i]))] <- df[,i-2][which(is.na(df[,i]))]
        df[,i+1][which(is.na(df[,i+1]))] <- df[,i-1][which(is.na(df[,i+1]))]
    }
}

df

  Node Descr Node1 Descr1 Node2 Descr2 Node3 Descr3 Node4 Descr4
1   A1  AAA1    B1   BBB1    C1   CCC1    C1   CCC1    C1   CCC1
2   A2  AAA2    B2   BBB2    C2   CCC2    D2   DDD2    E2   EEE2
3   A3  AAA3    A3   AAA3    C3   CCC3    D3   DDD3    E3   EEE3
4   A4  AAA4    B4   BBB4    C4   CCC4    D4   DDD4    E4   EEE4
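Note that this loop also fills trailing gaps: in row 1, `Node3` to `Descr4` become `C1`/`CCC1`. The question's loop only fills a cell when the Node two columns to the right is present (`!is.na(old_data[i,j+2])`), so it would leave those cells blank. If that stricter behavior is wanted, the same column-wise idea can carry both conditions in one logical mask. The sketch below is an assumption-laden illustration, not the answer's code: the helper name `fill_interior_gaps` is hypothetical, and it assumes alternating Node/Descr columns starting at column 1, at least six columns, and that a Descr cell is `NA` whenever its Node cell is.

```r
# Hypothetical helper: fill a Node/Descr pair from the pair on its left only
# when the Node columns on both sides are non-NA (the question's rule)
fill_interior_gaps <- function(x) {
  # Interior Node columns: 3, 5, ... up to the second-to-last Node column
  for (j in seq(3, ncol(x) - 3, by = 2)) {
    # Processing left to right lets a pair filled in this pass act as the
    # left neighbor of the next pair, as in the nested loop
    gap <- is.na(x[[j]]) & !is.na(x[[j - 2]]) & !is.na(x[[j + 2]])
    x[[j]][gap] <- x[[j - 2]][gap]      # Node from the previous pair
    x[[j + 1]][gap] <- x[[j - 1]][gap]  # Descr from the previous pair
  }
  x
}

df <- data.frame(
  Node = c("A1", "A2", "A3", "A4"),
  Descr = c("AAA1", "AAA2", "AAA3", "AAA4"),
  Node1 = c("B1", "B2", NA, "B4"),
  Descr1 = c("BBB1", "BBB2", NA, "BBB4"),
  Node2 = c("C1", "C2", "C3", "C4"),
  Descr2 = c("CCC1", "CCC2", "CCC3", "CCC4"),
  Node3 = c(NA, "D2", "D3", "D4"),
  Descr3 = c(NA, "DDD2", "DDD3", "DDD4"),
  Node4 = c(NA, "E2", "E3", "E4"),
  Descr4 = c(NA, "EEE2", "EEE3", "EEE4"),
  stringsAsFactors = FALSE
)

out <- fill_interior_gaps(df)
out
```

Row 3 still gets `A3`/`AAA3`, but `Node3` to `Descr4` in row 1 stay `NA`, because no Node lies to the right of those gaps.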


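If you have many rows, the column-wise loop should be much faster, because each pass fills an entire column with one vectorized subassignment instead of reading and writing every cell through data-frame indexing.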
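One rough way to see where the time goes is to fill a single column both ways on made-up data. Everything here is hypothetical (the column names `prev` and `node`, the size, the 10% gap rate); the `system.time` figures depend on the machine, so none are quoted, and only the equality of the two results is checked.

```r
# Made-up data: a `prev` Node column and a `node` column with 10% of cells NA
set.seed(42)
n <- 20000
d <- data.frame(
  prev = sprintf("A%d", seq_len(n)),
  node = sprintf("B%d", seq_len(n)),
  stringsAsFactors = FALSE
)
d$node[sample(n, n / 10)] <- NA

# Per cell: every d[i, 2] read or write goes through data-frame indexing
fill_per_cell <- function(d) {
  for (i in seq_len(nrow(d))) {
    if (is.na(d[i, 2])) d[i, 2] <- d[i, 1]
  }
  d
}

# Vectorized: one logical mask and one subassignment for the whole column
fill_vectorized <- function(d) {
  miss <- is.na(d$node)
  d$node[miss] <- d$prev[miss]
  d
}

system.time(a <- fill_per_cell(d))
system.time(b <- fill_vectorized(d))
identical(a$node, b$node)
```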