Replace for loop with apply for faster application
I have a dataset like the following:
Node | Descr | Node1 | Descr | Node2 | Descr |
---|---|---|---|---|---|
A1 | AAA1 | B1 | BBB1 | C1 | CCC1 |
A2 | AAA2 | B2 | BBB2 | C2 | CCC2 |
A3 | AAA3 | | | C3 | CCC3 |
A4 | AAA4 | B4 | BBB4 | C4 | CCC4 |
Where a Node and its Descr are blank, I want them replaced by the previous Node and Descr from the same row, so the expected result is:
Node | Descr | Node1 | Descr | Node2 | Descr |
---|---|---|---|---|---|
A1 | AAA1 | B1 | BBB1 | C1 | CCC1 |
A2 | AAA2 | B2 | BBB2 | C2 | CCC2 |
A3 | AAA3 | A3 | AAA3 | C3 | CCC3 |
A4 | AAA4 | B4 | BBB4 | C4 | CCC4 |
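# Visit columns 8 to 20 of old_data and, within each column, every row
for (j in 8:20) {
  for (i in seq_len(nrow(old_data))) {
    # Fill a missing cell only when the cells two columns to the left
    # and two columns to the right are both present
    if (is.na(old_data[i, j]) && !is.na(old_data[i, j + 2]) && !is.na(old_data[i, j - 2])) {
      old_data[i, j] <- old_data[i, j - 2]
      old_data[i, j + 1] <- old_data[i, j - 1]
    }
  }
}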
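I can currently do this with the nested for loops shown above, but my data is large, so scanning the data frame and fixing it takes a long time. Is there a faster, leaner way to do this with the apply family, or any other suggestion?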
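Using apply is not necessarily faster than a for loop, and in some cases it is slower. You can, however, drop one of the for loops and vectorize over the rows: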
df <- data.frame(
  Node   = c("A1", "A2", "A3", "A4"),
  Descr  = c("AAA1", "AAA2", "AAA3", "AAA4"),
  Node1  = c("B1", "B2", NA, "B4"),
  Descr1 = c("BBB1", "BBB2", NA, "BBB4"),
  Node2  = c("C1", "C2", "C3", "C4"),
  Descr2 = c("CCC1", "CCC2", "CCC3", "CCC4"),
  Node3  = c(NA, "D2", "D3", "D4"),
  Descr3 = c(NA, "DDD2", "DDD3", "DDD4"),
  Node4  = c(NA, "E2", "E3", "E4"),
  Descr4 = c(NA, "EEE2", "EEE3", "EEE4")
)
for (i in seq(from = 3, to = ncol(df), by = 2)) {
  # if the Descr column is not necessarily NA when its complementary Node
  # column is NA, then you'll need to split this into two if-statements
  if (any(is.na(df[, i]))) {
    node_na  <- is.na(df[, i])      # rows where this Node column is missing
    descr_na <- is.na(df[, i + 1])  # rows where its Descr column is missing
    df[node_na, i] <- df[node_na, i - 2]
    df[descr_na, i + 1] <- df[descr_na, i - 1]
  }
}
df
  Node Descr Node1 Descr1 Node2 Descr2 Node3 Descr3 Node4 Descr4
1   A1  AAA1    B1   BBB1    C1   CCC1    C1   CCC1    C1   CCC1
2   A2  AAA2    B2   BBB2    C2   CCC2    D2   DDD2    E2   EEE2
3   A3  AAA3    A3   AAA3    C3   CCC3    D3   DDD3    E3   EEE3
4   A4  AAA4    B4   BBB4    C4   CCC4    D4   DDD4    E4   EEE4
If you have many rows, this should be much faster.
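One thing to watch: the loop in this answer fills every NA, including trailing ones (row 1 ends up with C1 in Node3 and Node4), whereas the loop in the question only fills a gap when the pairs on both sides are present. If you need that stricter behaviour, the following is a minimal sketch of one way to keep it while still vectorizing over rows. The names `fill_gaps` and `node_cols` are made up for illustration, and it assumes each Node column is immediately followed by its Descr column.

```r
# Hypothetical helper (not from the original answer): fill a missing
# Node/Descr pair from the pair to its left, but only when the pairs on
# both sides are present, mirroring the condition in the question's loop.
fill_gaps <- function(df, node_cols) {
  for (j in node_cols) {
    # Logical mask over all rows at once: this Node is missing while the
    # Node two columns to the left and two columns to the right are present.
    gap <- is.na(df[[j]]) & !is.na(df[[j - 2]]) & !is.na(df[[j + 2]])
    df[[j]][gap]     <- df[[j - 2]][gap]  # copy the previous Node
    df[[j + 1]][gap] <- df[[j - 1]][gap]  # copy the previous Descr
  }
  df
}

df <- data.frame(
  Node   = c("A1", "A2", "A3", "A4"),
  Descr  = c("AAA1", "AAA2", "AAA3", "AAA4"),
  Node1  = c("B1", "B2", NA, "B4"),
  Descr1 = c("BBB1", "BBB2", NA, "BBB4"),
  Node2  = c("C1", "C2", "C3", "C4"),
  Descr2 = c("CCC1", "CCC2", "CCC3", "CCC4"),
  Node3  = c(NA, "D2", "D3", "D4"),
  Descr3 = c(NA, "DDD2", "DDD3", "DDD4"),
  Node4  = c(NA, "E2", "E3", "E4"),
  Descr4 = c(NA, "EEE2", "EEE3", "EEE4"),
  stringsAsFactors = FALSE
)

# Interior Node columns only: each needs a pair on its left and on its right.
node_cols <- seq(from = 3, to = ncol(df) - 3, by = 2)
filled <- fill_gaps(df, node_cols)
```

With this sample data, row 3 gets A3 / AAA3 copied into Node1 / Descr1, while the trailing NAs in row 1 (Node3, Node4 and their Descr columns) are left alone because nothing follows them. In your real data you would pass whichever column positions hold the Node columns instead of the `seq()` call used here.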