Speed up for loop with two if statements
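I have a data.table DT with more than 15,000 rows. I have a for loop that runs correctly, but it takes over 30 seconds and is the slowest part of my entire code. Here is the for loop: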
for (i in 2:nrow(DT)) {
  if (DT$C1[i] == DT$C1[i+1] & DT$C2[i] != DT$C2[i+1] & DT$C3[i+1] - DT$C3[i] <= 4 & DT$C2[i] == "Short" & DT$C2[i+1] != "Long") DT$C4[i] = 1 else
  if (DT$C1[i] == DT$C1[i-1] & DT$C2[i] != DT$C2[i-1] & DT$C3[i] - DT$C3[i-1] <= 4 & DT$C2[i] == "Short" & DT$C2[i-1] != "Long") DT$C4[i] = 1 else
  0
}
Is there any way to speed this up? I have seen some examples here and elsewhere, but they don't quite address my specific problem with i+1 and i-1.
Here is some sample data.
C1 <- c("1","1","1","1","1","2","2","2","3","3","3","3","3","3","3","3","4","4","4","4","4","4","4","4","4","4","4","4","4")
C2 <- c("Short","Short","Short","None","None","Short","Short","Other","Long","Long","Long","Long","Long","Long","Long","Long","Short","Short","Other","Short","Short","None","Short","None","Other","Short","Short","Short","Short")
C3 <- c(as.Date("2010-06-01"),as.Date("2010-06-05"),as.Date("2010-06-09"),as.Date("2010-06-13"),as.Date("2010-06-17"),as.Date("2010-06-02"),as.Date("2010-06-21"),as.Date("2010-07-09"),as.Date("2010-07-13"),as.Date("2010-07-17"),as.Date("2010-07-21"),as.Date("2010-08-01"),as.Date("2010-08-05"),as.Date("2010-08-09"),as.Date("2010-09-03"),as.Date("2010-09-07"),as.Date("2010-06-03"),as.Date("2010-06-07"),as.Date("2010-06-11"),as.Date("2010-06-14"),as.Date("2010-06-17"),as.Date("2010-06-21"),as.Date("2010-06-24"),as.Date("2010-06-27"),as.Date("2010-07-01"),as.Date("2010-07-05"),as.Date("2010-07-09"),as.Date("2010-07-13"),as.Date("2010-07-17"))
DF <- data.frame(C1=C1, C2=C2, C3=C3)
library(data.table)  # provides as.data.table()
DT <- as.data.table(DF)
And the desired output.
C1 C2 C3 C4
1 Short 2010-06-01 0
1 Short 2010-06-05 0
1 Short 2010-06-09 1
1 None 2010-06-13 0
1 None 2010-06-17 0
2 Short 2010-06-02 0
2 Short 2010-06-21 0
2 Other 2010-07-09 0
3 Long 2010-07-13 0
3 Long 2010-07-17 0
3 Long 2010-07-21 0
3 Long 2010-08-01 0
3 Long 2010-08-05 0
3 Long 2010-08-09 0
3 Long 2010-09-03 0
3 Long 2010-09-07 0
4 Short 2010-06-03 0
4 Short 2010-06-07 1
4 Other 2010-06-11 0
4 Short 2010-06-14 1
4 Short 2010-06-17 1
4 None 2010-06-21 0
4 Short 2010-06-24 1
4 None 2010-06-27 0
4 Other 2010-07-01 0
4 Short 2010-07-05 1
4 Short 2010-07-09 0
4 Short 2010-07-13 0
4 Short 2010-07-17 0
Thanks for your help.
You can vectorize it with something like the following:
n <- nrow(DT)
DT$C4 <- NA # Initialize however you want
# Warning -- untested due to no reproducible example...
DT$C4[2:(n-1)] <- as.numeric((DT$C1[2:(n-1)] == DT$C1[3:n] & DT$C2[2:(n-1)] != DT$C2[3:n] & DT$C3[3:n] - DT$C3[2:(n-1)] <= 4 & DT$C2[2:(n-1)] == "Short" & DT$C2[3:n] != "Long") |
(DT$C1[2:(n-1)] == DT$C1[1:(n-2)] & DT$C2[2:(n-1)] != DT$C2[1:(n-2)] & DT$C3[2:(n-1)] - DT$C3[1:(n-2)] <= 4 & DT$C2[2:(n-1)] == "Short" & DT$C2[1:(n-2)] != "Long"))
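Basically, every time you index with i in the loop you replace it with 2:(n-1), every time you index with i-1 you replace it with 1:(n-2), and every time you index with i+1 you replace it with 3:n.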