Mutate a lead or lag column for certain rows in R
My forecasts do not line up well with moving holidays, so I am looking for a quick fix.
Here is the structure of my data frame:
df1:
Date City Visitors WKN WKN_2015 Holiday
2016-11-06 New York 40000 45 46 No_Holiday
2016-11-13 New York 50000 46 47 No_Holiday
2016-11-20 New York 50000 47 48 Thanksgiving
2016-11-27 New York 100000 48 49 Cyber_Monday
2016-12-04 New York 100000 49 50 No_Holiday
2016-12-11 New York 70000 50 51 No_Holiday
.
.
.
2017-11-23 New York 120000 47 47 Thanksgiving
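In general, more visitors come to the city around Thanksgiving and Cyber Monday, but my forecast does not reflect this. For now I would like a quick fix along these lines: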
df1:
Date City Visitors WKN WKN_2015 Holiday New_Visitors
2016-11-06 New York 40000 45 46 No_Holiday 40000
2016-11-13 New York 50000 46 47 No_Holiday 50000
2016-11-20 New York 50000 47 48 Thanksgiving 100000
2016-11-27 New York 100000 48 49 Cyber_Monday 100000
2016-12-04 New York 100000 49 50 No_Holiday 70000
2016-12-11 New York 70000 50 51 No_Holiday 70000
.
.
.
2017-11-23 New York 120000 47 47 Thanksgiving 120000
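As you can see in the data above, New_Visitors differs from Visitors only for Thanksgiving, Cyber Monday, and the week after Cyber Monday.
Is there a way to do this automatically, since the data continues into 2017 and beyond?
I just need a quick solution until I work out a forecast that handles moving holidays properly. Can anyone point me in the right direction?
I tried something like the following, but it does not work because I only need to lag/lead those 3 weeks: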
df1 <-
  df1 %>%
  mutate(New_Visitors = ifelse(Holiday == "Thanksgiving", lag(Visitors, (WKN - WKN_2015)), Visitors))
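Logic: for each year, find Thanksgiving and check whether WKN matches WKN_2015. If it does not, shift Visitors for the 3 weeks starting at Thanksgiving by the difference between the two week numbers. If WKN - WKN_2015 == -1, lead Visitors by 1 over those 3 rows; if WKN - WKN_2015 == 1, lag Visitors by 1 over those 3 rows.
Data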
df1 <- structure(list(Date = c("2016-11-06", "2016-11-13", "2016-11-20",
"2016-11-27", "2016-12-04", "2016-12-11", "2017-11-23"), City = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = "New York", class = "factor"),
Visitors = c(40000L, 50000L, 50000L, 100000L, 100000L, 70000L,
120000L), WKN = c(45L, 46L, 47L, 48L, 49L, 50L, 47L), WKN_2015 = c(46L,
47L, 48L, 49L, 50L, 51L, 47L), Holiday = structure(c(2L,
2L, 3L, 1L, 2L, 2L, 3L), .Label = c("Cyber_Monday", "No_Holiday",
"Thanksgiving"), class = "factor")), .Names = c("Date", "City",
"Visitors", "WKN", "WKN_2015", "Holiday"), row.names = c(NA,
7L), class = "data.frame")
You are only interested in three weeks per year, so you can calculate the lag from the "Thanksgiving" row. I don't think dplyr is needed.
df1$New_Visitors <- df1$Visitors # copy Visitors
ind <- which(df1$Holiday == "Thanksgiving") # row indices of the "Thanksgiving" rows
invisible(sapply(ind, function(x) {
lag <- df1[x, "WKN_2015"] - df1[x, "WKN"] # calculate the lag
df1[x:(x+2), "New_Visitors"] <<- df1[(x+lag):(x+lag+2), "Visitors"] # rewrite
}))
> df1 # this method treats the three weeks as a unit, so it adds two NA rows to the example data
Date City Visitors WKN WKN_2015 Holiday New_Visitors
1 2016-11-06 New York 40000 45 46 No_Holiday 40000
2 2016-11-13 New York 50000 46 47 No_Holiday 50000
3 2016-11-20 New York 50000 47 48 Thanksgiving 100000
4 2016-11-27 New York 100000 48 49 Cyber_Monday 100000
5 2016-12-04 New York 100000 49 50 No_Holiday 70000
6 2016-12-11 New York 70000 50 51 No_Holiday 70000
7 2017-11-23 New York 120000 47 47 Thanksgiving 120000
8 <NA> <NA> NA NA NA <NA> NA
9 <NA> <NA> NA NA NA <NA> NA
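If the two NA rows are unwanted, one possible variation is to skip any row of the three-week window that falls outside the data frame. The following is only a sketch under some assumptions: the rows are sorted by date, there is a single city, and `shift_holiday_weeks()` is a hypothetical helper name rather than a function from any package. Unlike the code above, it copies whatever part of the window exists instead of treating the three weeks as a unit.

```r
# Sample data from the question (strings kept as plain character vectors)
df1 <- data.frame(
  Date = c("2016-11-06", "2016-11-13", "2016-11-20", "2016-11-27",
           "2016-12-04", "2016-12-11", "2017-11-23"),
  City = "New York",
  Visitors = c(40000L, 50000L, 50000L, 100000L, 100000L, 70000L, 120000L),
  WKN = c(45L, 46L, 47L, 48L, 49L, 50L, 47L),
  WKN_2015 = c(46L, 47L, 48L, 49L, 50L, 51L, 47L),
  Holiday = c("No_Holiday", "No_Holiday", "Thanksgiving", "Cyber_Monday",
              "No_Holiday", "No_Holiday", "Thanksgiving"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assumes rows are in date order for a single city.
shift_holiday_weeks <- function(df, n_weeks = 3L) {
  df$New_Visitors <- df$Visitors                 # start from a plain copy
  n <- nrow(df)
  for (x in which(df$Holiday == "Thanksgiving")) {
    offset <- df$WKN_2015[x] - df$WKN[x]         # > 0: copy from later rows (lead)
    target <- x + seq_len(n_weeks) - 1L          # rows to overwrite
    src    <- target + offset                    # rows to copy from
    ok     <- target <= n & src >= 1L & src <= n # drop out-of-range pairs
    df$New_Visitors[target[ok]] <- df$Visitors[src[ok]]
  }
  df
}

result <- shift_holiday_weeks(df1)
print(result)
```

Because the function returns a modified copy, it also avoids the `<<-` assignment from inside `sapply()`. For several cities, the same function could be applied to each city's rows separately.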