为什么 map_dbl() 的滚动总和在一种情况下有效,而在另一种情况下无效?
Why does rolling sum with map_dbl() work in one case but not the other?
我一直在努力理解为什么 map_dbl()
适用于一种情况,但不适用于另一种(据我所知)几乎相同的情况。这是它起作用的地方:
facilities <- structure(list(facilityid = c(11983L, 12937L, 12960L, 12927L, 14201L, 14200L, 13539L, 13543L, 13548L, 115930L, 115931L, 13542L, 13547L, 1243L, 13535L, 13538L, 13553L, 13534L, 13537L, 14047L, 897L, 13544L, 2410L, 62175L, 13540L, 13541L, 898L, 1059L, 13530L,
13531L), facilitystartdate = structure(c(4183, 4562, 5209, 5457, 5844, 5985, 6209, 6209, 6209, 6240, 6240, 6268, 6322, 6356, 6382, 6390, 6390, 6421, 6452, 6452, 6482, 6502, 6503, 6513, 6515, 6515, 6543, 6543, 6543, 6543), class = "Date"), aux = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(169L, 201L, 202L, 200L, 348L,
347L, 261L, 265L, 270L, 3645L, 3646L, 264L, 269L, 4L, 258L, 260L, 272L, 257L, 259L, 346L, 1L, 266L, 7L, 1144L, 262L, 263L, 2L, 3L, 254L, 255L), class = "data.frame")
library(dplyr)
library(lubridate)
library(purrr)
facilities <- facilities %>%
mutate(facilitystartdate = as.Date(facilitystartdate, format = "%Y-%m-%d")) %>%
group_by(aux) %>%
mutate(cum1 = purrr::map_dbl(facilitystartdate, ~ sum(aux[
facilitystartdate > (. - months(36)) & facilitystartdate <= .]))) %>% ungroup()
facilities <- facilities %>%
select(facilityid, cum1)
colnames(facilities)[2] <- "All_3Y_deals"
这些是大约 24 000 的前 30 行。
接下来我尝试做同样的事情,但是要求和的列用于更大的数字并且由于某种原因它不起作用。我没看到什么?
facilities <- structure(list(facilityid = c(11983L, 12937L, 12960L, 12927L, 14201L, 14200L, 13539L, 13543L, 13548L, 115930L, 115931L, 13542L, 13547L, 1243L, 13535L, 13538L, 13553L, 13534L, 13537L, 14047L, 897L, 13544L, 2410L, 62175L, 13540L, 13541L, 898L, 1059L, 13530L,
13531L), facilitystartdate = structure(c(4183, 4562, 5209, 5457, 5844, 5985, 6209, 6209, 6209, 6240, 6240, 6268, 6322, 6356, 6382, 6390, 6390, 6421, 6452, 6452, 6482, 6502, 6503, 6513, 6515, 6515, 6543, 6543, 6543, 6543), class = "Date"), facility_usd = c(122,
690, 800, 225, 58, 360, 70, 124.5, 200, 44.899425, 115.455663, 120, 173.4, 50, 32, 68.7, 465, 26.638, 48, 8.245, 50, 143.9, 525, 90, 50, 40, 30, 345, 10, 7.5)), row.names = c(169L, 201L,
202L, 200L, 348L, 347L, 261L, 265L, 270L, 3645L, 3646L, 264L, 269L, 4L, 258L, 260L, 272L, 257L, 259L, 346L, 1L, 266L, 7L, 1144L, 262L, 263L, 2L, 3L, 254L, 255L), class = "data.frame")
facilities <- facilities %>%
mutate(facilitystartdate = as.Date(facilitystartdate, format = "%Y-%m-%d")) %>%
group_by(facility_usd) %>%
mutate(cum1 = purrr::map_dbl(facilitystartdate, ~ sum(facility_usd[
facilitystartdate > (. - months(36)) & facilitystartdate <= .])))
facilities <- facilities %>%
ungroup() %>%
select(facilityid, cum1)
colnames(facilities)[2] <- "All_3Y_deals_USD"
代码现在工作的地方是我最初尝试做的,Ronak Shah's answer 这里帮助了我。但我不明白为什么同样的解决方案在第二种情况下不起作用。有人可以帮我吗?
更新:
这是第一个工作的样子。 All_3Y_deals 是前 3 年 aux
值的总和。
facilityid facilitystartdate aux All_3Y_deals
<int> <date> <dbl> <dbl>
1 11983 1981-06-15 1 1
2 12937 1982-06-29 1 2
3 12960 1984-04-06 1 3
4 12927 1984-12-10 1 3
5 14201 1986-01-01 1 3
6 14200 1986-05-22 1 4
7 13539 1987-01-01 1 7
8 13543 1987-01-01 1 7
9 13548 1987-01-01 1 7
10 115930 1987-02-01 1 9
11 115931 1987-02-01 1 9
12 13542 1987-03-01 1 10
13 13547 1987-04-24 1 10
14 1243 1987-05-28 1 11
15 13535 1987-06-23 1 12
16 13538 1987-07-01 1 14
17 13553 1987-07-01 1 14
18 13534 1987-08-01 1 15
19 13537 1987-09-01 1 17
20 14047 1987-09-01 1 17
21 897 1987-10-01 1 18
22 13544 1987-10-21 1 19
23 2410 1987-10-22 1 20
24 62175 1987-11-01 1 21
25 13540 1987-11-03 1 23
26 13541 1987-11-03 1 23
27 898 1987-12-01 1 29
28 1059 1987-12-01 1 29
29 13530 1987-12-01 1 29
30 13531 1987-12-01 1 29
更新: 这是第二个的预期结果:
facilityid facilitystartdate facility_usd All_3Y_deals_USD
<int> <date> <dbl> <dbl>
1 11983 1981-06-15 122 122
2 12937 1982-06-29 690 812
3 12960 1984-04-06 800 1612
4 12927 1984-12-10 225 1715
5 14201 1986-01-01 58 1083
6 14200 1986-05-22 360 2255
7 13539 1987-01-01 70 2325
8 13543 1987-01-01 124. 2449
9 13548 1987-01-01 200 2649
10 115930 1987-02-01 5.76 2654.76
11 115931 1987-02-01 14.8 2669.56
12 13542 1987-03-01 120 2789.56
13 13547 1987-04-24 173. 2162.56
14 1243 1987-05-28 50 3012.56
15 13535 1987-06-23 32 3044.56
16 13538 1987-07-01 68.7 3113.26
17 13553 1987-07-01 465 3578.26
18 13534 1987-08-01 26.6 3604.86
19 13537 1987-09-01 48 3652.86
20 14047 1987-09-01 8.24 3661.1
21 897 1987-10-01 50 3761.1
22 13544 1987-10-21 144. 3905.1
23 2410 1987-10-22 525 4430.1
24 62175 1987-11-01 90 4520.1
25 13540 1987-11-03 50 4670.1
26 13541 1987-11-03 40 4710.1
27 898 1987-12-01 30 4740.1
28 1059 1987-12-01 345 5085.1
29 13530 1987-12-01 10 5095.1
30 13531 1987-12-01 7.5 5102.6
我正在尝试为 All_3Y_deals_USD 列
获取 facility_usd 的 3 年滚动总和
在第一种情况下,您按 aux
分组,结果只有一组包含 30 个观测值。
在第二种情况下,您按 facility_usd
分组并得到 28 个组,因为大多数值都是唯一的。
特别是如果您查看第二种情况的第一行,您会得到 facility_usd = 122
。只有一个观察结果是这样的:因此你得到 `cum1 = 122' 因为所有其他行都属于不同的组。
所以一切正常。
我一直在努力理解为什么 map_dbl()
适用于一种情况,但不适用于另一种(据我所知)几乎相同的情况。这是它起作用的地方:
facilities <- structure(list(facilityid = c(11983L, 12937L, 12960L, 12927L, 14201L, 14200L, 13539L, 13543L, 13548L, 115930L, 115931L, 13542L, 13547L, 1243L, 13535L, 13538L, 13553L, 13534L, 13537L, 14047L, 897L, 13544L, 2410L, 62175L, 13540L, 13541L, 898L, 1059L, 13530L,
13531L), facilitystartdate = structure(c(4183, 4562, 5209, 5457, 5844, 5985, 6209, 6209, 6209, 6240, 6240, 6268, 6322, 6356, 6382, 6390, 6390, 6421, 6452, 6452, 6482, 6502, 6503, 6513, 6515, 6515, 6543, 6543, 6543, 6543), class = "Date"), aux = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(169L, 201L, 202L, 200L, 348L,
347L, 261L, 265L, 270L, 3645L, 3646L, 264L, 269L, 4L, 258L, 260L, 272L, 257L, 259L, 346L, 1L, 266L, 7L, 1144L, 262L, 263L, 2L, 3L, 254L, 255L), class = "data.frame")
library(dplyr)
library(lubridate)
library(purrr)
facilities <- facilities %>%
mutate(facilitystartdate = as.Date(facilitystartdate, format = "%Y-%m-%d")) %>%
group_by(aux) %>%
mutate(cum1 = purrr::map_dbl(facilitystartdate, ~ sum(aux[
facilitystartdate > (. - months(36)) & facilitystartdate <= .]))) %>% ungroup()
facilities <- facilities %>%
select(facilityid, cum1)
colnames(facilities)[2] <- "All_3Y_deals"
这些是大约 24 000 的前 30 行。
接下来我尝试做同样的事情,但是要求和的列用于更大的数字并且由于某种原因它不起作用。我没看到什么?
facilities <- structure(list(facilityid = c(11983L, 12937L, 12960L, 12927L, 14201L, 14200L, 13539L, 13543L, 13548L, 115930L, 115931L, 13542L, 13547L, 1243L, 13535L, 13538L, 13553L, 13534L, 13537L, 14047L, 897L, 13544L, 2410L, 62175L, 13540L, 13541L, 898L, 1059L, 13530L,
13531L), facilitystartdate = structure(c(4183, 4562, 5209, 5457, 5844, 5985, 6209, 6209, 6209, 6240, 6240, 6268, 6322, 6356, 6382, 6390, 6390, 6421, 6452, 6452, 6482, 6502, 6503, 6513, 6515, 6515, 6543, 6543, 6543, 6543), class = "Date"), facility_usd = c(122,
690, 800, 225, 58, 360, 70, 124.5, 200, 44.899425, 115.455663, 120, 173.4, 50, 32, 68.7, 465, 26.638, 48, 8.245, 50, 143.9, 525, 90, 50, 40, 30, 345, 10, 7.5)), row.names = c(169L, 201L,
202L, 200L, 348L, 347L, 261L, 265L, 270L, 3645L, 3646L, 264L, 269L, 4L, 258L, 260L, 272L, 257L, 259L, 346L, 1L, 266L, 7L, 1144L, 262L, 263L, 2L, 3L, 254L, 255L), class = "data.frame")
facilities <- facilities %>%
mutate(facilitystartdate = as.Date(facilitystartdate, format = "%Y-%m-%d")) %>%
group_by(facility_usd) %>%
mutate(cum1 = purrr::map_dbl(facilitystartdate, ~ sum(facility_usd[
facilitystartdate > (. - months(36)) & facilitystartdate <= .])))
facilities <- facilities %>%
ungroup() %>%
select(facilityid, cum1)
colnames(facilities)[2] <- "All_3Y_deals_USD"
代码现在工作的地方是我最初尝试做的,Ronak Shah's answer 这里帮助了我。但我不明白为什么同样的解决方案在第二种情况下不起作用。有人可以帮我吗?
更新:
这是第一个工作的样子。 All_3Y_deals 是前 3 年 aux
值的总和。
facilityid facilitystartdate aux All_3Y_deals
<int> <date> <dbl> <dbl>
1 11983 1981-06-15 1 1
2 12937 1982-06-29 1 2
3 12960 1984-04-06 1 3
4 12927 1984-12-10 1 3
5 14201 1986-01-01 1 3
6 14200 1986-05-22 1 4
7 13539 1987-01-01 1 7
8 13543 1987-01-01 1 7
9 13548 1987-01-01 1 7
10 115930 1987-02-01 1 9
11 115931 1987-02-01 1 9
12 13542 1987-03-01 1 10
13 13547 1987-04-24 1 10
14 1243 1987-05-28 1 11
15 13535 1987-06-23 1 12
16 13538 1987-07-01 1 14
17 13553 1987-07-01 1 14
18 13534 1987-08-01 1 15
19 13537 1987-09-01 1 17
20 14047 1987-09-01 1 17
21 897 1987-10-01 1 18
22 13544 1987-10-21 1 19
23 2410 1987-10-22 1 20
24 62175 1987-11-01 1 21
25 13540 1987-11-03 1 23
26 13541 1987-11-03 1 23
27 898 1987-12-01 1 29
28 1059 1987-12-01 1 29
29 13530 1987-12-01 1 29
30 13531 1987-12-01 1 29
更新: 这是第二个的预期结果:
facilityid facilitystartdate facility_usd All_3Y_deals_USD
<int> <date> <dbl> <dbl>
1 11983 1981-06-15 122 122
2 12937 1982-06-29 690 812
3 12960 1984-04-06 800 1612
4 12927 1984-12-10 225 1715
5 14201 1986-01-01 58 1083
6 14200 1986-05-22 360 2255
7 13539 1987-01-01 70 2325
8 13543 1987-01-01 124. 2449
9 13548 1987-01-01 200 2649
10 115930 1987-02-01 5.76 2654.76
11 115931 1987-02-01 14.8 2669.56
12 13542 1987-03-01 120 2789.56
13 13547 1987-04-24 173. 2162.56
14 1243 1987-05-28 50 3012.56
15 13535 1987-06-23 32 3044.56
16 13538 1987-07-01 68.7 3113.26
17 13553 1987-07-01 465 3578.26
18 13534 1987-08-01 26.6 3604.86
19 13537 1987-09-01 48 3652.86
20 14047 1987-09-01 8.24 3661.1
21 897 1987-10-01 50 3761.1
22 13544 1987-10-21 144. 3905.1
23 2410 1987-10-22 525 4430.1
24 62175 1987-11-01 90 4520.1
25 13540 1987-11-03 50 4670.1
26 13541 1987-11-03 40 4710.1
27 898 1987-12-01 30 4740.1
28 1059 1987-12-01 345 5085.1
29 13530 1987-12-01 10 5095.1
30 13531 1987-12-01 7.5 5102.6
我正在尝试为 All_3Y_deals_USD 列
获取 facility_usd 的 3 年滚动总和在第一种情况下,您按 aux
分组,结果只有一组包含 30 个观测值。
在第二种情况下,您按 facility_usd
分组并得到 28 个组,因为大多数值都是唯一的。
特别是如果您查看第二种情况的第一行,您会得到 facility_usd = 122
。只有一个观察结果是这样的:因此你得到 `cum1 = 122' 因为所有其他行都属于不同的组。
所以一切正常。