map/dplyr way for dynamically populating two columns in a data frame
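I have the data frame 'test', shown at the very bottom.
There are 2 different operations I want to perform on two different columns, and if possible I would like to solve this with an efficient dplyr or purrr approach.
Operation #1:
I want to fill the NA values in 'amt_needed' with the two values from 'remaining' in the rows above. (This is a test data frame; the real one will have many more rows, and each time the two 'amt_needed' values should equal the two 'remaining' values from the two rows above.)
Operation #2:
The two NA values in 'remaining' should be the new 'amt_needed' value minus sum(contrib) for a and b.
Any thoughts/suggestions appreciated!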
test <- data.frame(date = c("2018-01-01", "2018-01-01", "2018-01-15", "2018-01-15"),
                   name = c("a", "b", "a", "b"),
                   contrib = c(4, 2, 4, 2),
                   amt_needed = c(100, 100, NA, NA),
                   remaining = c(94, 94, NA, NA))
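To make the two rules concrete, here is a minimal base R sketch that applies them date by date to the data frame above. It is only an illustration of the intended result, not an efficient solution; the loop and the helper names `dates`, `cur`, and `prev` are introduced here for illustration and assume every row of a date shares the same 'remaining' value.

```r
# Question's sample data, repeated so the sketch runs on its own
test <- data.frame(
  date = c("2018-01-01", "2018-01-01", "2018-01-15", "2018-01-15"),
  name = c("a", "b", "a", "b"),
  contrib = c(4, 2, 4, 2),
  amt_needed = c(100, 100, NA, NA),
  remaining = c(94, 94, NA, NA)
)

dates <- unique(test$date)          # dates in order of appearance
for (i in seq_along(dates)[-1]) {   # skip the first date, which is already filled
  cur  <- test$date == dates[i]
  prev <- test$date == dates[i - 1]
  # Operation 1: amt_needed takes the previous date's remaining value
  # (assumption: all rows of a date hold the same remaining value)
  test$amt_needed[cur] <- test$remaining[prev][1]
  # Operation 2: remaining = amt_needed minus the total contribution on this date
  test$remaining[cur] <- test$amt_needed[cur] - sum(test$contrib[cur])
}
test
```

For the 2018-01-15 rows this gives amt_needed = 94 and remaining = 94 - (4 + 2) = 88.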
Based on the new data provided in the OP, one solution using dplyr could be:
library(dplyr)

# Data
test <- data.frame(date = c("2018-01-01", "2018-01-01", "2018-01-15", "2018-01-15", "2018-01-30", "2018-01-30"),
                   name = c("a", "b", "a", "b", "a", "b"),
                   contrib = c(4, 2, 4, 2, 4, 2),
                   amt_needed = c(100, 100, NA, NA, NA, NA),
                   remaining = c(94, 94, NA, NA, NA, NA))

# Change column to date
test$date <- as.Date(test$date, "%Y-%m-%d")
# Copy the starting amount (first row) into every row
test$amt_needed <- test$amt_needed[1]

test %>%
  arrange(date, name) %>%
  group_by(date) %>%
  # Total contribution per date, repeated on each row of the group
  mutate(group_contrib = sum(contrib)) %>%
  ungroup() %>%
  # Reduce to one row per date
  select(date, group_contrib) %>%
  distinct() %>%
  arrange(date) %>%
  # Running total of contributions across dates
  mutate(cumm_group_sum = cumsum(group_contrib)) %>%
  # Bring back the original rows
  inner_join(test, by = "date") %>%
  # Remaining after this date = starting amount - contributions so far
  mutate(remaining = amt_needed - cumm_group_sum) %>%
  # Amount needed on this date = remaining + this date's contributions
  mutate(amt_needed_act = remaining + group_contrib) %>%
  select(date, name, contrib, amt_needed_act, remaining)
# A tibble: 6 x 5
date name contrib amt_needed_act remaining
<date> <fctr> <dbl> <dbl> <dbl>
1 2018-01-01 a 4.00 100 94.0
2 2018-01-01 b 2.00 100 94.0
3 2018-01-15 a 4.00 94.0 88.0
4 2018-01-15 b 2.00 94.0 88.0
5 2018-01-30 a 4.00 88.0 82.0
6 2018-01-30 b 2.00 88.0 82.0
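The same idea can probably be written a bit more compactly by summarising per date first and joining the totals back. The following is a sketch under the same assumption as above, namely that the first row holds the starting amount; the names `start_amt`, `totals`, and `result` are introduced here for illustration only.

```r
library(dplyr)

test <- data.frame(
  date = as.Date(c("2018-01-01", "2018-01-01", "2018-01-15",
                   "2018-01-15", "2018-01-30", "2018-01-30")),
  name = c("a", "b", "a", "b", "a", "b"),
  contrib = c(4, 2, 4, 2, 4, 2),
  amt_needed = c(100, 100, NA, NA, NA, NA),
  remaining = c(94, 94, NA, NA, NA, NA)
)

# Assumption: the first row carries the starting amount
start_amt <- test$amt_needed[1]

# One row per date: total contribution, then running balance
totals <- test %>%
  group_by(date) %>%
  summarise(group_contrib = sum(contrib)) %>%
  arrange(date) %>%
  mutate(remaining = start_amt - cumsum(group_contrib),
         amt_needed = remaining + group_contrib)

# Attach the per-date values to the original rows
result <- test %>%
  select(date, name, contrib) %>%
  left_join(totals, by = "date") %>%
  select(date, name, contrib, amt_needed, remaining)

result
```

Using `left_join()` from the original rows keeps the row order of `test`, and the recomputed 'amt_needed' and 'remaining' replace the partially filled columns rather than sitting next to them.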