Add new variable with arithmetic conditions

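I have a randomly generated data frame containing ID, date, and earnings. I changed the data frame format so that each column represents a date and its values are the corresponding earnings.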

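I want to create a new variable named "Date_over100" that identifies the date on which a person's cumulative earnings exceed 100. Below is reproducible code that generates the data frame. I assume this would be done with conditional statements or a loop. I would appreciate any help. Thanks in advance!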

library(dplyr)
library(tidyr)

ID <- c(1:10)
# sample() draws without replacement by default, so this gives 10 distinct dates
Date <- sample(seq(as.Date('2021/01/01'), as.Date('2021/01/11'), by = "day"), 10)
Earning <- round(runif(10, 30, 50), digits = 2)
df <- data.frame(ID, Date, Earning, check.names = F)

df1 <- df %>%
  arrange(Date) %>%
  pivot_wider(names_from = Date, values_from = Earning)

df1 <- as.data.frame(df1)
# fill the cells left empty by the reshape with random earnings
df1[is.na(df1)] <- round(runif(sum(is.na(df1)), min = 30, max = 50), digits = 2)

I'd go back to long format for the calculation, then join the result to the wide data:

library(dplyr)
library(tidyr)

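df1 %>% pivot_longer(cols = -ID, names_to = "date") %>%
  group_by(ID) %>%
  # which.max() on a logical vector gives the position of the first TRUE;
  # the column names are strings, so convert the selected one back to a Date
  summarize(Date_over_100 = as.Date(date[which.max(cumsum(value) > 100)])) %>%
  right_join(df1, by = "ID")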
# # A tibble: 10 × 12
#       ID Date_over_100 `2021-01-04` `2021-01-01` `2021-01-08` `2021-01-11` `2021-01-02` `2021-01-09`
#    <int> <date>               <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#  1     1 2021-01-08            45.0         46.2         40.1         47.4         47.5         48.8
#  2     2 2021-01-08            36.7         30.3         36.2         47.5         41.4         41.7
#  3     3 2021-01-08            49.5         46.0         45.0         43.9         45.4         37.1
#  4     4 2021-01-08            31.0         48.7         47.3         40.4         40.8         35.5
#  5     5 2021-01-08            48.2         35.2         32.1         44.2         35.4         49.7
#  6     6 2021-01-08            40.8         37.6         31.8         40.3         38.3         42.5
#  7     7 2021-01-08            37.9         42.9         36.8         46.0         39.8         33.6
#  8     8 2021-01-08            47.7         47.8         39.7         46.4         43.8         46.5
#  9     9 2021-01-08            32.9         42.0         41.8         32.8         33.9         35.5
# 10    10 2021-01-08            34.5         40.1         42.7         35.9         44.8         31.8
# # … with 4 more variables: 2021-01-10 <dbl>, 2021-01-03 <dbl>, 2021-01-07 <dbl>, 2021-01-05 <dbl>
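
One caveat with the `which.max(cumsum(value) > 100)` idiom: if a person's running total never exceeds 100, the logical vector is all `FALSE` and `which.max()` returns 1, so the first date would be reported by mistake. That cannot happen with the data above (every row has ten earnings of at least 30), but for other data a guard may be useful. The following is a minimal base R sketch, not part of the answer above; `first_over` and the `toy` data frame are hypothetical names made up for illustration, and the date columns are assumed to already be in chronological order.

```r
# Hypothetical helper: name of the first element at which the running total
# exceeds `threshold`, or NA if the total never gets that high.
first_over <- function(x, threshold = 100) {
  idx <- which(cumsum(x) > threshold)
  if (length(idx) == 0) NA_character_ else names(x)[idx[1]]
}

# Small deterministic wide data frame (made-up values, columns in date order)
toy <- data.frame(
  ID = 1:3,
  `2021-01-01` = c(60, 30, 10),
  `2021-01-02` = c(45, 30, 20),
  `2021-01-03` = c(35, 45, 30),
  check.names = FALSE
)

# Apply the helper to each row of the date columns; apply() passes each row
# as a numeric vector named by the column names
date_cols <- setdiff(names(toy), "ID")
toy$Date_over100 <- as.Date(unname(apply(toy[date_cols], 1, first_over)))

toy$Date_over100
```

Here ID 1 passes 100 on the second date, ID 2 on the third, and ID 3 never does, so it gets `NA` rather than the first date. This works directly on the wide data, at the cost of relying on column order instead of an explicit date column.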