How to generate random values for groups by an arithmetic condition in R

Question

I have a dataset with this structure:

mydata=structure(list(supps = c("KR", "KR", "KR", "KR", "KR", "KR", 
"KR", "KR", "KR", "KR", "aeroclub", "aeroclub", "aeroclub", "aeroclub", 
"aeroclub", "aeroclub", "aeroclub", "aeroclub", "aeroclub", "aeroclub"
), date = c("01.05.2021", "01.06.2021", "02.05.2021", "02.06.2021", 
"03.05.2021", "03.06.2021", "04.05.2021", "04.06.2021", "05.05.2021", 
"05.06.2021", "01.05.2021", "01.06.2021", "02.05.2021", "02.06.2021", 
"03.05.2021", "03.06.2021", "04.05.2021", "04.06.2021", "05.05.2021", 
"05.06.2021"), turnover = c(0, 0, 32159.00888, 25220.0027, 0, 
0, 245312.682, 189901.1224, 0, 0, 1531959.833, 1591612, 1834696.667, 
1885169, 1871615.167, 1823398, 4891342, 5253701.167, 0, 0), fee = c(0, 
0, 651, 37, 0, 0, 2341, 7548, 0, 0, 40519.5, 30415, 34767.66667, 
39289, 39175.66667, 45798, 94819.5, 116803.1667, 0, 0), comiss = c(0, 
0, 764.81, 537.67, 0, 0, 8578.25, 6198.115, 0, 0, -2023.41, -1941.67, 
-550.82, 1323.23, -1029.47, -638.47, -1034.58, -1332.95, 0, 0
), intencive = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 26.4, 1945.8, 
2199.48, 3740.76, 6499.2, 32188.68, 42337.44, 0, 0)), class = "data.frame", row.names = c(NA, 
-20L))

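For each group in the supps column (KR and aeroclub) and for each of the variables turnover, fee, comiss and intencive, I need to calculate values by the following condition. For example, take KR and the turnover variable. The last 2 values belong to the dates 03.06.2021-04.06.2021. If the most recent value is greater than the previous one, calculate the sum of the values: 189901+0=189901. Then, for the dates 05.06.2021-08.06.2021 (4 days), generate random values for each variable. These are calculated from this sum as 189901+(2%-10%), in random order. To be clearer, an example of the output (turnover variable):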

05.06.2021  189901+2%=193699,02
06.06.2021 189901+10%=208891,1
07.06.2021  189901+6%=201295,06
08.06.2021 189901+7%=203194
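
A minimal sketch of the positive case described above, not part of the original post. The two values are hard-coded from the question, the seed is arbitrary, and the names prev_val, last_val and base are made up for illustration.

```r
# Last two turnover values for KR, as quoted in the question
prev_val <- 0
last_val <- 189901

# Assumption: when the latest value is at least the previous one,
# the base is the sum of both; otherwise only the latest value is used
base <- if (last_val >= prev_val) prev_val + last_val else last_val

set.seed(1)                                  # arbitrary seed, for reproducibility only
pct <- runif(4, min = 0.02, max = 0.10)      # one random uplift per new day
turnover_new <- base * (1 + pct)

data.frame(
  date     = c("05.06.2021", "06.06.2021", "07.06.2021", "08.06.2021"),
  turnover = turnover_new
)
```

Every generated value lies between base * 1.02 and base * 1.10, which is the "+(2%-10%)" range from the question.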

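But sometimes the last value can be negative. For example, group = aeroclub, variable comiss: of the last 2 values (03.06.2021-04.06.2021), the value on 04.06.2021 is -1332 and the value on 03.06.2021 is -632, so the value on 04.06.2021 is less than the one on 03.06.2021. We sum these values, -1332+-632=-1954, but then we do not add to the sum; we subtract, -1954-(2%-10%), in random order. So for this group, the desired output for comiss is: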

05.06.2021  -1954-2%=-1914,92
06.06.2021 -1954-7%=-1817,22
07.06.2021  -1954-6%=-1836,76
08.06.2021 -1954-8%=-1797,68
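
The arithmetic of the negative case can be checked with a short sketch. The helper name shift_by_pct is hypothetical. The base -1954 is taken as stated in the question so that the numbers match the expected output (note that -1332 + -632 is actually -1964).

```r
# Hypothetical helper: grow a positive base by pct, shrink a negative base
# toward zero by pct (this mirrors the expected outputs in the question)
shift_by_pct <- function(base, pct) {
  base * (1 + sign(base) * pct)
}

base <- -1954                        # sum as written in the question
pct_fixed <- c(0.02, 0.07, 0.06, 0.08)
comiss_fixed <- shift_by_pct(base, pct_fixed)
comiss_fixed
# -1914.92 -1817.22 -1836.76 -1797.68

# Random version: percentages drawn uniformly between 2% and 10%
set.seed(1)                          # arbitrary seed
comiss_random <- shift_by_pct(base, runif(4, 0.02, 0.10))
```

For a positive base the same helper gives base * (1 + pct), so one formula covers both cases.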

How can I do this correctly?

Answer 1

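The following answer assumes a few things that are not entirely clear from the question:

The calculations run from the 3rd column to the last column.
Where a value is 0, it is kept as 0 and no random % is added, although you can change this as needed.
There are a few cases where two consecutive values have different signs. For these cases, the rule for the latest value is applied, following the question.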
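
To get a feel for how often the third assumption comes into play, one could count sign changes between consecutive values. A small base R sketch using the aeroclub comiss values from the question, in the row order of mydata (the variable names are illustrative only):

```r
# aeroclub comiss values, copied from mydata in the question
comiss <- c(-2023.41, -1941.67, -550.82, 1323.23, -1029.47,
            -638.47, -1034.58, -1332.95, 0, 0)

s <- sign(comiss)
# TRUE where value i and value i + 1 have opposite signs (zeros do not count)
sign_change <- head(s, -1) * tail(s, -1) < 0

sum(sign_change)     # 2
which(sign_change)   # 3 4
```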

#loading dplyr for %>%, filter() and bind_rows()
library(dplyr)

#storing the unique category of supps
col_supps <- unique(mydata$supps)
#storing the columns for which the calculations will be done
col_names <- colnames(mydata)[3:ncol(mydata)]

#the data frame which will contain the output
output_df <- data.frame()
#Iterating over different supps values

for (x in col_supps) {
#storing one type of supps in a temporary data frame
  mydata %>%
    filter(supps %in% x)-> temp
  temp1<- temp

#temp will act as a reference frame, in temp1 values will be updated
  
#Now, iterating over columns which we need
  
  for (y in col_names) {
    i<- 1
#with a while loop, we iterate over each element of the column and save the result in temp1
    while (i<=(nrow(temp)-1)) {
      if(temp[i+1,y]>0 && temp[i+1,y]>=temp[i,y]){
        temp1[i+1,y] <- (temp[i,y]+temp[i+1,y]) * (runif(1,1.02,1.1))
      }else if(temp[i+1,y]<0 && temp[i+1,y]<=temp[i,y]){
        temp1[i+1,y] <- (temp[i,y]+temp[i+1,y]) * (runif(1,1.02,1.1))
      }else if(temp[i+1,y]>0 && temp[i+1,y]<temp[i,y]){
        temp1[i+1,y] <- (temp[i+1,y]) * (runif(1,1.02,1.1))
      }else if(temp[i+1,y]<0 && temp[i+1,y]>temp[i,y]){
        temp1[i+1,y] <- (temp[i+1,y]) * (runif(1,1.02,1.1))
      }else if(temp[i+1,y]==0){
#zeros stay zero; write to temp1 so that temp remains an untouched reference
        temp1[i+1,y] <- 0
      }
      i <- i+1
    }
  }
#saving the output in the output data frame before repeating the process for another type of supps
  output_df %>%
    bind_rows(temp1) -> output_df
  
}

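output_df now holds the final output. If you want the random values to be reproducible, call set.seed() once before the outer for loop; resetting the same seed inside the while loop would make every draw identical. If you do not need reproducibility, use the code as is.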
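
The loop above rewrites existing rows rather than appending the four new dates the question asks for. Below is a rough sketch of one way to append them in base R. It follows the question's expected output (negative bases shrink toward zero) rather than the loop in this answer. The names forecast_values, last_two and new_dates are hypothetical, treating 03.06.2021 and 04.06.2021 as the "last two" rows is an assumption taken from the question's wording, and the seed is arbitrary.

```r
# Hypothetical helper: n random values derived from the last two observations
forecast_values <- function(prev_val, last_val, n = 4, lo = 0.02, hi = 0.10) {
  if (last_val == 0) return(rep(0, n))            # assumption: zeros stay zero
  # "moving away from zero": sum both values, otherwise keep only the latest
  grows <- (last_val > 0 && last_val >= prev_val) ||
           (last_val < 0 && last_val <= prev_val)
  base <- if (grows) prev_val + last_val else last_val
  pct <- runif(n, lo, hi)
  base * (1 + sign(base) * pct)                   # + for positive, - for negative base
}

# The two rows per group that the question treats as the latest observations
last_two <- data.frame(
  supps    = c("KR", "KR", "aeroclub", "aeroclub"),
  date     = c("03.06.2021", "04.06.2021", "03.06.2021", "04.06.2021"),
  turnover = c(0, 189901.1224, 1823398, 5253701.167),
  comiss   = c(0, 6198.115, -638.47, -1332.95)
)

new_dates  <- c("05.06.2021", "06.06.2021", "07.06.2021", "08.06.2021")
value_cols <- c("turnover", "comiss")

set.seed(42)  # set once, before any random draw
forecast <- do.call(rbind, lapply(split(last_two, last_two$supps), function(g) {
  out <- data.frame(supps = g$supps[1], date = new_dates)
  for (col in value_cols) {
    out[[col]] <- forecast_values(g[[col]][1], g[[col]][2], n = length(new_dates))
  }
  out
}))
rownames(forecast) <- NULL
forecast
```

The result has one row per group and new date; fee and intencive can be added to value_cols in the same way.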
