How to generate random values for groups by an arithmetic condition in R
I have a dataset with this structure:
mydata=structure(list(supps = c("KR", "KR", "KR", "KR", "KR", "KR",
"KR", "KR", "KR", "KR", "aeroclub", "aeroclub", "aeroclub", "aeroclub",
"aeroclub", "aeroclub", "aeroclub", "aeroclub", "aeroclub", "aeroclub"
), date = c("01.05.2021", "01.06.2021", "02.05.2021", "02.06.2021",
"03.05.2021", "03.06.2021", "04.05.2021", "04.06.2021", "05.05.2021",
"05.06.2021", "01.05.2021", "01.06.2021", "02.05.2021", "02.06.2021",
"03.05.2021", "03.06.2021", "04.05.2021", "04.06.2021", "05.05.2021",
"05.06.2021"), turnover = c(0, 0, 32159.00888, 25220.0027, 0,
0, 245312.682, 189901.1224, 0, 0, 1531959.833, 1591612, 1834696.667,
1885169, 1871615.167, 1823398, 4891342, 5253701.167, 0, 0), fee = c(0,
0, 651, 37, 0, 0, 2341, 7548, 0, 0, 40519.5, 30415, 34767.66667,
39289, 39175.66667, 45798, 94819.5, 116803.1667, 0, 0), comiss = c(0,
0, 764.81, 537.67, 0, 0, 8578.25, 6198.115, 0, 0, -2023.41, -1941.67,
-550.82, 1323.23, -1029.47, -638.47, -1034.58, -1332.95, 0, 0
), intencive = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 26.4, 1945.8,
2199.48, 3740.76, 6499.2, 32188.68, 42337.44, 0, 0)), class = "data.frame", row.names = c(NA,
-20L))
For each group in the supps column (KR and aeroclub), I need to calculate values for the variables turnover, fee, comiss and intencive using the following condition.
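For example, take KR and the turnover variable. Its last 2 values belong to the dates 03.06.2021 and 04.06.2021. If the most recent value is greater than the previous one, the two values are summed: 189901+0=189901. Then, for each variable, random values are generated for the dates 05.06.2021-08.06.2021 (4 days), each computed from that sum as 189901 + a randomly chosen 2% to 10%.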
To make it clearer, here is the example output (turnover variable):
05.06.2021 189901+2%=193699,02
06.06.2021 189901+10%=208891,1
07.06.2021 189901+6%=201295,06
08.06.2021 189901+7%=203194
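Because the date column stores "dd.mm.yyyy" strings and the rows interleave May and June, "the last 2 values" only line up with 03.06.2021 and 04.06.2021 after the strings are parsed as dates and sorted. A minimal base-R sketch, assuming that the values to use are the two latest ones before the first date to be filled (the data already contain zero rows for 05.06.2021); the helpers parse_date, last_two_before and next_dates and the cutoff argument are hypothetical:

```r
# Hypothetical helpers; dates are "dd.mm.yyyy" strings, as in mydata.
parse_date <- function(s) as.Date(s, format = "%d.%m.%Y")

# For each group in `supps`, return the values of `var` on the two latest
# dates strictly before `cutoff`, oldest first.
last_two_before <- function(df, var, cutoff) {
  df$date <- parse_date(df$date)
  # keep only rows before the cutoff, ordered by group and then by date
  df <- df[df$date < parse_date(cutoff), ]
  df <- df[order(df$supps, df$date), ]
  lapply(split(df[[var]], df$supps), function(v) tail(v, 2))
}

# The n consecutive dates starting at `start`, formatted like the input.
next_dates <- function(start, n = 4) {
  format(seq(parse_date(start), by = "day", length.out = n), "%d.%m.%Y")
}
```

With the question's data, last_two_before(mydata, "turnover", "05.06.2021")$KR gives the values 0 and 189901.1224 (03.06.2021 and 04.06.2021), the same call for "comiss" gives -638.47 and -1332.95 for aeroclub, and next_dates("05.06.2021") gives the four dates 05.06.2021 to 08.06.2021.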
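But sometimes the last values are negative. For example, group = aeroclub, variable comiss, last 2 values on 03.06.2021-04.06.2021.
The value on 04.06.2021 is -1332 and the value on 03.06.2021 is -638, so the value on 04.06.2021 is smaller than the value on 03.06.2021. We sum these values, -1332 + -638 = -1970, but this time, instead of adding a percentage, we subtract a randomly chosen 2% to 10%: -1970-(2%-10%).
So for this group the expected output for comiss is:
05.06.2021 -1970-2%=-1930,6
06.06.2021 -1970-7%=-1832,1
07.06.2021 -1970-6%=-1851,8
08.06.2021 -1970-8%=-1812,4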
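Both cases can be folded into one rule: the two values are summed when the latest value is larger than the previous one (if positive) or smaller (if negative), and the expected outputs multiply the result by 1 + p when the latest value is positive and by 1 - p when it is negative. A sketch that takes the percentages as an argument, so the arithmetic can be checked against the examples; the name apply_rule and the fallback to the latest value alone when the sum condition fails are assumptions (the question does not cover that case):

```r
# Hypothetical helper: `prev` and `latest` are the last two values (oldest
# first); `pct` holds whole-number percentages such as c(2, 10, 6, 7).
apply_rule <- function(prev, latest, pct) {
  # sum only when the latest value is larger (if positive) or smaller (if negative)
  grows <- (latest > 0 && latest > prev) || (latest < 0 && latest < prev)
  # assumption: otherwise only the latest value is used
  base <- if (grows) prev + latest else latest
  # add p% when the latest value is positive, subtract p% when it is negative;
  # sign(0) is 0 and the base is 0 too, so a zero stays zero
  base * (1 + sign(latest) * pct / 100)
}

# four days with random whole-number percentages between 2 and 10
apply_rule(0, 189901, sample(2:10, 4, replace = TRUE))
```

Passing fixed percentages instead of sample() makes the function deterministic, which is handy for checking the expected outputs by hand.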
How can I do this correctly?
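The following answer assumes a few things that are not entirely clear from the question:
- The calculation applies to the columns from the 3rd to the last one.
- Where a value is 0, it stays 0 and no random percentage is added, although you can change that if needed.
- In the few cases where two consecutive values have different signs, the question's rule for the most recent value is applied.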
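The rule behind these assumptions compares each value with the one before it in the same group and column, so it can also be written without an explicit loop. A vectorized base-R sketch; step_rule is a hypothetical name, the random share is continuous like the runif() calls in the code below, and the sign of the latest value decides whether the share is added or subtracted, following the third assumption:

```r
# Hypothetical vectorized form of the row-by-row rule for one column `x` of
# a single group; the first element has no predecessor and is kept as is.
step_rule <- function(x) {
  n <- length(x)
  if (n < 2) return(x)
  prev <- x[-n]
  curr <- x[-1]
  # sum with the previous value when the current one is larger (if positive)
  # or smaller (if negative); otherwise keep only the current value
  grows <- (curr > 0 & curr >= prev) | (curr < 0 & curr <= prev)
  base <- ifelse(grows, prev + curr, curr)
  # one random share in [0.02, 0.10] per row
  p <- runif(n - 1, 0.02, 0.10)
  # the sign of the current value decides +p or -p; sign(0) is 0 and the
  # base is 0 there too, so zeros stay zero
  c(x[1], base * (1 + sign(curr) * p))
}
```

Assuming mydata from the question is loaded, mydata[3:6] <- lapply(mydata[3:6], function(col) ave(col, mydata$supps, FUN = step_rule)) applies it to every value column. ave() runs the function within each supps group and returns the results in the original row order, so, as in the loop version, the row order inside each group determines which value counts as "previous".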
library(dplyr)

# storing the unique categories of supps
col_supps <- unique(mydata$supps)
# storing the columns for which the calculations will be done (3rd to last)
col_names <- colnames(mydata)[3:ncol(mydata)]
# the data frame which will contain the output
output_df <- data.frame()
# iterating over the different supps values
for (x in col_supps) {
  # storing one type of supps in a temporary data frame
  temp <- mydata %>% filter(supps == x)
  # temp acts as the reference frame; the updated values go into temp1
  temp1 <- temp
  # now iterating over the columns we need
  for (y in col_names) {
    # iterating over each element of the column and saving the result in temp1
    for (i in seq_len(nrow(temp) - 1)) {
      prev <- temp[i, y]
      curr <- temp[i + 1, y]
      if (curr > 0 && curr >= prev) {
        # positive and not decreasing: sum both values, then add 2%-10%
        temp1[i + 1, y] <- (prev + curr) * runif(1, 1.02, 1.1)
      } else if (curr < 0 && curr <= prev) {
        # negative and not increasing: sum both values, then subtract 2%-10%
        # (x 0.90-0.98, matching the expected output, e.g. -1954-2% = -1914.92)
        temp1[i + 1, y] <- (prev + curr) * runif(1, 0.90, 0.98)
      } else if (curr > 0) {
        # positive but decreasing: keep only the latest value, then add 2%-10%
        temp1[i + 1, y] <- curr * runif(1, 1.02, 1.1)
      } else if (curr < 0) {
        # negative but increasing: keep only the latest value, then subtract 2%-10%
        temp1[i + 1, y] <- curr * runif(1, 0.90, 0.98)
      } else {
        # zero stays zero (no random percentage)
        temp1[i + 1, y] <- 0
      }
    }
  }
  # saving this group's output before repeating the process for the next supps value
  output_df <- bind_rows(output_df, temp1)
}
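Now output_df holds the final output you want. If you want the random values to be reproducible, call set.seed() once before the loops; calling it inside a loop would reseed on every iteration and repeat the same draws. Otherwise, you can run the code as is.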
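A quick check of the seeding advice: seeding once and re-running reproduces the same draws, while reseeding inside a loop repeats a single value. draw_factors is a hypothetical helper that draws the same 1.02-1.10 factors as the runif() calls above:

```r
# Hypothetical helper: n random factors between 1.02 and 1.10, optionally
# after seeding the generator once, before any draws.
draw_factors <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  runif(n, 1.02, 1.1)
}

a <- draw_factors(4, seed = 123)
b <- draw_factors(4, seed = 123)
identical(a, b)  # TRUE: same seed, same sequence of draws

# pitfall: reseeding inside a loop restarts the sequence on every iteration
same <- sapply(1:3, function(i) { set.seed(1); runif(1, 1.02, 1.1) })
length(unique(same))  # 1: every iteration got the same value
```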