How to reference another value that meets certain conditions so it can be used in a calculation

I have two data frames:

Test

Group.1   x
   1     25.5
   2     51
   3     51.5
   4     50
   5     51.5
   6     60 
   ...
   53    35.5

Calendar

Week   Hours  HourSpent
  1     8.5
  1     8.5
  1      0
  2     8.5
  2     8.5
  2     8.5
  2     8.5
  2     8.5
  2     6.5
  2     8.5
  3     7.0
  3     7.0
  3     8.2
  ...

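What I am trying to do is to populate the 'HourSpent' column in the Calendar df by doing the following calculation: (('Hours' / 'HourSpent') * 0.79)

I want to go through each row of the Calendar df and divide that row's 'Hours' value by the matching 'HourSpent' value. The 'HourSpent' value is determined from the Test df: if the value in the 'Week' column of the Calendar df matches a value in the 'Group.1' column of the Test df, then the corresponding value in the 'x' column of the Test df should be used as the 'HourSpent' value.

For example

Row 1 of the Calendar df would be 8.5 / 25.5 * 0.79. This applies to the first 3 rows, because their week number is 1. From row 4 onwards the calculation changes to 8.5 / 51 * 0.79, and so on.

Desired output - Calendar df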

Week   Hours  HourSpent
  1     8.5     0.2633
  1     8.5     0.2633
  1      0        0
  2     8.5     0.1317
  2     8.5     0.1317
  2     8.5     0.1317
  2     8.5     0.1317
  2     8.5     0.1317
  2     6.5     0.1007
  2     8.5     0.1317
  3     7.0     0.1074
  ...

Attempted code

for (i in 1:nrow(Calendar)){

 Calendar$'HourSpent' <- ifelse(Calendar$Week == Test$Group.1, 
 (Calendar$Hours/Test$x)*0.79, 
 0)

}

The problem is that this only seems to work for one row and everything else is 0. Is there a better solution to this problem?

Many thanks

library(dplyr)  # provides %>% along with inner_join() and mutate()

Test <- data.frame(Group.1 = c(1, 2, 3, 4), x = c(25.5, 51, 51.5, 50))
Calendar <- data.frame(Week = c(1, 1, 1, 2, 2, 2, 3, 3, 3), Hours = c(8.5, 8.5, 0, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5))
# Attach the matching x to each Calendar row, then compute the ratio
Calendar <- dplyr::inner_join(Calendar, Test, by = c("Week" = "Group.1")) %>%
            dplyr::mutate(Hours_spent = (Hours/x)*0.79)

Output

Calendar

  Week Hours    x Hours_spent
1    1   8.5 25.5   0.2633333
2    1   8.5 25.5   0.2633333
3    1   0.0 25.5   0.0000000
4    2   8.5 51.0   0.1316667
5    2   8.5 51.0   0.1316667
6    2   8.5 51.0   0.1316667
7    3   8.5 51.5   0.1303883
8    3   8.5 51.5   0.1303883
9    3   8.5 51.5   0.1303883
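
One thing to be aware of: inner_join() drops any Calendar row whose Week has no match in Test. If such rows should be kept, left_join() may be a better fit. The following is only a sketch under that assumption; Week 5 is a made-up week that is not in the question's data, added purely to show the unmatched case.

```r
library(dplyr)

Test <- data.frame(Group.1 = c(1, 2, 3, 4), x = c(25.5, 51, 51.5, 50))
# Week 5 is a hypothetical week with no row in Test, for illustration only
Calendar <- data.frame(Week = c(1, 1, 2, 5), Hours = c(8.5, 0, 8.5, 8.5))

# left_join() keeps every Calendar row; x is NA where no week matches
joined <- dplyr::left_join(Calendar, Test, by = c("Week" = "Group.1"))
result <- dplyr::mutate(joined, Hours_spent = (Hours / x) * 0.79)

print(result)
```

The unmatched row ends up with NA in both x and Hours_spent, which makes missing weeks easy to spot afterwards.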

Base R solution:

Test <- data.frame(Group.1 = 1:4, x = runif(4)*100, stringsAsFactors = FALSE)
Calendar <- data.frame(Week = sort(sample(1:4, 10, replace = TRUE)), Hours = runif(10)*100, HourSpent = NA, stringsAsFactors = FALSE)

head(Test)
# Group.1         x
# 1       1  7.163006
# 2       2 55.743758
# 3       3 48.983705
# 4       4 49.429236

head(Calendar)
# Week    Hours HourSpent
# 1    1 41.22831         NA
# 2    1 68.30103         NA
# 3    1 65.34278         NA
# 4    2 91.59863         NA
# 5    2 81.31131         NA
# 6    2 67.58900         NA

names(Test)[which(names(Test) == "Group.1")] <- "Week"

Calendar <- merge(Calendar, Test, by = "Week", all.x  = TRUE)

Calendar$HourSpent <- ((Calendar$Hours/Calendar$x) * 0.79)

head(Calendar)
# Week    Hours HourSpent         x
# 1    1 41.22831  4.5470251  7.163006
# 2    1 68.30103  7.5328452  7.163006
# 3    1 65.34278  7.2065835  7.163006
# 4    2 91.59863  1.2981349 55.743758
# 5    2 81.31131  1.1523431 55.743758
# 6    2 67.58900  0.9578707 55.743758
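
If you would rather not merge at all, a lookup with match() is another base R option. It keeps the Calendar rows in their original order and does not add the x column. This is a minimal sketch using a few values from the question rather than the random data above; the variable name idx is just an illustrative choice.

```r
# Small illustrative data (values taken from the question's first weeks)
Test <- data.frame(Group.1 = 1:4, x = c(25.5, 51, 51.5, 50))
Calendar <- data.frame(Week  = c(1, 1, 1, 2, 2, 3),
                       Hours = c(8.5, 8.5, 0, 8.5, 6.5, 7.0))

# match() returns, for each Week, the row position of that value in Group.1
# (NA if the week is not present in Test)
idx <- match(Calendar$Week, Test$Group.1)

# Index x by those positions, then apply the formula to whole columns at once
Calendar$HourSpent <- Calendar$Hours / Test$x[idx] * 0.79

print(Calendar)
```

Rounded to four decimals, this gives the values in the desired output from the question (0.2633, 0.2633, 0, 0.1317, 0.1007, 0.1074).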

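I think there is a typo in this sentence from the question:

What I am trying to do is to populate the 'HourSpent' column in the Calendar df by doing the following calculation: (('Hours' / 'HourSpent') * 0.79)

As written, 'HourSpent' appears on both sides, so you would have to solve an equation of the form 0.79 * Hours - HourSpent^2 = 0. Going by the rest of the question, I assume the divisor is meant to be the matching 'x' value from Test.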

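Edit:

Also, there is nothing wrong with using a for loop (especially if you are a beginner), although it can be slow on large data sets. If we flesh out its logic properly, this is what your for loop would look like: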

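# Assumes Calendar$HourSpent already exists and is NA where not yet filled
for (i in seq_len(nrow(Calendar))) {
  for (j in seq_len(nrow(Test))) {
    # Both conditions are single values, so use && rather than the vectorised &
    if (Calendar$Week[i] == Test$Group.1[j] && is.na(Calendar$HourSpent[i])) {
      Calendar$HourSpent[i] <- (Calendar$Hours[i] / Test$x[j]) * 0.79
    }
  }
}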

(Basic idea: if the Week value and the Group.1 value are equal, and the corresponding HourSpent entry has not been filled in yet, then compute HourSpent.)
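
As for why the attempt in the question fills almost everything with 0: Calendar$Week == Test$Group.1 does not look each week up in Test. It compares the two vectors position by position and recycles the shorter one, so a row is only TRUE when the values happen to line up. The loop around it does not change this, because the body never uses i. A small sketch with the sample data from the first answer (the name hits is just illustrative):

```r
Test <- data.frame(Group.1 = c(1, 2, 3, 4), x = c(25.5, 51, 51.5, 50))
Calendar <- data.frame(Week  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                       Hours = rep(8.5, 9))

# `==` compares element by element; Group.1 is recycled to 1,2,3,4,1,2,3,4,1
# (R normally warns here because 9 is not a multiple of 4)
hits <- suppressWarnings(Calendar$Week == Test$Group.1)
print(hits)

# Number of rows that would receive a non-zero value from the ifelse() call
print(sum(hits))
```

Only the rows where the positions coincide come out TRUE, and ifelse() writes 0 everywhere else.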