R: Divide column Y by Z with absolute (positive) output, then sum output per unique value for column X

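For each jobtask, I am trying to divide value by weight into two separate columns, one for the +1 values and one for the -1 values. For outputnegative, I specifically need the absolute (positive) value.

With this, I would like to add 2 more columns that sum outputpositive and outputnegative for each value of occupation (a/b/c). I can't seem to figure it out, any help would be much appreciated!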

occupation <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c")
jobtask <- c("1", "2", "3", "4","5", "6", "7", "8", "9", "10", "11", "12")
value <- c("1", "1", "0", "-1", "-1", "0", "-1", "1", "-1", "1", "0", "0")
weight <- c("95", "81", "97", "65", "43", "92", "89", "43", "58", "99", "35", "69")

df <- data.frame(occupation, jobtask, value, weight)

Desired output below:

   occupation jobtask value weight outputpos outputneg occupationpos occupationneg
1           a       1     1     95      0.95                    1.76          0.65
2           a       2     1     81      0.81                    1.76          0.65
3           a       3     0     97                              1.76          0.65
4           a       4    -1     65               -0.65          1.76          0.65
5           b       5    -1     43               -0.43          0.43          1.32
6           b       6     0     92                              0.43          1.32
7           b       7    -1     89               -0.89          0.43          1.32
8           b       8     1     43      0.43                    0.43          1.32
9           c       9    -1     58               -0.58          0.99          0.58
10          c      10     1     99      0.99                    0.99          0.58
11          c      11     0     35                              0.99          0.58
12          c      12     0     69                              0.99          0.58

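Additional question on correcting for multiple years. In my actual df, each row is a year, so the "sum" for that particular occupation is now 5.42, whereas it should be 0.95 + 0.81 = 1.76 as in the example above. Each jobtask has a different number of years, so I cannot simply divide by a fixed number to account for the years. Any suggestions?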

 occupation jobtask year value weight outputpos outputneg occupationpos occupationneg
1          a       1 2015     1     95      0.95                    5.42              
2          a       1 2016     1     95      0.95                    5.42             
3          a       1 2017     1     95      0.95                    5.42             
4          a       1 2018     1     95      0.95                    5.42             
5          a       2 2015     1     81      0.81                    5.42             
6          a       2 2016     1     81      0.81                    5.42

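We can first convert the column types to numeric with type.convert, then create an 'output' column by dividing 'weight' by 100. Next, use case_when to create 'outputpos' and 'outputneg' depending on whether 'value' is 1 or -1. Finally, group by 'occupation' and take the sum of 'output' where 'value' is 1 and where 'value' is -1 to create 'occupation_pos' and 'occupation_neg'.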

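library(dplyr)
# The example columns are character vectors; convert them to numeric types
df <- type.convert(df, as.is = TRUE)
df %>% 
   # Scale weight, then keep it only in the column matching the sign of value
   mutate(output = weight/100, 
          outputpos = case_when(value == 1 ~ output), 
          outputneg = case_when(value == -1 ~ -1 *output)) %>% 
   group_by(occupation) %>% 
   # Per-occupation totals; both are positive because 'output' is unsigned
   mutate(occupation_pos = sum(output[value == 1]),
          occupation_neg = sum(output[value == -1])) %>%
   select(-output)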
# A tibble: 12 x 8
# Groups:   occupation [3]
#   occupation jobtask value weight outputpos outputneg occupation_pos occupation_neg
#   <chr>        <int> <int>  <int>     <dbl>     <dbl>          <dbl>          <dbl>
# 1 a                1     1     95      0.95    NA               1.76          0.65 
# 2 a                2     1     81      0.81    NA               1.76          0.65 
# 3 a                3     0     97     NA       NA               1.76          0.65 
# 4 a                4    -1     65     NA       -0.65            1.76          0.65 
# 5 b                5    -1     43     NA       -0.43            0.43          1.32 
# 6 b                6     0     92     NA       NA               0.43          1.32 
# 7 b                7    -1     89     NA       -0.89            0.43          1.32 
# 8 b                8     1     43      0.43    NA               0.43          1.32 
# 9 c                9    -1     58     NA       -0.580           0.99          0.580
#10 c               10     1     99      0.99    NA               0.99          0.580
#11 c               11     0     35     NA       NA               0.99          0.580
#12 c               12     0     69     NA       NA               0.99          0.580
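
For the follow-up with one row per year, one possible approach is to collapse the data to a single row per occupation/jobtask before summing, then join the totals back. This is only a sketch: `df_years` is a made-up data frame modelled on the rows shown in the question, and it assumes that 'value' and 'weight' stay the same across years for a given jobtask. If they change from year to year, `distinct()` would keep several rows per jobtask and the totals would be inflated again.

```r
library(dplyr)

# Hypothetical multi-year data: each jobtask repeats once per year,
# and the number of years differs between jobtasks.
df_years <- data.frame(
  occupation = c("a", "a", "a", "a", "a", "a", "a"),
  jobtask    = c(1, 1, 1, 1, 2, 2, 4),
  year       = c(2015, 2016, 2017, 2018, 2015, 2016, 2015),
  value      = c(1, 1, 1, 1, 1, 1, -1),
  weight     = c(95, 95, 95, 95, 81, 81, 65)
)

# One row per occupation/jobtask (assumes value and weight are
# constant across years), then sum per occupation.
occupation_sums <- df_years %>%
  distinct(occupation, jobtask, value, weight) %>%
  group_by(occupation) %>%
  summarise(
    occupation_pos = sum(weight[value == 1]) / 100,
    occupation_neg = sum(weight[value == -1]) / 100,
    .groups = "drop"
  )

# Row-level columns as before, with the de-duplicated totals joined back.
result <- df_years %>%
  mutate(output = weight / 100,
         outputpos = case_when(value == 1 ~ output),
         outputneg = case_when(value == -1 ~ -1 * output)) %>%
  left_join(occupation_sums, by = "occupation") %>%
  select(-output)

result
```

With this sample, occupation "a" gets (95 + 81) / 100 = 1.76 for 'occupation_pos' instead of the year-inflated 5.42. If a positive 'outputneg' is wanted, as the question text asks, dropping the `-1 *` factor (or wrapping the expression in `abs()`) would give that.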