R: Divide column Y by Z with absolute (positive) output, then sum output per unique value for column X
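For each `jobtask`, I am trying to divide `value` by `weight` in two separate columns, one for the +1 values and one for the -1 values. For `outputnegative` I specifically need the absolute (positive) value.

With that, I want to add 2 more columns that sum `outputpositive` and `outputnegative` for each value of `occupation` (a/b/c). I can't seem to figure it out; any help would be greatly appreciated!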
```r
occupation <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c")
jobtask <- c("1", "2", "3", "4","5", "6", "7", "8", "9", "10", "11", "12")
value <- c("1", "1", "0", "-1", "-1", "0", "-1", "1", "-1", "1", "0", "0")
weight <- c("95", "81", "97", "65", "43", "92", "89", "43", "58", "99", "35", "69")
df <- data.frame(occupation, jobtask, value, weight)
```
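Every vector above is built from quoted strings, so `value` and `weight` end up as character columns and arithmetic on them fails until they are converted. A minimal sketch of that conversion step (the same `type.convert(..., as.is = TRUE)` call the answer below uses), trimmed to the two columns that need it; `stringsAsFactors = FALSE` is added here so the result does not depend on the R version's default:

```r
# The question's value/weight vectors, built from quoted strings
value  <- c("1", "1", "0", "-1", "-1", "0", "-1", "1", "-1", "1", "0", "0")
weight <- c("95", "81", "97", "65", "43", "92", "89", "43", "58", "99", "35", "69")
df <- data.frame(value, weight, stringsAsFactors = FALSE)

# Arithmetic on a character column raises an error; capture its message
res <- tryCatch(df$weight / 100, error = function(e) conditionMessage(e))
print(res)

# type.convert() re-guesses each column's type; as.is = TRUE keeps any
# remaining character columns as character rather than factor
df <- type.convert(df, as.is = TRUE)
str(df)
df$weight / 100  # now a numeric vector
```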
Desired output:

```
   occupation jobtask value weight outputpos outputneg occupationpos occupationneg
 1          a       1     1     95      0.95                    1.76          0.65
 2          a       2     1     81      0.81                    1.76          0.65
 3          a       3     0     97                              1.76          0.65
 4          a       4    -1     65               -0.65          1.76          0.65
 5          b       5    -1     43               -0.43          0.43          1.32
 6          b       6     0     92                              0.43          1.32
 7          b       7    -1     89               -0.89          0.43          1.32
 8          b       8     1     43      0.43                    0.43          1.32
 9          c       9    -1     58               -0.58          0.99          0.58
10          c      10     1     99      0.99                    0.99          0.58
11          c      11     0     35                              0.99          0.58
12          c      12     0     69                              0.99          0.58
```
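Edit: an additional problem with multiple years. In my real `df`, each row is one year, so the "sum" for this particular occupation now comes out as 5.42, when it should be 0.95 + 0.81 = 1.76 as in the example above. Each `jobtask` has a different number of `year`s, so I can't simply divide by a single number of years. Any suggestions?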
```
  occupation jobtask year value weight outputpos outputneg occupationpos occupationneg
1          a       1 2015     1     95      0.95                    5.42
2          a       1 2016     1     95      0.95                    5.42
3          a       1 2017     1     95      0.95                    5.42
4          a       1 2018     1     95      0.95                    5.42
5          a       2 2015     1     81      0.81                    5.42
6          a       2 2016     1     81      0.81                    5.42
```
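One way to handle the repeated years is to let each `occupation`/`jobtask` pair contribute its output only once before summing. A minimal base-R sketch, built from the six sample rows shown above and assuming that `value` and `weight` are constant for a given `jobtask` across years (if they can change from year to year, "which year counts" has to be decided first):

```r
# The six sample rows from the edit: jobtask 1 spans four years, jobtask 2 two
df <- data.frame(
  occupation = c("a", "a", "a", "a", "a", "a"),
  jobtask    = c(1, 1, 1, 1, 2, 2),
  year       = c(2015, 2016, 2017, 2018, 2015, 2016),
  value      = c(1, 1, 1, 1, 1, 1),
  weight     = c(95, 95, 95, 95, 81, 81)
)
df$output <- df$weight / 100

# Naive per-occupation total: every year is counted (4 * 0.95 + 2 * 0.81 = 5.42)
sum(df$output)

# Flag only the first row of each occupation/jobtask pair, so a task that
# repeats across years contributes its output once
first_row <- !duplicated(df[c("occupation", "jobtask")])

# Per-occupation sums over those first rows, repeated on every row of the group
df$occupation_pos <- ave(df$output * (first_row & df$value == 1),
                         df$occupation, FUN = sum)
df$occupation_neg <- ave(df$output * (first_row & df$value == -1),
                         df$occupation, FUN = sum)
df
```

With `dplyr`, the same idea is usually written by filtering to distinct `occupation`/`jobtask` rows before the grouped sum; the `duplicated()` flag above avoids dropping the yearly rows from the final table.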
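We can first convert the column types to numeric with `type.convert`, then create an 'output' column by dividing 'weight' by 100. Next, use `case_when` to create 'outputpos' and 'outputneg' depending on whether 'value' is 1 or -1. Finally, group by 'occupation' and take the `sum` of 'output' where 'value' is 1 and where 'value' is -1 to create 'occupation_pos' and 'occupation_neg'. Because these sums use 'output' rather than the signed 'outputneg', 'occupation_neg' comes out positive, as requested.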
```r
library(dplyr)

# The columns were created as character; convert them to integer/numeric
df <- type.convert(df, as.is = TRUE)

df %>%
  # Scale weight to a proportion (95 -> 0.95)
  mutate(output = weight/100,
         # case_when() returns NA when no condition matches, so rows with value 0 stay empty
         outputpos = case_when(value == 1 ~ output),
         outputneg = case_when(value == -1 ~ -1 * output)) %>%
  group_by(occupation) %>%
  # Within each occupation, sum the positive 'output' separately for value 1 and value -1
  mutate(occupation_pos = sum(output[value == 1]),
         occupation_neg = sum(output[value == -1])) %>%
  select(-output)
# A tibble: 12 x 8
# Groups:   occupation [3]
#   occupation jobtask value weight outputpos outputneg occupation_pos occupation_neg
#   <chr>        <int> <int>  <int>     <dbl>     <dbl>          <dbl>          <dbl>
# 1 a                1     1     95      0.95        NA           1.76           0.65
# 2 a                2     1     81      0.81        NA           1.76           0.65
# 3 a                3     0     97        NA        NA           1.76           0.65
# 4 a                4    -1     65        NA     -0.65           1.76           0.65
# 5 b                5    -1     43        NA     -0.43           0.43           1.32
# 6 b                6     0     92        NA        NA           0.43           1.32
# 7 b                7    -1     89        NA     -0.89           0.43           1.32
# 8 b                8     1     43      0.43        NA           0.43           1.32
# 9 c                9    -1     58        NA    -0.580           0.99          0.580
#10 c               10     1     99      0.99        NA           0.99          0.580
#11 c               11     0     35        NA        NA           0.99          0.580
#12 c               12     0     69        NA        NA           0.99          0.580
```
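For readers who would rather not depend on `dplyr`, here is a base-R sketch of the same computation. It is an alternative, not the method above: `ifelse()` stands in for `case_when()`, and `ave()` replaces `group_by()` + `mutate()`, returning each group's sum on every row of that group. `stringsAsFactors = FALSE` is added so the conversion behaves the same on older R versions.

```r
# The question's data
occupation <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c")
jobtask <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12")
value <- c("1", "1", "0", "-1", "-1", "0", "-1", "1", "-1", "1", "0", "0")
weight <- c("95", "81", "97", "65", "43", "92", "89", "43", "58", "99", "35", "69")
df <- data.frame(occupation, jobtask, value, weight, stringsAsFactors = FALSE)
df <- type.convert(df, as.is = TRUE)

output <- df$weight / 100

# Signed per-row outputs; rows whose value is 0 get NA, like case_when()
df$outputpos <- ifelse(df$value == 1, output, NA)
df$outputneg <- ifelse(df$value == -1, -output, NA)

# Per-occupation totals of the positive output, repeated on every row
df$occupation_pos <- ave(output * (df$value == 1), df$occupation, FUN = sum)
df$occupation_neg <- ave(output * (df$value == -1), df$occupation, FUN = sum)
df
```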