Create a variable in R that indicates whether numeric "subgroup" rows sum to "total" rows by group

Question

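I would like to create a logical variable that indicates whether the counts for a set of subgroup rows within a given category (i.e. A, B, and C in the 'group' variable) sum to the value of my 'All' / overall group row.

My data looks like this: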

group = c("All", "A", "B", "C", "All", "A", "B", "C")
category = c("music", "music", "music", "music", "movies", "movies", "movies", "movies")
count = c(120, 15, 75, 30, 250, 36, 28, 72)

# Pass the vectors directly; wrapping them in cbind() would coerce count to character
data <- data.frame(group, category, count)

What I want is to add a "sum_to_all" column, like this:

sum_to_all = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

data <- data.frame(group, category, count, sum_to_all)

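In this case, the "count" values for subgroups "A", "B", and "C" sum to the count in the "All" row for the music category (15 + 75 + 30 = 120, so TRUE), but not for the movies category (36 + 28 + 72 = 136, not 250, so FALSE).

I know I could reshape the dataset to wide format, where each group gets its own "count" column, and then compare those columns, but I wonder whether there is a simple row-wise solution. Thanks in advance.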

Answer 1

We can group by 'category' and create 'sum_to_all' by comparing the sum of 'count', excluding the first observation, with the first observation.

library(dplyr)
data %>%
    group_by(category) %>%
    mutate(sum_to_all = sum(count[-1]) == first(count)) %>%
    ungroup

-output

# A tibble: 8 x 4
#  group category count sum_to_all
#  <chr> <chr>    <dbl> <lgl>     
#1 All   music      120 TRUE      
#2 A     music       15 TRUE      
#3 B     music       75 TRUE      
#4 C     music       30 TRUE      
#5 All   movies     250 FALSE     
#6 A     movies      36 FALSE     
#7 B     movies      28 FALSE     
#8 C     movies      72 FALSE

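Note: here we assume that the 'All' 'group' is the first element within each 'category'. If that is not always the case, either arrange the data first or subset with ==: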

data %>%
    group_by(category) %>%
    mutate(sum_to_all = sum(count[group != 'All']) == count[group == 'All']) %>%
    ungroup
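
If the 'All' row is not guaranteed to come first, the arrange option mentioned in the note might look like the following minimal sketch. The reordered sample data (`shuffled`) is made up for illustration and does not come from the question. The sort key `group != "All"` relies on FALSE sorting before TRUE, which moves the 'All' row to the top of each category.

```r
library(dplyr)

# Hypothetical data: same values as in the question, but 'All' is not first
group <- c("A", "All", "B", "C", "A", "B", "C", "All")
category <- c("music", "music", "music", "music",
              "movies", "movies", "movies", "movies")
count <- c(15, 120, 75, 30, 36, 28, 72, 250)
shuffled <- data.frame(group, category, count)

result <- shuffled %>%
    # FALSE sorts before TRUE, so the 'All' row leads each category
    arrange(category, group != "All") %>%
    group_by(category) %>%
    # compare the sum of the subgroup rows with the leading 'All' row
    mutate(sum_to_all = sum(count[-1]) == first(count)) %>%
    ungroup()

result
```

Note that arrange() also reorders the rows of the result, so this variant is only suitable when the original row order does not need to be preserved.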

data

data <- data.frame(group, category, count)
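
For readers who prefer not to load dplyr, the same grouped comparison can be sketched in base R with ave(), which applies a function within each group and returns a vector as long as its input. This is an alternative sketch rather than part of the original answer. It assumes exactly one 'All' row per category, and the helper names `sub_total` and `all_total` are illustrative.

```r
group <- c("All", "A", "B", "C", "All", "A", "B", "C")
category <- c("music", "music", "music", "music",
              "movies", "movies", "movies", "movies")
count <- c(120, 15, 75, 30, 250, 36, 28, 72)
data <- data.frame(group, category, count)

# Per-category sum of the subgroup counts ('All' rows contribute 0)
sub_total <- ave(ifelse(data$group == "All", 0, data$count),
                 data$category, FUN = sum)

# Per-category 'All' count (subgroup rows contribute 0)
all_total <- ave(ifelse(data$group == "All", data$count, 0),
                 data$category, FUN = sum)

# TRUE on every row of a category whose subgroups add up to 'All'
data$sum_to_all <- sub_total == all_total

data
```

Because both totals are computed by matching on the 'group' value, this version does not depend on where the 'All' row sits within a category.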


r

rowwise

data-wrangling
