Mutate in dplyr the proportions of TRUE or FALSE in 2 new columns

I have the following table in R:

CAT CONDITION
A TRUE
A TRUE
A FALSE
A TRUE
B TRUE
B TRUE
B TRUE
B FALSE
cat = c(rep("A",4),rep("B",4));cat
cond = c("TRUE","TRUE","FALSE","TRUE","FALSE","TRUE","TRUE","FALSE")
data3 = cbind(cat,cond);data3

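I want to reduce it in dplyr so that it gives me (by mutating two new columns) the proportion of TRUE in the first new column and the proportion of FALSE in the second new column. Like this: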

CAT TRUE FALSE
A 0.5 0.5
B 0.75 0.25

Like this?

library(tidyverse)
cat = c(rep("A",4),rep("B",4))
cond = c("TRUE","TRUE","FALSE","TRUE","FALSE","TRUE","TRUE","FALSE")
data3 = data.frame(cat,cond)

data3 %>%
  group_by(cat) %>%
  # cond is a character column, so compare against the strings explicitly
  summarise("TRUE" = sum(cond == "TRUE") / n(),
            "FALSE" = sum(cond == "FALSE") / n())
#> # A tibble: 2 × 3
#>   cat   `TRUE` `FALSE`
#>   <chr>  <dbl>   <dbl>
#> 1 A       0.75    0.25
#> 2 B       0.5     0.5

Created on 2022-02-24 by the reprex package (v2.0.1)

Here is another approach: if you want the proportions of TRUE and FALSE for each CAT, then this should work:

library(dplyr)
library(tidyr)

# df is the table from the question, with columns CAT and CONDITION
df %>%
  group_by(CAT, CONDITION) %>% 
  tally() %>% 
  mutate(n = (n/sum(n))) %>% 
  pivot_wider(
    id_cols = CAT,
    names_from = CONDITION,
    values_from = n
  )
  CAT   `FALSE` `TRUE`
  <chr>   <dbl>  <dbl>
1 A        0.25   0.75
2 B        0.25   0.75

You can also simply use mean() on a logical variable:

library(dplyr)
data3 %>%
  as_tibble() %>% # this converts your matrix into a tibble
  mutate(cond = as.logical(cond)) %>% # convert character to logical
  group_by(cat) %>%
  summarise("TRUE" = mean(cond),
            "FALSE" = mean(!cond))

Output:

# A tibble: 2 x 3
  cat   `TRUE` `FALSE`
  <chr>  <dbl>   <dbl>
1 A       0.75    0.25
2 B       0.5     0.5 
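The reason mean() works here is that R treats TRUE as 1 and FALSE as 0 in arithmetic, so the mean of a logical vector is the share of TRUE values. A minimal sketch using the four values of category A from the question's code (the variable names are only illustrative):

```r
# Category A from the question's code: TRUE, TRUE, FALSE, TRUE
x <- c(TRUE, TRUE, FALSE, TRUE)

p_true  <- mean(x)    # same as sum(x) / length(x)
p_false <- mean(!x)   # negate first to get the share of FALSE

c(p_true, p_false)
```

This gives 0.75 and 0.25, the values in row A of the output above.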

Outside of dplyr, this becomes very easy with prop.table() and table():

with(as.data.frame(data3), prop.table(table(cat, cond), 1))

   cond
cat FALSE TRUE
  A  0.25 0.75
  B  0.50 0.50

Or, even simpler (credit to @G.Grothendieck), using xtabs():

prop.table(xtabs(~., data3), 1)
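Note that prop.table() returns a table object rather than a data frame. If the layout requested in the question (one row per category, one column each for TRUE and FALSE) is needed as a data frame, one possible sketch in base R is below. The names d and wide are made up for this example, and as.data.frame.matrix() is just one way to do the conversion.

```r
# Same values as the question's code, stored in a data frame
d <- data.frame(
  cat  = rep(c("A", "B"), each = 4),
  cond = c("TRUE", "TRUE", "FALSE", "TRUE", "FALSE", "TRUE", "TRUE", "FALSE")
)

# Row-wise proportions (margin = 1 normalises within each cat)
props <- prop.table(table(d$cat, d$cond), 1)

# Keep the two-way layout instead of the long format as.data.frame() would give
wide <- as.data.frame.matrix(props)
wide$cat <- rownames(wide)

wide
```

The result has columns FALSE, TRUE and cat, with the categories also kept as row names.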