Counting logical values with summarise in R
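I have a data frame with a column that contains Y and N values, plus an id column. I want to create two columns: one with the total Y count and one with the total N count for each id. I tried to do this with the dplyr summarise function:
df %>%
  group_by(id) %>%
  summarise(total_not = count(column_y_e_n == "N"),
            total_yes = count(column_y_e_n == "Y"))
but I get this error message:
Error in summarise_impl(.data, dots)
Any suggestions?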
I would solve this with group_by() and tally(). Alternatively, you can skip the intermediate step and use count() directly.
library(tidyverse)
## Fake data: 20 ids with 10 random Y/N values each
df <- tibble(
  id = rep(1:20, each = 10),
  column_y_e_n = sample(c("Y", "N"), 200, replace = TRUE)
)
## group_by() + tally()
df_2 <- df %>%
  group_by(id, column_y_e_n) %>%
  tally() %>%
  spread(column_y_e_n, n) %>%
  magrittr::set_colnames(c("id", "total_not", "total_yes"))
df_2
## Direct method: count() groups and tallies in one step
df_3 <- df %>%
  count(id, column_y_e_n) %>%
  spread(column_y_e_n, n) %>%
  magrittr::set_colnames(c("id", "total_not", "total_yes"))
df_3
The last two steps of each pipeline spread the counts into one column per value and then set the column names.
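As a side note, spread() is superseded in current tidyr by pivot_wider(). The sketch below shows what the same reshaping might look like with it, assuming tidyr 1.1.0 or later; the small data frame and the name df_wide are made up for illustration and are not from the answer above. Renaming by column name rather than by position avoids mislabelling if one of the two values never appears.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data (not from the original post)
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  column_y_e_n = c("Y", "N", "Y", "N", "N", "Y")
)

df_wide <- df %>%
  count(id, column_y_e_n) %>%          # one row per id/value pair, count in column n
  pivot_wider(
    names_from = column_y_e_n,         # "N" and "Y" become column names
    values_from = n,
    values_fill = 0L                   # ids with no rows for a value get 0 instead of NA
  ) %>%
  rename(total_not = N, total_yes = Y) # rename by name, not by position

df_wide
```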
Harro's original answer is slightly different:
library(dplyr)  # group_by(), summarize(), n() and %>%
library(tidyr)  # spread()
dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  bool = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)
dfrSummary <- dfr %>%
  group_by(id, bool) %>%
  summarize(count = n()) %>%
  spread(
    key = bool,
    value = count,
    fill = 0  # ids with no rows for a value get 0 instead of NA
  )
I replaced the count function with sum, and it worked.
df %>%
  group_by(id) %>%
  summarise(total_not = sum(column_y_e_n == "N"),
            total_yes = sum(column_y_e_n == "Y"))
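A short sketch of why sum() works here: the comparison produces a logical vector, and sum() treats TRUE as 1 and FALSE as 0, so the sum is the number of matches. dplyr's count(), by contrast, expects a data frame as its first argument, not a vector. The example reuses the small dfr data from the other answers in this thread; the names flags, n_total and result are only for illustration.

```r
library(dplyr)

dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

# The comparison returns a logical vector; sum() counts its TRUE values.
flags <- dfr$column_y_e_n == "N"
n_total <- sum(flags)

# The same idea applied per group.
result <- dfr %>%
  group_by(id) %>%
  summarise(
    total_not = sum(column_y_e_n == "N"),
    total_yes = sum(column_y_e_n == "Y")
  )

result
```

If the column can contain NA, sum() returns NA for that group unless na.rm = TRUE is passed.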
I usually want to do everything in the tidyverse, but in this case a base R solution seems appropriate:
dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)
table(dfr)
This gives you:
   column_y_e_n
id  N Y
  1 1 4
  2 3 2
  3 3 0
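If the counts are needed as ordinary columns rather than as a table object, one possible way to get there in base R is sketched below. The names counts and wide, and the output column names total_not and total_yes (taken from the question), are choices made for this illustration.

```r
dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

counts <- table(dfr)

# Individual cells can be looked up by their dimension names.
y_for_id_1 <- counts["1", "Y"]

# Convert the two-way table to a data frame with one row per id.
wide <- as.data.frame.matrix(counts)
wide <- data.frame(
  id = rownames(wide),   # note: ids come back as character strings
  total_not = wide$N,
  total_yes = wide$Y
)

wide
```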