Tallying multiple-choice entries in a single column of an R data frame programmatically

Survey data often contains multiple-choice columns in which the entries are separated by commas, for example:

library("tidyverse")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

It would be useful to have a function, multiple_choice_tally, that counts the unique responses to a question:

my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2     Walk     2
3    Cycle     3

What is the most efficient and flexible way to build multiple_choice_tally without any hard-coding?

My current solution to this problem is as follows:

multiple_choice_tally <- function(survey.data, question) {
  ## Require a sym for the RHS of !!response := if_else
  question_as_quo <- enquo(question)
  question_as_string <- quo_name(question_as_quo)
  target_question <- rlang::sym(question_as_string)

  ## Collate unique responses to the question
  unique_responses <- survey.data %>%
    select(!!target_question) %>%
    na.omit() %>%
    .[[1]] %>%
    strsplit(",") %>%
    unlist() %>%
    trimws() %>%
    unique()

  ## Extract responses to question
  question_tally <- survey.data %>%
    select(!!target_question) %>%
    na.omit()

  ## Iteratively create a column for each unique response
  invisible(lapply(unique_responses,
                   function(response) {
                     question_tally <<- question_tally %>%
                       mutate(!!response := if_else(str_detect(!!target_question, response), TRUE, FALSE))

                   }))

  ## Gather into tidy form
  question_tally %>%
    summarise_if(is.logical, sum) %>%
    gather(response, value = count)

}

It can then be used as follows:

library("tidyverse")
library("rlang")
library("stringr")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2     Walk     2
3    Cycle     3

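We can use separate_rows from the tidyr package to split the contents of question.2 into one row per response. Since you are already using the tidyverse, tidyr is loaded by library("tidyverse") and does not need to be loaded again. my_survey2 is the final output.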

my_survey2 <- my_survey %>%
  separate_rows(question.2) %>%
  count(question.2) %>%
  rename(response = question.2, count = n)

my_survey2
# A tibble: 3 × 2
  response count
     <chr> <int>
1      Bus     3
2    Cycle     3
3     Walk     2
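
One caveat worth checking against your own data: when no sep is given, separate_rows splits on any run of non-alphanumeric characters, so a response that contains a space would be broken into separate words. The sketch below uses a made-up tibble, spaced_survey (not part of the original question), to compare the default with an explicit comma-plus-whitespace separator.

```r
library(dplyr)
library(tidyr)

# Hypothetical data for illustration: one response contains a space
spaced_survey <- tibble(
  id = 1:3,
  question.2 = c("School bus", "School bus, Walk", "Walk, Cycle")
)

# Default separator: "School bus" is split into "School" and "bus"
default_split <- spaced_survey %>%
  separate_rows(question.2) %>%
  count(question.2, name = "count")

# Explicit separator: a comma followed by optional whitespace,
# so "School bus" stays together as one response
comma_split <- spaced_survey %>%
  separate_rows(question.2, sep = ",\\s*") %>%
  count(question.2, name = "count")
```

For the example data in the question, where every response is a single word, both variants give the same counts.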

Update: Designing a function

We can convert the code above into a function as follows.

multiple_choice_tally <- function(survey.data, question){
  question <- enquo(question)
  survey.data2 <- survey.data %>%
    separate_rows(!!question) %>%
    count(!!question) %>%
    setNames(., c("response", "count"))
  return(survey.data2)
}

my_survey %>%
  multiple_choice_tally(question = question.2)
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2    Cycle     3
3     Walk     2
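
As a possible variation on the function above, here is a minimal sketch that drops missing answers first, uses an explicit separator, and names the output columns directly in the pipeline. The name multiple_choice_tally2 and the sep argument are my own additions for illustration, and the {{ }} operator assumes rlang 0.4.0 or later.

```r
library(dplyr)
library(tidyr)

my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

# Hypothetical variant of multiple_choice_tally; sep is an assumed extra argument
multiple_choice_tally2 <- function(survey.data, question, sep = ",\\s*") {
  survey.data %>%
    # Keep only the target column and give it the output name
    select(response = {{ question }}) %>%
    # Drop respondents who did not answer
    filter(!is.na(response)) %>%
    # One row per individual response
    separate_rows(response, sep = sep) %>%
    # Remove any stray leading or trailing whitespace
    mutate(response = trimws(response)) %>%
    # Count each distinct response
    count(response, name = "count")
}

tally <- my_survey %>%
  multiple_choice_tally2(question = question.2)
```

Passing a different sep (for example ";\\s*") would let the same function handle surveys exported with another delimiter.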