Make a sentence to summarize a simple data frame

I have a simple df, shown below, and I want to create a summary sentence for it. What is the most efficient way to build it? Could anyone guide me?

The summary I want to build is: There are 2 students in ELA: G8-01, G9-08; There are 2 students in MATH: G8-09, G9-06; There is 1 student in ART: G9-04.

df <- structure(list(ID = c("G8-01", "G8-09", "G9-08", "G9-04", "G9-05", 
"G9-06", "G9-07"), ELA = c("G8-01", NA, "G9-08", NA, NA, NA, 
NA), MATH = c(NA, "G8-09", NA, NA, NA, "G9-06", NA), PE = c(NA, 
NA, NA, NA, NA, NA, NA), ART = c(NA, NA, NA, "G9-04", NA, NA, 
NA)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))

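You would typically use cat for this. You will probably want to map over the columns and their names together, and for tidiness wrap it in a small function: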

report <- function(data) {
  Map(function(x, nm) {
    cat('There are ', sum(!is.na(x)), " students in ", nm, ": ",
        paste(x[!is.na(x)], collapse = ', '), '\n', sep = '')
  }, x = data[-1], nm = names(data)[-1])
  invisible(NULL)
}

This results in:

report(df)
#> There are 2 students in ELA: G8-01, G9-08
#> There are 2 students in MATH: G8-09, G9-06
#> There are 0 students in PE:
#> There are 1 students in ART: G9-04
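The report above prints one line per column, including the empty PE column, and always says "There are ... students". If the single sentence from the question is wanted instead, one possible base R variation is sketched below. The function name `summarise_subjects` and its `id_col` argument are hypothetical, and the data frame is rebuilt as a plain `data.frame` so the block runs on its own.

```r
# Data from the question, rebuilt as a plain data.frame
df <- data.frame(
  ID   = c("G8-01", "G8-09", "G9-08", "G9-04", "G9-05", "G9-06", "G9-07"),
  ELA  = c("G8-01", NA, "G9-08", NA, NA, NA, NA),
  MATH = c(NA, "G8-09", NA, NA, NA, "G9-06", NA),
  PE   = NA_character_,
  ART  = c(NA, NA, NA, "G9-04", NA, NA, NA)
)

# Hypothetical helper: returns one sentence instead of printing line by line
summarise_subjects <- function(data, id_col = 1) {
  subjects <- data[-id_col]
  # Keep the non-missing IDs per subject, then drop subjects with nobody in them
  ids <- lapply(subjects, function(x) x[!is.na(x)])
  ids <- ids[lengths(ids) > 0]
  n <- lengths(ids)
  parts <- sprintf(
    "There %s %d %s in %s: %s",
    ifelse(n == 1, "is", "are"),            # singular or plural verb
    n,
    ifelse(n == 1, "student", "students"),  # singular or plural noun
    names(ids),
    vapply(ids, paste, character(1), collapse = ", ")
  )
  # Join the per-subject clauses with "; " and close the sentence
  paste0(paste(parts, collapse = "; "), ".")
}

summarise_subjects(df)
#> [1] "There are 2 students in ELA: G8-01, G9-08; There are 2 students in MATH: G8-09, G9-06; There is 1 student in ART: G9-04."
```

Because the function returns the string rather than calling cat, the result can be stored or passed on; wrap the call in cat() or writeLines() to print it.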

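If you want a report for every subject, the previous answer is already good. If you just want to get the sentence automatically, as you wrote it, you can use the following.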

First, create a summary containing every subject, the number of students, and the codes:

library(dplyr)
library(tidyr)

# `example` is the data frame from the question
example = example %>% 
  pivot_longer(cols=c(-ID),names_to='Subject',values_to='Code') %>% 
  filter(! is.na(Code)) %>% 
  group_by(Subject) %>% 
  summarise(n_students = n(),
            Codes = paste0(Code, collapse=', '))

Then put everything together:

lapply(example, 
           function(i) paste0(paste("There are",example$n_students,"students in",example$Subject,":",example$Codes),
                              collapse='; '))[[1]]

Output:

[1] "There are 1 students in ART : G9-04; There are 2 students in ELA : G8-01, G9-08; There are 2 students in MATH : G8-09, G9-06"

Perhaps lapply is not the most elegant way, but it does work. In addition, you can convert Subject to a factor and set its levels to order the sentence as you need.
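Since paste0() is already vectorized over the summary columns, the lapply() wrapper can probably be dropped, and the factor idea mentioned above can be applied before grouping. The following is only a sketch under those assumptions: the names `subject_order`, `summary_tbl`, and `sentence` are illustrative, the data frame is rebuilt as a plain `data.frame` so the block is self-contained, and the wording is kept as in this answer (so it still says "There are 1 students").

```r
library(dplyr)
library(tidyr)

# Data from the question, rebuilt as a plain data.frame
df <- data.frame(
  ID   = c("G8-01", "G8-09", "G9-08", "G9-04", "G9-05", "G9-06", "G9-07"),
  ELA  = c("G8-01", NA, "G9-08", NA, NA, NA, NA),
  MATH = c(NA, "G8-09", NA, NA, NA, "G9-06", NA),
  PE   = NA_character_,
  ART  = c(NA, NA, NA, "G9-04", NA, NA, NA)
)

# Assumed ordering: the column order of the original data frame
subject_order <- names(df)[-1]

summary_tbl <- df %>%
  pivot_longer(cols = -ID, names_to = "Subject", values_to = "Code") %>%
  filter(!is.na(Code)) %>%
  # A factor with explicit levels makes group_by() sort by those levels
  mutate(Subject = factor(Subject, levels = subject_order)) %>%
  group_by(Subject) %>%
  summarise(n_students = n(), Codes = paste0(Code, collapse = ", "))

# paste0() works element-wise on the columns; collapse joins the rows
sentence <- paste0(
  "There are ", summary_tbl$n_students, " students in ",
  summary_tbl$Subject, ": ", summary_tbl$Codes,
  collapse = "; "
)

sentence
#> [1] "There are 2 students in ELA: G8-01, G9-08; There are 2 students in MATH: G8-09, G9-06; There are 1 students in ART: G9-04"
```

Using paste0() with explicit spaces also avoids the stray space before the colon that paste() with its default separator produces.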

A tidyverse solution that uses stringr::str_glue_data() to format the strings:

library(tidyverse)

df %>%
  pivot_longer(-1, values_drop_na = TRUE) %>%
  group_by(name) %>%
  summarise(n = n(), id = toString(value)) %>%
  str_glue_data("There {ifelse(n>1, 'are', 'is')} {n} student{ifelse(n>1, 's', '')} in {name}: {id};")

which returns

# There is 1 student in ART: G9-04;
# There are 2 students in ELA: G8-01, G9-08;
# There are 2 students in MATH: G8-09, G9-06;

You can use pluralize() from the cli package:

library(cli)
library(dplyr)
library(purrr)

df %>% 
  select(-ID) %>% 
  map(discard, is.na) %>% 
  compact() %>% 
  iwalk(~ cat(pluralize("There {qty(length(.x))}{?is/are} {length(.x)} student{?s} in {qty(.y)}{.y}: {qty(.x)}{.x}"), sep = "\n"))

This gives the following:

There are 2 students in ELA: G8-01 and G9-08
There are 2 students in MATH: G8-09 and G9-06
There is 1 student in ART: G9-04

If you want to use the text elsewhere, you can adapt this to return it instead. In this example I used cat() to print it to the console.

For example, to save the text:

txt <- df %>% 
  select(-ID) %>% 
  map(discard, is.na) %>% 
  compact() %>% 
  imap_chr(~ pluralize("There {qty(length(.x))}{?is/are} {length(.x)} student{?s} in {qty(.y)}{.y}: {qty(.x)}{.x}"))

unname(txt)
# [1] "There are 2 students in ELA: G8-01 and G9-08" 
# [2] "There are 2 students in MATH: G8-09 and G9-06"
# [3] "There is 1 student in ART: G9-04"