Make a sentence to summarize a simple data frame
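I have a simple df, shown below, and I would like to create a summary sentence for it. What is the most efficient way to build it? Could someone guide me?
The summary I would like to build is: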
There are 2 students in ELA: G8-01, G9-08; There are 2 students in MATH: G8-09, G9-06; There is 1 student in ART: G9-04.
df <- structure(list(ID = c("G8-01", "G8-09", "G9-08", "G9-04", "G9-05",
"G9-06", "G9-07"), ELA = c("G8-01", NA, "G9-08", NA, NA, NA,
NA), MATH = c(NA, "G8-09", NA, NA, NA, "G9-06", NA), PE = c(NA,
NA, NA, NA, NA, NA, NA), ART = c(NA, NA, NA, "G9-04", NA, NA,
NA)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
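You would normally use cat for this. You will probably want to map over the columns and their names together and, for tidiness, wrap the whole thing in a small function: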
report <- function(data) {
Map(function(x, nm) {
cat('There are ', sum(!is.na(x)), " students in ", nm, ": ",
paste(x[!is.na(x)], collapse = ', '), '\n', sep = '')
}, x = data[-1], nm = names(data)[-1])
invisible(NULL)
}
This results in:
report(df)
#> There are 2 students in ELA: G8-01, G9-08
#> There are 2 students in MATH: G8-09, G9-06
#> There are 0 students in PE:
#> There are 1 students in ART: G9-04
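The output above still reports the empty PE column and always says "are ... students". A minimal base R sketch that skips empty subjects, switches between singular and plural, and returns a single sentence could look like the following. The helper name `summarise_subjects` and its `id_col` argument are hypothetical (they do not come from the answer), and the data frame is rebuilt here as a plain data.frame so the block runs on its own.

```r
# Hypothetical helper, not part of the answer above: builds one sentence in base R.
summarise_subjects <- function(data, id_col = 1) {
  cols <- data[-id_col]
  # Keep only the non-missing entries of each subject column
  vals <- lapply(cols, function(x) x[!is.na(x)])
  # Drop subjects that have no students at all (PE in this example)
  vals <- vals[lengths(vals) > 0]
  parts <- mapply(function(x, nm) {
    n <- length(x)
    sprintf("There %s %d student%s in %s: %s",
            if (n == 1) "is" else "are", n,
            if (n == 1) "" else "s", nm,
            paste(x, collapse = ", "))
  }, vals, names(vals))
  # Join the per-subject clauses and close the sentence with a period
  paste0(paste(parts, collapse = "; "), ".")
}

# Same data as in the question, rebuilt as a plain data.frame
df <- data.frame(
  ID   = c("G8-01", "G8-09", "G9-08", "G9-04", "G9-05", "G9-06", "G9-07"),
  ELA  = c("G8-01", NA, "G9-08", NA, NA, NA, NA),
  MATH = c(NA, "G8-09", NA, NA, NA, "G9-06", NA),
  PE   = NA_character_,
  ART  = c(NA, NA, NA, "G9-04", NA, NA, NA)
)

summarise_subjects(df)
#> [1] "There are 2 students in ELA: G8-01, G9-08; There are 2 students in MATH: G8-09, G9-06; There is 1 student in ART: G9-04."
```

Because the function returns the string instead of printing it, the result can be passed to cat() or stored for later use.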
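If you want a report for each subject, the previous answer is already good. If you just want to produce the single line automatically, as you described, you can use the following.
First, create a summary containing every subject, the number of students, and the codes: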
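library(dplyr)
library(tidyr)
# `example` is the data frame from the question
example = example %>%
pivot_longer(cols=c(-ID),names_to='Subject',values_to='Code') %>%
filter(! is.na(Code)) %>%   # drop the empty cells
group_by(Subject) %>%
summarise(n_students = n(),
Codes = paste0(Code, collapse=', '))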
Putting it all together:
lapply(example,
function(i) paste0(paste("There are",example$n_students,"students in",example$Subject,":",example$Codes),
collapse='; '))[[1]]
Output:
[1] "There are 1 students in ART : G9-04; There are 2 students in ELA : G8-01, G9-08; There are 2 students in MATH : G8-09, G9-06"
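Perhaps lapply is not the most elegant way (the function ignores its argument, so a plain paste(..., collapse = '; ') call builds the same string), but it works. In addition, you can apply as.factor to Subject and set the levels to order the sentence as you need.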
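The reordering idea can be sketched as follows. This is only an illustration under assumptions: `summary_tbl` is a hypothetical stand-in for the summarised table (its values are copied from the output shown above), and the level order is assumed to be the column order of the original data frame. Like the answer's own code, it does not handle the singular "is 1 student" case.

```r
# Hypothetical stand-in for the summarised table from the previous step
summary_tbl <- data.frame(
  Subject    = c("ART", "ELA", "MATH"),
  n_students = c(1L, 2L, 2L),
  Codes      = c("G9-04", "G8-01, G9-08", "G8-09, G9-06")
)

# Assumed desired order: the column order of the original data frame
subject_order <- c("ELA", "MATH", "PE", "ART")

# Turn Subject into a factor with explicit levels, then sort by it
summary_tbl$Subject <- factor(summary_tbl$Subject, levels = subject_order)
summary_tbl <- summary_tbl[order(summary_tbl$Subject), ]

# Build one clause per row and join them with "; "
sentence <- paste0("There are ", summary_tbl$n_students, " students in ",
                   summary_tbl$Subject, ": ", summary_tbl$Codes,
                   collapse = "; ")
sentence
#> [1] "There are 2 students in ELA: G8-01, G9-08; There are 2 students in MATH: G8-09, G9-06; There are 1 students in ART: G9-04"
```

Using paste0 with explicit spaces also avoids the stray space before the colon that appears in the output above.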
A tidyverse solution that formats the string with stringr::str_glue_data():
library(tidyverse)
df %>%
pivot_longer(-1, values_drop_na = TRUE) %>%
group_by(name) %>%
summarise(n = n(), id = toString(value)) %>%
str_glue_data("There {ifelse(n>1, 'are', 'is')} {n} student{ifelse(n>1, 's', '')} in {name}: {id};")
which returns
# There is 1 student in ART: G9-04;
# There are 2 students in ELA: G8-01, G9-08;
# There are 2 students in MATH: G8-09, G9-06;
You can use pluralize() from the cli package.
library(cli)
library(dplyr)
library(purrr)
df %>%
select(-ID) %>%
map(discard, is.na) %>%
compact() %>%
iwalk(~ cat(pluralize("There {qty(length(.x))}{?is/are} {length(.x)} student{?s} in {qty(.y)}{.y}: {qty(.x)}{.x}"), sep = "\n"))
This gives the following:
There are 2 students in ELA: G8-01 and G9-08
There are 2 students in MATH: G8-09 and G9-06
There is 1 student in ART: G9-04
If you want to use it elsewhere, you can adapt it to return the text. In this example I use cat() to print it to the console.
For example, to save the text:
txt <- df %>%
select(-ID) %>%
map(discard, is.na) %>%
compact() %>%
imap_chr(~ pluralize("There {qty(length(.x))}{?is/are} {length(.x)} student{?s} in {qty(.y)}{.y}: {qty(.x)}{.x}"))
unname(txt)
# [1] "There are 2 students in ELA: G8-01 and G9-08"
# [2] "There are 2 students in MATH: G8-09 and G9-06"
# [3] "There is 1 student in ART: G9-04"
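The question asks for a single sentence, whereas the result above is a character vector with one element per subject. A minimal sketch of the last step, assuming the vector shown above (re-typed here as a literal so the block runs without cli), is to collapse it with paste():

```r
# The per-subject strings, copied from the output shown above
txt <- c("There are 2 students in ELA: G8-01 and G9-08",
         "There are 2 students in MATH: G8-09 and G9-06",
         "There is 1 student in ART: G9-04")

# Join the clauses with "; " and close the sentence with a period
sentence <- paste0(paste(txt, collapse = "; "), ".")
sentence
#> [1] "There are 2 students in ELA: G8-01 and G9-08; There are 2 students in MATH: G8-09 and G9-06; There is 1 student in ART: G9-04."
```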