Creating a table extracting the first letter in a string and counts in R

Question

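I am trying to extract the first letter of each comma-separated string and then count how many times that letter occurs. An example of one column in my data frame looks like this: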

test <- data.frame("Code" =  c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK", 
"RRRF"))

I would like to add a column next to it that looks like this:

test2 <- data.frame("Code" =  c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK", 
"RRRF"), "Code_Count" = c("E1, S1", "E1", "S1, R2", "R1"))

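The Code_Count column takes the first letter of each string and counts how many times that letter occurs in that particular cell.

I have looked into using strsplit to get the first letters of the comma-separated strings in the column, but I am not sure how to append the number of times each letter appears in the cell.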
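For reference, a minimal sketch of the strsplit() step described above, assuming the separator is always a comma followed by one space. It produces a list with one character vector per cell, and substr() then reduces each piece to its first letter; the counting is the part still missing at this point. The names `codes`, `parts` and `first_letters` are only illustrative.

```r
codes <- c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK", "RRRF")

# strsplit() returns a list with one character vector per cell
parts <- strsplit(codes, ", ")

# substr() keeps only the first character of each piece
first_letters <- lapply(parts, substr, start = 1, stop = 1)
first_letters[[3]]
# returns c("S", "R", "R")
```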

Answer 1

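Here is an option using base R. It splits the Code column on a comma followed by at least one space, tabulates how often each first letter occurs, and then pastes the results back together into your desired format. Note that it sorts the letters alphabetically within each cell, which does not match your output (row 3 becomes "R2, S1" rather than "S1, R2"). Hope this helps!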

test2$Code_Count2 <- sapply(strsplit(test2$Code, ",\\s+"), function(x) {
  tab <- table(substr(x, 1, 1)) # Create a table of the first letters
  paste0(names(tab), tab, collapse = ", ") # Paste together the letter w/ the number and collapse them
})

test2
              Code Code_Count Code_Count2
1       EKST, STFO     E1, S1      E1, S1
2             EFGG         E1          E1
3 SSGG, RRRR, RRFK     S1, R2      R2, S1
4             RRRF         R1          R1
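If the letters should appear in order of first appearance, as in the desired Code_Count column, one possible tweak (my own assumption, not part of the answer above) is to turn the first letters into a factor whose levels follow that order, so table() no longer sorts them alphabetically. The helper `count_first_letters` is hypothetical, and the `",\\s*"` pattern is a deliberate relaxation that also accepts a comma with no space after it.

```r
codes <- c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK", "RRRF")

# Hypothetical helper: count first letters, keeping order of first appearance
count_first_letters <- function(x) {
  first <- substr(x, 1, 1)                              # first letter of each code
  tab <- table(factor(first, levels = unique(first)))   # levels in appearance order
  paste0(names(tab), tab, collapse = ", ")
}

ordered_counts <- vapply(strsplit(codes, ",\\s*"), count_first_letters, character(1))
ordered_counts
# returns c("E1, S1", "E1", "S1, R2", "R1")
```

vapply() is used instead of sapply() so the result is guaranteed to be a character vector of the same length as the input.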

Here is a more concise stringr/purrr solution that extracts the first letter of each word (instead of splitting the string) and then does the same thing:

library(purrr)
library(stringr)

# "\\b[A-Z]" matches a capital letter at the start of a word
map_chr(str_extract_all(test2$Code, "\\b[A-Z]"), function(x) {
  tab <- table(x)
  paste0(names(tab), tab, collapse = ", ")
})
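Without stringr and purrr, the same word-boundary idea can be sketched in base R with gregexpr() and regmatches(). This is an assumed equivalent rather than part of the original answer; `perl = TRUE` is used so that `\\b` is handled by the PCRE engine. As in the answer above, the counts come out sorted alphabetically.

```r
codes <- c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK", "RRRF")

# gregexpr() finds every capital letter that starts a word; regmatches() extracts them
first_letters <- regmatches(codes, gregexpr("\\b[A-Z]", codes, perl = TRUE))

counts <- vapply(first_letters, function(x) {
  tab <- table(x)                           # sorted alphabetically, as in the answer above
  paste0(names(tab), tab, collapse = ", ")
}, character(1))
counts
# returns c("E1, S1", "E1", "R2, S1", "R1")
```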

Data:

test2 <- data.frame("Code" =  c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK", 
                            "RRRF"), "Code_Count" = c("E1, S1", "E1", "S1, R2", "R1"))
test2[] <- lapply(test2, as.character) # factor to character
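Instead of converting afterwards, data.frame() can be told not to create factors at all via `stringsAsFactors = FALSE`. This matters on R versions before 4.0.0, where the default was `TRUE` and strsplit() on a factor column stops with a "non-character argument" error; since R 4.0.0 the default is already `FALSE`. A minimal sketch building the same data:

```r
# Same data as above, but with character columns created directly
test2 <- data.frame(
  Code = c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK", "RRRF"),
  Code_Count = c("E1, S1", "E1", "S1, R2", "R1"),
  stringsAsFactors = FALSE  # do not convert strings to factors
)
sapply(test2, class)
# returns c(Code = "character", Code_Count = "character")
```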
