Finding occurrences of characters across multiple vectors or a list
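I want to find the number of unique/distinct characters that occur across multiple vectors or a list.
It is probably best described with an example.
In this example, suppose the "unique characters" are letters and the multiple "vectors" are books. I want to find how many unique letters have been seen as the number of books increases.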
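# Initial data in the format of a list
# (use `=` inside list() so the elements are named, rather than `<-`,
#  which assigns book_A, book_B, book_C as a side effect and leaves the list unnamed)
book_list <- list(book_A = c("a", "b", "c", "z"),
                  book_B = c("c", "d", "a"),
                  book_C = c("b", "a", "c", "e", "x"))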
# Initial data in the format of multiple vectors
book_A <- c("a", "b", "c", "z")
book_B <- c("c", "d", "a")
book_C <- c("b", "a", "c", "e", "x")
# Finding the unique letters in each book
# This is the part I'm struggling to write as a loop
one_book <- length(unique(book_A))
two_book <- length(unique(c(book_A, book_B)))
three_book <- length(unique(c(book_A, book_B, book_C)))
# Plot the desired output
plot(x = c(1, 2, 3),
     y = c(one_book, two_book, three_book),
     ylab = "Number of unique letters", xlab = "Book Number",
     main = "Number of unique letters as the number of books increases")
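Note: the real data set is much larger. Each vector (book_A, book_B, etc.) has a length of about 7000.
I tried to solve the problem with dplyr or a data frame, but I haven't quite got there.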
# Explore the data frame option with example data
library(dplyr)
df <- read.delim("http://m.uploadedit.com/ba3s/148950223626.txt")
# Group by book and count the letters in each
df_group <- dplyr::group_by(df, book) %>% summarize(occurrence = length(letter))
# Use the cumulative sum
plot(x = 1:length(unique(df$book)), y = cumsum(df_group$occurrence))
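But I know the plot is not correct, because it just plots the cumulative sum of letters per book rather than the number of unique letters. Any hints would be very helpful.
To add to the complexity, it would be nice if the book with the fewest letters could be plotted first. Something along the lines of: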
# Example
# Find the number of letters in each book
lapply(book_list, length)
# I know that book_B has the fewest letters (3),
# followed by book_A (4), then book_C (5)
one_book <- length(unique(book_B))
two_book <- length(unique(c(book_B, book_A)))
three_book <- length(unique(c(book_B, book_A, book_C)))
plot(x=c(1,2,3),
y=c(one_book, two_book, three_book),
ylab = "Number of letters", xlab = "Book Number")
You can use Reduce with accumulate = TRUE, i.e.
sapply(Reduce(c, book_list, accumulate = TRUE), function(i) length(unique(i)))
#[1] 4 5 7
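For the second part of the question (starting from the book with the fewest letters), the same idea should carry over if the list is reordered by length first. The following is only a sketch under that assumption; `cumulative_unique` and `shortest_first` are hypothetical names introduced here, not part of base R or of the original answer.

```r
# Example data from the question, as a named list
book_list <- list(book_A = c("a", "b", "c", "z"),
                  book_B = c("c", "d", "a"),
                  book_C = c("b", "a", "c", "e", "x"))

# Hypothetical helper: number of unique letters after pooling
# the first 1, 2, ..., n books of the list
cumulative_unique <- function(books) {
  # Reduce(..., accumulate = TRUE) keeps every intermediate pooled vector
  pooled <- Reduce(c, books, accumulate = TRUE)
  vapply(pooled, function(i) length(unique(i)), integer(1))
}

# Reorder the books from fewest to most letters (book_B, book_A, book_C)
shortest_first <- book_list[order(lengths(book_list))]

counts_original <- cumulative_unique(book_list)       # 4 5 7
counts_shortest <- cumulative_unique(shortest_first)  # 3 5 7
```

Either vector can then be passed as `y` to the `plot()` call from the question, with `seq_along(counts_shortest)` as `x`. If the data are in a data frame with `book` and `letter` columns, as in the dplyr attempt, `split(df$letter, df$book)` should produce a list of the same shape, though that has not been checked against the linked file.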