Count the most frequent word in each row with R
I have the following table:
Name Mon Tue Wed Thu Fri Sat Sun
1 John Apple Orange Apple Banana Apple Apple Orange
2 Ricky Banana Apple Banana Banana Banana Banana Apple
3 Alex Apple Orange Orange Apple Apple Orange Orange
4 Robbin Apple Apple Apple Apple Apple Banana Banana
5 Sunny Banana Banana Apple Apple Apple Banana Banana
I want to find the fruit that occurs most often for each person and add it, together with its count, as new columns.
For example:
Name Mon Tue Wed Thu Fri Sat Sun Max_Acc Count
1 John Apple Orange Apple Banana Apple Apple Orange Apple 4
2 Ricky Banana Apple Banana Banana Banana Banana Apple Banana 5
3 Alex Apple Orange Orange Apple Apple Orange Orange Orange 4
4 Robbin Apple Apple Apple Apple Apple Banana Banana Apple 5
5 Sunny Banana Banana Apple Apple Apple Banana Banana Banana 4
I am having trouble doing this by row. I can use the table() function to find the frequencies within a column:
>table(df$Mon)
Apple Banana
3 2
But here I want the name of the most frequent fruit in a new column.
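If we need the "Count" and the "Names" corresponding to the maximum "Count", we can loop over the rows of the dataset (using apply with MARGIN = 1), get the frequencies with table, extract the maximum value and the names that correspond to it, rbind the per-row results, and cbind them with the original dataset.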
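cbind(df1, do.call(rbind, apply(df1[-1], 1, function(x) {
  x1 <- table(x)  # frequency of each fruit in this row
  # which.max() returns the first maximum, so ties go to the first name in table order
  data.frame(Count = max(x1), Names = names(x1)[which.max(x1)])
})))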
# Name Mon Tue Wed Thu Fri Sat Sun Count Names
#1 John Apple Orange Apple Banana Apple Apple Orange 4 Apple
#2 Ricky Banana Apple Banana Banana Banana Banana Apple 5 Banana
#3 Alex Apple Orange Orange Apple Apple Orange Orange 4 Orange
#4 Robbin Apple Apple Apple Apple Apple Banana Banana 5 Apple
#5 Sunny Banana Banana Apple Apple Apple Banana Banana 4 Banana
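To make the row-wise approach above reproducible, here is a minimal base R sketch. The construction of df1 is rebuilt from the table in the question, and most_frequent is a hypothetical helper name introduced only for illustration. It assumes the day columns are character vectors (hence stringsAsFactors = FALSE).

```r
# Rebuild the example data from the question (assumed to be character columns).
df1 <- data.frame(
  Name = c("John", "Ricky", "Alex", "Robbin", "Sunny"),
  Mon  = c("Apple", "Banana", "Apple", "Apple", "Banana"),
  Tue  = c("Orange", "Apple", "Orange", "Apple", "Banana"),
  Wed  = c("Apple", "Banana", "Orange", "Apple", "Apple"),
  Thu  = c("Banana", "Banana", "Apple", "Apple", "Apple"),
  Fri  = c("Apple", "Banana", "Apple", "Apple", "Apple"),
  Sat  = c("Apple", "Banana", "Orange", "Banana", "Banana"),
  Sun  = c("Orange", "Apple", "Orange", "Banana", "Banana"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequent value of a character vector and its count.
most_frequent <- function(x) {
  counts <- table(x)
  top <- which.max(counts)  # first maximum; table() sorts names, so ties go alphabetically
  list(Names = names(counts)[top], Count = as.integer(counts[top]))
}

# Apply the helper to the day columns of every row.
res <- lapply(seq_len(nrow(df1)), function(i) most_frequent(unlist(df1[i, -1])))

# Collect the per-row results into two new columns.
df1$Names <- vapply(res, `[[`, character(1), "Names")
df1$Count <- vapply(res, `[[`, integer(1), "Count")

print(df1)
```

Note that a tie is resolved silently: with this helper, a row containing equally many "Apple" and "Pear" entries would report "Apple". If ties matter for your data, you may want to return all names whose count equals max(counts) instead.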
Alternatively, we can use data.table:
library(data.table)
setDT(df1)[, c("Names", "Count") := {
  tbl <- table(unlist(.SD))  # frequencies over the day columns of each Name group
  .(names(tbl)[which.max(tbl)], max(tbl))
}, by = Name]
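As a variation on the data.table idea, the same result can be obtained by reshaping to long format and counting, which makes the tie-breaking rule explicit. This is only a sketch, not the answer's method: it assumes the data.table package is installed, and dt, long, counts, top and result are illustrative names.

```r
library(data.table)

# Example data from the question, built directly as a data.table.
dt <- data.table(
  Name = c("John", "Ricky", "Alex", "Robbin", "Sunny"),
  Mon  = c("Apple", "Banana", "Apple", "Apple", "Banana"),
  Tue  = c("Orange", "Apple", "Orange", "Apple", "Banana"),
  Wed  = c("Apple", "Banana", "Orange", "Apple", "Apple"),
  Thu  = c("Banana", "Banana", "Apple", "Apple", "Apple"),
  Fri  = c("Apple", "Banana", "Apple", "Apple", "Apple"),
  Sat  = c("Apple", "Banana", "Orange", "Banana", "Banana"),
  Sun  = c("Orange", "Apple", "Orange", "Banana", "Banana")
)

# Long format: one row per (Name, Day) pair.
long <- melt(dt, id.vars = "Name", variable.name = "Day", value.name = "Names")

# Count how often each fruit occurs for each person.
counts <- long[, .(Count = .N), by = .(Name, Names)]

# Highest count first within each person; ties broken alphabetically by fruit.
setorder(counts, Name, -Count, Names)
top <- counts[, .SD[1], by = Name]

# Attach the two new columns, keeping the original row order of dt.
result <- merge(dt, top, by = "Name", sort = FALSE)
print(result)
```

Unlike the grouped := version, this variant leaves dt unmodified and returns a new table, which may be preferable when the original data is reused elsewhere.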
Another approach is to loop over the unique fruits, as follows:
fruits_unique <- unique(unlist(dat[-1]))
# One column per fruit: how many times it occurs in each row
occurrence <- sapply(fruits_unique, function(x) rowSums(dat[, -1] == x))
# Use these counts to create the resulting columns
ind <- apply(occurrence, 1, which.max)
dat$Names <- fruits_unique[ind]
dat$Count <- occurrence[cbind(seq_along(ind), ind)]
Result:
Name Mon Tue Wed Thu Fri Sat Sun Names Count
1 John Apple Orange Apple Banana Apple Apple Orange Apple 4
2 Ricky Banana Apple Banana Banana Banana Banana Apple Banana 5
3 Alex Apple Orange Orange Apple Apple Orange Orange Orange 4
4 Robbin Apple Apple Apple Apple Apple Banana Banana Apple 5
5 Sunny Banana Banana Apple Apple Apple Banana Banana Banana 4
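The per-fruit counting approach can also be written without the row-wise apply call by using base R's max.col on the count matrix. The following is a sketch under the assumption that dat holds the question's data as character columns. Note that max.col breaks ties at random by default, so ties.method = "first" is set here to keep the result deterministic.

```r
# Example data from the question (assumed to be character columns).
dat <- data.frame(
  Name = c("John", "Ricky", "Alex", "Robbin", "Sunny"),
  Mon  = c("Apple", "Banana", "Apple", "Apple", "Banana"),
  Tue  = c("Orange", "Apple", "Orange", "Apple", "Banana"),
  Wed  = c("Apple", "Banana", "Orange", "Apple", "Apple"),
  Thu  = c("Banana", "Banana", "Apple", "Apple", "Apple"),
  Fri  = c("Apple", "Banana", "Apple", "Apple", "Apple"),
  Sat  = c("Apple", "Banana", "Orange", "Banana", "Banana"),
  Sun  = c("Orange", "Apple", "Orange", "Banana", "Banana"),
  stringsAsFactors = FALSE
)

# Distinct fruits, in order of first appearance (column by column).
fruits_unique <- unique(unlist(dat[-1]))

# Count matrix: one row per person, one column per fruit.
occurrence <- sapply(fruits_unique, function(f) rowSums(dat[-1] == f))

# Column index of the largest count in each row; "first" makes ties deterministic.
ind <- max.col(occurrence, ties.method = "first")

dat$Names <- fruits_unique[ind]
dat$Count <- occurrence[cbind(seq_len(nrow(dat)), ind)]

print(dat)
```

With ties.method = "first", a tie goes to whichever fruit appears earliest in fruits_unique, which is order of first appearance rather than the alphabetical order that table() would give.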