Order a list of words in one column in R
I have an output data frame from apriori with rules like the following:
rules
{A,B} => {C}
{C,A} => {B}
{A,B} => {D}
{A,D} => {B}
{A,B} => {E}
{E,A} => {B}
From these, I grouped the items of each rule into one string (the data.frame is df_basket):
rules           basket
{A,B} => {C}    A,B,C
{C,A} => {B}    C,A,B
{A,B} => {D}    A,B,D
{A,D} => {B}    A,D,B
{A,B} => {E}    A,B,E
{E,A} => {B}    E,A,B
I would like to sort the items within each basket alphabetically, as in the Group column below:
rules           basket   Group
{A,B} => {C}    A,B,C    A,B,C
{C,A} => {B}    C,A,B    A,B,C
{A,B} => {D}    A,B,D    A,B,D
{A,D} => {B}    A,D,B    A,B,D
{A,B} => {E}    A,B,E    A,B,E
{E,A} => {B}    E,A,B    A,B,E
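I used the code below, which gets the job done on a small data frame. On a large data frame, however, the for loop is inefficient. Please help me optimize this operation in R: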
for (i in seq_len(nrow(df_basket))) {
  # split the basket on commas, sort the items, and paste them back together
  items <- unlist(strsplit(df_basket$basket[i], ","))
  df_basket$Group[i] <- paste(items[order(items)], collapse = ",")
}
If there is a simpler or more direct way to get the "Group" field for my data frame, please let me know.
Try adapting this solution:
f <- function(x) {
  sorted <- sort(unlist(strsplit(x, ",")))
  return(paste0(sorted, collapse = ","))
}
cbind(basket, unlist(lapply(basket, f)))
Input data:
basket<-c("A,B,C","C,A,B","A,B,D","A,D,B","A,B,E","E,A,B")
Output:
basket
[1,] "A,B,C" "A,B,C"
[2,] "C,A,B" "A,B,C"
[3,] "A,B,D" "A,B,D"
[4,] "A,D,B" "A,B,D"
[5,] "A,B,E" "A,B,E"
[6,] "E,A,B" "A,B,E"
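Because strsplit() is vectorized over its input, the same idea can be applied to the whole column at once, without an explicit loop. The following is only a sketch: it assumes df_basket has a column named basket as in the question, the data frame is rebuilt by hand for illustration, and sort_basket is a made-up helper name.

```r
# Hypothetical reconstruction of the question's data frame.
df_basket <- data.frame(
  rules  = c("{A,B} => {C}", "{C,A} => {B}", "{A,B} => {D}",
             "{A,D} => {B}", "{A,B} => {E}", "{E,A} => {B}"),
  basket = c("A,B,C", "C,A,B", "A,B,D", "A,D,B", "A,B,E", "E,A,B"),
  stringsAsFactors = FALSE
)

# sort_basket is an illustrative helper name, not part of any package.
sort_basket <- function(x) {
  # strsplit() splits every element at once and returns a list of vectors
  parts <- strsplit(as.character(x), ",", fixed = TRUE)
  # sort each vector and paste it back; vapply() enforces one string per row
  vapply(parts, function(p) paste(sort(p), collapse = ","), character(1))
}

df_basket$Group <- sort_basket(df_basket$basket)
```

The as.character() call is there in case the column is a factor, which was the default for data.frame() before R 4.0.0 and which strsplit() does not accept.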
Here is another approach that makes more use of the support arules already provides:
### create some random data and mine rules
library("arules")
dat <- replicate(10, sample(LETTERS[1:5], size = 3), simplify = FALSE)
trans <- as(dat, "transactions")
rules <- apriori(trans)
inspect(rules)
lhs rhs support confidence lift count
[1] {} => {A} 0.8 0.8 1.000000 8
[2] {B} => {A} 0.6 1.0 1.250000 6
[3] {C,D} => {E} 0.2 1.0 1.428571 2
[4] {B,D} => {A} 0.1 1.0 1.250000 1
[5] {B,C} => {A} 0.2 1.0 1.250000 2
[6] {B,E} => {A} 0.3 1.0 1.250000 3
### Get the itemsets that generated each rule and convert the itemsets
### into a list. I use a list, since in general, rules will not all
### have the same number of items.
itemsets <- as(items(generatingItemsets(rules)), "list")
### sort the item labels alphabetically. Note that you could already
### start with the item labels correctly sorted in the transaction set
### (see manual page for itemcoding in arules).
lapply(itemsets, sort)
[[1]]
[1] "A"
[[2]]
[1] "A" "B"
[[3]]
[1] "C" "D" "E"
[[4]]
[1] "A" "B" "D"
[[5]]
[1] "A" "B" "C"
[[6]]
[1] "A" "B" "E"
If all rules had the same number of items, you could put this list into a matrix.
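One possible way to do the matrix conversion is to bind the sorted vectors row by row. This sketch uses a hand-written list (sorted_sets, a made-up name with made-up values) as a stand-in for lapply(itemsets, sort), so that it runs without arules and every itemset has three items:

```r
# Illustrative stand-in for lapply(itemsets, sort); values are made up
# so that every itemset has the same number of items.
sorted_sets <- list(c("A", "B", "C"), c("A", "B", "D"), c("A", "B", "E"))

# Only bind rows when all itemsets have the same length; otherwise
# rbind() would silently recycle the shorter vectors.
stopifnot(length(unique(lengths(sorted_sets))) == 1)

# One row per rule, one column per item position
item_matrix <- do.call(rbind, sorted_sets)
```

The length check matters because rbind() recycles shorter vectors rather than failing, which would produce misleading rows.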
If you want each of them as a single string, you can do this:
sapply(lapply(itemsets, sort), paste0, collapse = ",")
[1] "A" "A,B" "C,D,E" "A,B,D" "A,B,C" "A,B,E"
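Once every rule has a sorted item string, rules built from the same items can be collected with split(). This is a sketch only: rule_text and group are hand-written vectors that mirror the table in the question, and both names are made up for illustration.

```r
# Hand-written vectors mirroring the question's table (illustrative names).
rule_text <- c("{A,B} => {C}", "{C,A} => {B}", "{A,B} => {D}",
               "{A,D} => {B}", "{A,B} => {E}", "{E,A} => {B}")
group     <- c("A,B,C", "A,B,C", "A,B,D", "A,B,D", "A,B,E", "A,B,E")

# split() returns a named list with one element per distinct group string,
# each holding the rules that were generated from that itemset
rules_by_group <- split(rule_text, group)
```

Each element of rules_by_group then holds the rules that share one itemset, for example rules_by_group[["A,B,C"]].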