"Zero frequent items" 使用eclat挖掘频繁项集时

Question

I want to find patterns and "clusters" based on which items are bought together. According to the wiki for eclat:

The Eclat algorithm is used to perform itemset mining. Itemset mining let us find frequent patterns in data like if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains.

However, when I use eclat in R and retrieve the results through tidLists, I get "zero frequent items" and "NULL". Can anyone see what I'm doing wrong?

Full dataset: https://pastebin.com/8GbjnHK2

Each row is one transaction, and the columns hold the different items. A quick snapshot of the data:

3060615;;;;;;;;;;;;;;;
3060612;3060616;;;;;;;;;;;;;;
3020703;;;;;;;;;;;;;;;
3002469;;;;;;;;;;;;;;;
3062800;;;;;;;;;;;;;;;
3061943;3061965;;;;;;;;;;;;;;

Code

library(arules)

trans <- read.transactions("Transactions.csv", format = "basket", sep = ";")

f <- eclat(trans, parameter = list(supp = 0.1, maxlen = 17, tidLists = TRUE))

dim(tidLists(f))

as(tidLists(f), "list")

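Could it be a problem with the data structure? If so, how should I change it? Also, what do I need to do to get the suggested itemsets? I couldn't figure it out from the wiki.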

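Edit: Following @hpesoj626's suggestion, I used 0.004 for the support. But it seems the function is grouping orders/users rather than items. I don't know how to export the data, so here is a picture of the tidLists: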

Answer 1

The problem is that you set the support too high. Try lowering supp, say to supp = .001, and we get

dim(tidLists(f))

# [1]   928 15840

For your dataset, the highest support is 0.08239, which is below 0.1. That is why you get no results with supp = 0.1.
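
One way to choose a workable threshold up front is to look at the single-item supports first, since no itemset can have a higher support than its most frequent item. The sketch below uses a small made-up set of baskets in place of Transactions.csv (the data and the name item_support are assumptions for illustration only) together with itemFrequency() from arules:

```r
library(arules)

# Toy baskets standing in for Transactions.csv (hypothetical data)
baskets <- list(
  c("3060619", "3060620"),
  c("3060620"),
  c("3060615"),
  c("3060612", "3060616"),
  c("3060619", "3060620")
)
trans <- as(baskets, "transactions")

# Relative support of every single item, highest first
item_support <- sort(itemFrequency(trans), decreasing = TRUE)
print(head(item_support))

# Any supp value above this maximum cannot return a frequent itemset
print(max(item_support))
```

On the real data, running the same two calls on the trans object from read.transactions should show the maximum that supp has to stay below.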

inspect(head(sort(f, by = "support"), 10))

#      items             support count
# [1]  {3060620}         0.08239 1305 
# [2]  {3060619}         0.07260 1150 
# [3]  {3061124}         0.05688  901 
# [4]  {3060618}         0.05663  897 
# [5]  {4027039}         0.04975  788 
# [6]  {3060617}         0.04564  723 
# [7]  {3061697}         0.04306  682 
# [8]  {3060619,3060620} 0.03087  489 
# [9]  {3039715}         0.02727  432 
# [10] {3045117}         0.02708  429
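
As for the follow-up in the question's edit: as far as I can tell, tidLists(f) is organized per itemset and lists the transactions that contain it, which would explain why it looks like orders are being grouped. The itemsets themselves come from inspect(f) or items(f). A minimal sketch on made-up baskets (the data is hypothetical, and supp = 0.3 and minlen = 2 are chosen only to suit this toy example):

```r
library(arules)

# Toy baskets standing in for Transactions.csv (hypothetical data)
baskets <- list(
  c("3060619", "3060620"),
  c("3060620"),
  c("3060615"),
  c("3060612", "3060616"),
  c("3060619", "3060620")
)
trans <- as(baskets, "transactions")

# minlen = 2 drops single items, so only co-purchased sets remain
f <- eclat(trans,
           parameter = list(supp = 0.3, minlen = 2, tidLists = TRUE),
           control = list(verbose = FALSE))

# The frequent itemsets with their support
inspect(sort(f, by = "support"))

# Items in each itemset, as a plain list
itemsets <- as(items(f), "list")

# Transactions containing each itemset (what tidLists reports)
tids <- as(tidLists(f), "list")
```

With the real data, supp would need to be far smaller (for example the 0.004 mentioned in the question), because the most frequent pair shown above has a support of only about 0.03.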

Tags: r, data-mining