Randomly split a dictionary into n parts

I have a quanteda dictionary that I want to randomly split into n parts.

dict <- dictionary(list(positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
            negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")))

I tried using the split function like this: split(dict, f=factor(3)), but it didn't work.

I'd like to get back three dictionaries, but instead I get:

$`3`
Dictionary object with 2 key entries.
- [positive]:
  - good, amazing, best, outstanding, beautiful, wonderf*
- [negative]:
  - bad, worst, awful, atrocious, deplorable, horrendous
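For context on why split() didn't help: factor(3) is a factor of length one with the single level "3", and split() recycles it across every element, so everything lands in one group named "3" (presumably the same thing happens with the dictionary, which matches the output above). A minimal base-R sketch on a plain character vector, not a quanteda dictionary, of what split() expects instead: one group label per element. The seed and the balanced, shuffled label vector built with rep_len() are illustrative choices.

```r
# split() recycles a length-one factor, so every element falls into the
# single group named "3"
x <- c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*")
one_group <- split(x, f = factor(3))
names(one_group)

# split() needs one group label per element instead: here a shuffled,
# balanced assignment of the labels 1..n (n = 3)
set.seed(1)
n <- 3
groups <- sample(rep_len(seq_len(n), length(x)))
three_groups <- split(x, groups)
lengths(three_groups)
```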

Edit

I added a different entry containing * to the dictionary. The solution suggested by Ken Benoit throws an error in that case, but works fine otherwise.

The desired output is something like this:

> dict_1
Dictionary object with 2 key entries.
- [positive]:
  - good, wonderf*
- [negative]:
  - deplorable, horrendous

> dict_2
Dictionary object with 2 key entries.
- [positive]:
  - amazing, best
- [negative]:
  - bad, worst

> dict_3
Dictionary object with 2 key entries.
- [positive]:
  - outstanding, beautiful
- [negative]:
  - awful, atrocious

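I didn't specify what should happen when the number of entries isn't divisible by n, but ideally I could choose whether I want (i) a separate 'remainder' or (ii) all values distributed across the parts (which makes some parts slightly larger).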
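Both remainder options can be sketched in base R for a single key. The helper random_parts() below is hypothetical (it is not a quanteda function) and works on a plain character vector; the seven-entry key, with "superb" added purely for illustration, is chosen so that splitting 7 entries into n = 3 parts leaves a remainder of one.

```r
# random_parts() is a hypothetical helper, not a quanteda function.
# It shuffles one key's entries and cuts them into n parts of
# floor(length / n) entries each; the leftover entries are either
#   "distribute": handed out one each to the first parts, or
#   "separate":   returned as an extra part named "remainder".
random_parts <- function(x, n, remainder = c("distribute", "separate")) {
  remainder <- match.arg(remainder)
  x <- sample(x)              # random order
  size <- length(x) %/% n     # entries per part, ignoring the remainder
  extra <- length(x) %% n     # number of leftover entries
  if (remainder == "distribute") {
    sizes <- size + (seq_len(n) <= extra)  # the first `extra` parts get one more
    parts <- split(x, rep(seq_len(n), times = sizes))
  } else {
    parts <- split(x[seq_len(size * n)], rep(seq_len(n), each = size))
    if (extra > 0) parts$remainder <- x[size * n + seq_len(extra)]
  }
  parts
}

# Illustrative key: "superb" is added so that 7 entries / 3 parts leaves a remainder
set.seed(42)
positive <- c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*", "superb")
lengths(random_parts(positive, 3, "distribute"))  # parts of size 3, 2, 2
lengths(random_parts(positive, 3, "separate"))    # 2, 2, 2 plus a remainder of 1
```

Option (ii) keeps exactly n parts at the cost of unequal sizes; option (i) keeps the sizes equal and hands the leftovers back separately, so the caller decides what to do with them.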

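A lot is unspecified in the question: it's not clear how dictionary keys of different lengths should be handled, and there's no discernible pattern in your expected output.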

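Here I'm assuming that your keys all have the same length, that this length divides evenly with no remainder, and that you want to split each dictionary key into runs, i.e. adjacent intervals.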

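This should do it. Note that the second argument, len, is the number of entries per part rather than the number of parts n; with six entries per key, len = 2 gives three dictionaries.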

library("quanteda")
## Package version: 1.5.1

dict <- dictionary(
  list(
    positive = c("good", "amazing", "best", "outstanding", "beautiful", "delightful"),
    negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
  )
)

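dictionary_split <- function(x, len) {
  # with keys of unequal length, indexing past the end of a shorter key
  # returns NA; change max() to min() to truncate to the shortest key instead
  maxlen <- max(lengths(x))
  # consecutive runs of positions: 1:len, (len+1):(2*len), ...
  subindex <- split(seq_len(maxlen), ceiling(seq_len(maxlen) / len))
  # pull the same run of positions out of every key
  splitlist <- lapply(subindex, function(y) lapply(x, "[", y))
  names(splitlist) <- paste0("dict_", seq_along(splitlist))
  lapply(splitlist, dictionary)
}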

dictionary_split(dict, 2)
## $dict_1
## Dictionary object with 2 key entries.
## - [positive]:
##   - good, amazing
## - [negative]:
##   - bad, worst
## 
## $dict_2
## Dictionary object with 2 key entries.
## - [positive]:
##   - best, outstanding
## - [negative]:
##   - awful, atrocious
## 
## $dict_3
## Dictionary object with 2 key entries.
## - [positive]:
##   - beautiful, delightful
## - [negative]:
##   - deplorable, horrendous
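If the split should actually be random, as the question asks, one possible variant is to shuffle each key before dealing its entries into n parts. This is a sketch under assumptions, not part of the answer above: dictionary_split_random() is a hypothetical name, it takes the number of parts n rather than a run length, and it works on a plain named list of character vectors so it runs without quanteda. Because it only indexes within each key, it never produces NA entries, and when a key's length isn't divisible by n, some parts simply get one more entry from that key.

```r
# dictionary_split_random() is a hypothetical helper, not a quanteda function.
# x: a named list of character vectors (one element per dictionary key)
# n: the number of parts to return
dictionary_split_random <- function(x, n) {
  per_key <- lapply(x, function(values) {
    values <- sample(values)  # shuffle this key's entries
    # deal the shuffled entries round-robin into groups 1..n; keeping all
    # levels means every group exists even if a key has fewer than n entries
    groups <- factor(rep_len(seq_len(n), length(values)), levels = seq_len(n))
    split(values, groups)
  })
  # part i collects group i from every key, keeping the key names
  out <- lapply(seq_len(n), function(i) lapply(per_key, `[[`, i))
  names(out) <- paste0("dict_", seq_len(n))
  out
}

set.seed(123)
keys <- list(
  positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
  negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
)
parts <- dictionary_split_random(keys, 3)
str(parts)
```

To get quanteda dictionaries back, one option, assuming as.list() on a flat dictionary like this one returns a named list of character vectors, is lapply(dictionary_split_random(as.list(dict), 3), dictionary). A key with fewer than n entries would leave empty keys in some parts, which dictionary() may not accept.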