Randomly split a dictionary into n parts

Question

I have a quanteda dictionary that I want to split randomly into n parts.

dict <- dictionary(list(positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
            negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")))

I tried using the split function like this: split(dict, f=factor(3)), but it did not work.

I wanted to get back three dictionaries, but instead I got

$`3`
Dictionary object with 2 key entries.
- [positive]:
  - good, amazing, best, outstanding, beautiful, wonderf*
- [negative]:
  - bad, worst, awful, atrocious, deplorable, horrendous
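
The single `3` element is consistent with how base R's split() treats its grouping argument: factor(3) is a length-one factor whose only level is "3", so it is recycled across everything being split. A minimal base R sketch on a plain character vector (not a quanteda object; the variable names are illustrative only):

```r
words <- c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*")

# factor(3) has length one and a single level "3"; split() recycles it,
# so every element lands in the same group.
one_group <- split(words, factor(3))
names(one_group)

# A grouping vector as long as the data, with three distinct values,
# gives three groups. sample() shuffles the group labels.
groups <- sample(rep(1:3, length.out = length(words)))
three_groups <- split(words, groups)
lengths(three_groups)
```

This only shows the grouping behaviour on a vector; it does not by itself split the entries inside each dictionary key.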

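Edit

I added a different entry containing * to the dictionary. The solution suggested by Ken Benoit throws an error in this case but works fine otherwise.

The desired output looks like this: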

> dict_1
Dictionary object with 2 key entries.
- [positive]:
  - good, wonderf*
- [negative]:
  - deplorable, horrendous

> dict_2
Dictionary object with 2 key entries.
- [positive]:
  - amazing, best
- [negative]:
  - bad, worst

> dict_3
Dictionary object with 2 key entries.
- [positive]:
  - outstanding, beautiful
- [negative]:
  - awful, atrocious

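I did not specify what should happen if the number of entries is not evenly divisible by n, but ideally I could choose whether (i) the remainder is returned separately or (ii) all values are distributed across the parts (which makes some parts slightly larger).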

Answer 1

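A lot is left unspecified in the question: it is unclear how dictionary keys of different lengths should be handled, and there is no pattern to the pairs in your expected answer.

Here I'm assuming that you have keys of equal length, divisible by the split size without a remainder, and that you want to split each dictionary key into runs of adjacent entries.

This should do it.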

library("quanteda")
## Package version: 1.5.1

dict <- dictionary(
  list(
    positive = c("good", "amazing", "best", "outstanding", "beautiful", "delightful"),
    negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
  )
)

dictionary_split <- function(x, len) {
  maxlen <- max(lengths(x)) # change to minimum to avoid recycling
  subindex <- split(seq_len(maxlen), ceiling(seq_len(maxlen) / len))
  splitlist <- lapply(subindex, function(y) lapply(x, "[", y))
  names(splitlist) <- paste0("dict_", seq_along(splitlist))
  lapply(splitlist, dictionary)
}

dictionary_split(dict, 2)
## $dict_1
## Dictionary object with 2 key entries.
## - [positive]:
##   - good, amazing
## - [negative]:
##   - bad, worst
## 
## $dict_2
## Dictionary object with 2 key entries.
## - [positive]:
##   - best, outstanding
## - [negative]:
##   - awful, atrocious
## 
## $dict_3
## Dictionary object with 2 key entries.
## - [positive]:
##   - beautiful, delightful
## - [negative]:
##   - deplorable, horrendous
