Randomly split a dictionary into n parts
I have a quanteda dictionary that I want to randomly split into n parts.
dict <- dictionary(list(positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")))
I tried using the split function like this: split(dict, f=factor(3)), but it did not work.
I wanted to get three dictionaries back, but I got
$`3`
Dictionary object with 2 key entries.
- [positive]:
- good, amazing, best, outstanding, beautiful, wonderf*
- [negative]:
- bad, worst, awful, atrocious, deplorable, horrendous
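EDIT
I have added a different entry containing * to the dictionary. The solution suggested by Ken Benoit throws an error in this case but works fine otherwise.
The desired output is something like this: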
> dict_1
Dictionary object with 2 key entries.
- [positive]:
- good, wonderf*
- [negative]:
- deplorable, horrendous
> dict_2
Dictionary object with 2 key entries.
- [positive]:
- amazing, best
- [negative]:
- bad, worst
> dict_3
Dictionary object with 2 key entries.
- [positive]:
- outstanding, beautiful
- [negative]:
- awful, atrocious
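If the number of entries cannot be divided by n without a remainder, I have no firm specification, but ideally I could decide whether (i) I want the 'remainder' kept separately or (ii) I want all values distributed (which makes some splits slightly larger).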
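There is a lot left unspecified in the question: with dictionary keys of different lengths it is unclear how they should be handled, and there is no pattern in your expected answer.
Here, I have assumed that you have keys of equal length, divisible by the split size without a remainder, and that you want to split each dictionary key into runs of adjacent values.
This should do it.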
library("quanteda")
## Package version: 1.5.1
dict <- dictionary(
  list(
    positive = c("good", "amazing", "best", "outstanding", "beautiful", "delightful"),
    negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
  )
)
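dictionary_split <- function(x, len) {
  # len is the number of values per key in each part, not the number of parts
  maxlen <- max(lengths(x)) # change to min() to avoid recycling
  # group the positions 1..maxlen into consecutive runs of length len
  subindex <- split(seq_len(maxlen), ceiling(seq_len(maxlen) / len))
  # for each run, take those positions from every key
  splitlist <- lapply(subindex, function(y) lapply(x, "[", y))
  names(splitlist) <- paste0("dict_", seq_along(splitlist))
  lapply(splitlist, dictionary)
}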
dictionary_split(dict, 2)
## $dict_1
## Dictionary object with 2 key entries.
## - [positive]:
## - good, amazing
## - [negative]:
## - bad, worst
##
## $dict_2
## Dictionary object with 2 key entries.
## - [positive]:
## - best, outstanding
## - [negative]:
## - awful, atrocious
##
## $dict_3
## Dictionary object with 2 key entries.
## - [positive]:
## - beautiful, delightful
## - [negative]:
## - deplorable, horrendous
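The function above splits into adjacent runs rather than at random. If a random assignment is what you need, something along the following lines might work. This is only a sketch under my own assumptions: `random_split()` and its `drop_remainder` argument are names I made up for illustration, not part of quanteda, and the function works on a plain named list of character vectors so that it needs nothing beyond base R.

```r
# Hypothetical helper (not a quanteda function): randomly distribute the
# values of every key over n parts.
# x: named list of character vectors; n: number of parts wanted.
# drop_remainder = TRUE discards the values left over when a key's length
# is not a multiple of n; FALSE distributes all values, so some parts end
# up one value larger.
random_split <- function(x, n, drop_remainder = FALSE) {
  assigned <- lapply(x, function(values) {
    shuffled <- sample(values)  # random order within the key
    if (drop_remainder) {
      keep <- (length(shuffled) %/% n) * n
      shuffled <- shuffled[seq_len(keep)]
    }
    # deal the shuffled values out to parts 1..n in turn
    part <- factor(rep_len(seq_len(n), length(shuffled)), levels = seq_len(n))
    split(shuffled, part)
  })
  # regroup so that each part holds every key
  out <- lapply(seq_len(n), function(i) lapply(assigned, function(a) a[[i]]))
  names(out) <- paste0("dict_", seq_len(n))
  out
}

words <- list(
  positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
  negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
)

set.seed(42)  # only to make the random assignment reproducible
parts <- random_split(words, 3)
lengths(parts$dict_1)  # number of values per key in the first part
```

Because wildcard entries such as "wonderf*" are treated as ordinary strings here, they should not need special handling. To get quanteda objects, I would expect `as.list(dict)` on the way in and `lapply(parts, dictionary)` on the way out to do the conversion, but I have not checked how `dictionary()` behaves when a key ends up empty.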