TypeError: Population must be a sequence or set. For dicts, use list(d)

Question

I am importing two text files as:

first_names = set(map(str.strip, open('first_names.all.txt')))
last_names = set(map(str.strip, open('last_names.all.txt')))

These are just single-column text files, like this:

--------------------
a'isha
a'ishah
a-jay
aa'isha
aa'ishah
aaban

Printing the types:

print(type(first_names))

print(type(last_names))

<class 'set'>
<class 'set'>

Then I try to draw a sample of 5,000 pairs from the Cartesian product of first_names and last_names:

random.sample(itertools.product(first_names, last_names), 5000)

But I get the error:

TypeError: Population must be a sequence or set.  For dicts, use list(d).

Answer 1

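You can't apply random.sample directly to an itertools.product object, because it is an iterator rather than a sequence. Try this, which materializes the product first (a list is used here because random.sample only accepted a set before Python 3.11):

p = list(itertools.product(first_names, last_names))
random.sample(p, 5000)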

Answer 2

random.sample can't work on an iterator; it needs a sequence (before Python 3.11 it also accepted a set). Turning the product into a list or a set can take up a lot of memory, though. Alternatively, since you have already read the names into two sets, convert each one to a list (random.choice needs an indexable sequence, so it fails on a set) and call choice on each list 5,000 times instead of using product:

first, last = list(first_names), list(last_names)
names = [(random.choice(first), random.choice(last)) for _ in range(5000)]

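Note: this has the pitfall that pairs may repeat, which would not happen when sampling from product.

One way to overcome this is to add the drawn pairs to a set, which discards duplicates, and keep adding until the required number is reached: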

first, last = list(first_names), list(last_names)  # random.choice needs a sequence
names = set()
while len(names) < 5000:  # assumes at least 5000 distinct pairs exist
    names.add((random.choice(first), random.choice(last)))
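
Another way to avoid both duplicates and the memory cost of building the whole product is to sample flat indices from a range and decode each index into a pair. The following is only a minimal sketch of that idea, assuming both name collections fit in memory; sample_pairs and its seed parameter are hypothetical names introduced here for illustration.

```python
import random

def sample_pairs(first_names, last_names, k, seed=None):
    """Draw k distinct (first, last) pairs without building the full product."""
    # Sorting gives a stable order, so a fixed seed reproduces the same sample.
    first = sorted(first_names)
    last = sorted(last_names)
    rng = random.Random(seed)
    # random.sample accepts a range, so only k indices are materialized.
    indices = rng.sample(range(len(first) * len(last)), k)
    # divmod maps each flat index back to one (first, last) cell of the product.
    return [(first[i], last[j]) for i, j in (divmod(n, len(last)) for n in indices)]

# Tiny hypothetical data set: 3 first names x 2 last names = 6 possible pairs.
pairs = sample_pairs({"aaban", "a-jay", "aa'isha"}, {"smith", "jones"}, k=4, seed=0)
```

Because every flat index corresponds to exactly one pair, sampling indices without replacement yields pairs without repeats, and random.sample raises ValueError if k exceeds the number of possible pairs.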

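Warning: since Python 3.9, calling random.sample() on a set is deprecated, and from Python 3.11 on it raises TypeError. The documentation says: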

Deprecated since version 3.9: In the future, the population must be a sequence. Instances of set are no longer supported. The set must first be converted to a list or tuple, preferably in a deterministic order so that the sample is reproducible.
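
The advice in that note can be sketched as follows: convert the set with sorted() so the population has a stable order, then seed the generator. This is an illustrative sketch only; the sample names and the seed value 42 are made up for the example.

```python
import random

names = {"aaban", "a-jay", "aa'isha", "a'isha"}  # hypothetical sample data

# sorted() turns the set into a list with a stable, deterministic order.
population = sorted(names)

random.seed(42)
first_draw = random.sample(population, 2)

random.seed(42)
second_draw = random.sample(population, 2)
# Same seed and same population order, so both draws are identical.
```

Using list(names) instead would also satisfy random.sample, but set iteration order for strings can differ between interpreter runs, so the sample would not be reproducible.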

python

itertools