TypeError: Population must be a sequence or set. For dicts, use list(d)

I am importing two text files as:

first_names = set(map(str.strip, open('first_names.all.txt')))
last_names = set(map(str.strip, open('last_names.all.txt')))

These are just single-column text files, like this:

--------------------
a'isha
a'ishah
a-jay
aa'isha
aa'ishah
aaban

Printing the types:

print(type(first_names))

print(type(last_names))

<class 'set'>
<class 'set'>

I then try to create a sample of 5,000 pairs from the Cartesian product of first_name, last_name:
random.sample(itertools.product(first_names, last_names), 5000)

But I get the error:

TypeError: Population must be a sequence or set.  For dicts, use list(d).

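You can't apply random.sample directly to an itertools.product object. Try this, materializing the product into a list first (random.sample also accepted a set before Python 3.11):

p = list(itertools.product(first_names, last_names))
random.sample(p, 5000)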

random.sample can't work on an iterator such as the one itertools.product returns; it needs a sequence (or, before Python 3.11, a set). Turning that product into a list or a set can take up a lot of memory, though. Alternatively, since you have already read the names into two sets, convert each one to a list (random.choice needs an indexable sequence) and call choice on each list 5,000 times instead of using product:

first, last = list(first_names), list(last_names)
names = [(random.choice(first), random.choice(last)) for _ in range(5000)]

Note: this has the pitfall of possibly producing duplicate pairs, which would not happen when sampling from product.
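
A minimal sketch of one way to avoid both the memory cost of materializing the product and the duplicates that independent choice calls can produce: sample distinct flat indices from range(len(first) * len(last)) and decode each index into a pair with divmod. The helper name sample_pairs and the tiny name sets are hypothetical, used only for illustration.

```python
import random

def sample_pairs(first, last, k, seed=None):
    """Return k distinct (first, last) pairs without building the full product."""
    # Sets are converted to sorted lists so indexing works and order is deterministic.
    first, last = sorted(first), sorted(last)
    total = len(first) * len(last)
    if k > total:
        raise ValueError("k exceeds the number of possible pairs")
    rng = random.Random(seed)
    # range() is a sequence, so sample() draws k distinct indices
    # without creating a list of all total values.
    indices = rng.sample(range(total), k)
    # divmod maps each flat index to a (row, column) position in the virtual product.
    return [(first[i], last[j]) for i, j in (divmod(n, len(last)) for n in indices)]

pairs = sample_pairs({"aaban", "a-jay", "aa'isha"}, {"smith", "jones"}, 4, seed=0)
print(pairs)
```

Because every flat index corresponds to exactly one pair, distinct indices guarantee distinct pairs, and memory use stays proportional to k rather than to the size of the product.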


One way to overcome this is to add the samples to a set, which takes care of duplicates, and keep adding until you reach the required count:

names = set()
while len(names) != 5000:
    names.add(tuple(random.sample(first_names, k=1) + random.sample(last_names, k=1)))

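Warning: since Python 3.9, passing a set to random.sample() is deprecated, and from Python 3.11 onward it raises a TypeError. From the documentation: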

Deprecated since version 3.9: In the future, the population must be a sequence. Instances of set are no longer supported. The set must first be converted to a list or tuple, preferably in a deterministic order so that the sample is reproducible.
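
Following the advice in that notice, here is a sketch of the set-based loop adapted so it runs on Python 3.11 and later: each set is converted to a sorted list first, and random.choice draws from the lists. The function name unique_name_pairs and the sample data are hypothetical, and the guard against asking for more pairs than exist is an addition, not part of the original answer.

```python
import random

def unique_name_pairs(first_names, last_names, k, seed=None):
    """Collect k distinct (first, last) pairs by repeated random draws."""
    # Sorting gives a deterministic order, so a fixed seed is reproducible.
    first = sorted(first_names)
    last = sorted(last_names)
    # Without this check the loop below would never finish when k is too large.
    if k > len(first) * len(last):
        raise ValueError("k exceeds the number of possible pairs")
    rng = random.Random(seed)
    names = set()
    while len(names) < k:
        # The set silently drops any pair that was already drawn.
        names.add((rng.choice(first), rng.choice(last)))
    return names

result = unique_name_pairs({"aaban", "a-jay"}, {"smith", "jones"}, 3, seed=42)
print(result)
```

This works well when k is small relative to the number of possible pairs; as k approaches that number, more and more draws are rejected as duplicates and the loop slows down.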