Probability of selecting the same element

Question

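Suppose I have a list of 30,000 names. For each name, I want to select 4 times from a set of 8 possible fruits. However, I want the probability that any given name selects any fruit twice to be 20%. How would I solve this in code?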

As an example, take a list of 30,000 names and these fruits:

Apple
Banana
Orange
Grape
Kiwi
Pineapple
Watermelon
Dragonfruit

I want the probability of John selecting Apple Apple Banana Orange OR Apple Banana Orange Banana OR Dragonfruit Watermelon Grape Grape OR Grape Grape Grape Kiwi to be exactly 20%.

In other words, I want 80% of the names in the list to have no matching fruits selected, and 20% of the names to have 1 pair of matching fruits.

Answer 1

You can easily use rejection sampling here.

For each person, decide whether they are in the 20% or the 80%. If you want exactly 20% of the sample, randomly pick 6000 people at the start.
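The two ways of deciding who lands in the 20% can be sketched as follows. This is only an illustration: the helper names `pick_independent` and `pick_exact`, and the fixed seed, are assumptions made for the example, not part of the answer.

```python
import random

N = 30_000      # size of the name list
P_PAIR = 0.2    # target share of people with a matching pair

def pick_independent(n, p, rng):
    """Flip a biased coin per person; the group size is only about n * p."""
    return {i for i in range(n) if rng.random() < p}

def pick_exact(n, p, rng):
    """Fix the group size up front; exactly round(n * p) people are chosen."""
    return set(rng.sample(range(n), round(n * p)))

rng = random.Random(0)  # seeded only to make the demo repeatable
approx_group = pick_independent(N, P_PAIR, rng)
exact_group = pick_exact(N, P_PAIR, rng)
print(len(approx_group), len(exact_group))
```

With the coin flip, each person has a 20% chance independently, so the group size varies from run to run. With `random.sample`, the group always has 6000 members.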

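If they are in the 20%, repeatedly choose 4 fruits for them from your set of 8 until the selection contains a repeated fruit (exactly one matching pair, to match the question).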

If they are in the 80%, repeatedly choose 4 fruits for them from your set of 8 until the selection contains no repeats.
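To see why the redraw loop is needed, and roughly how often it repeats, it helps to count the outcomes of 4 uniform picks from 8 fruits. The sketch below enumerates all 8^4 = 4096 ordered draws; the variable names are chosen for this example only.

```python
from collections import Counter
from itertools import product

FRUITS = 8   # number of distinct fruits
PICKS = 4    # selections per person

# Tally every ordered draw by how many distinct fruits it contains.
counts = Counter()
for draw in product(range(FRUITS), repeat=PICKS):
    counts[len(set(draw))] += 1

total = FRUITS ** PICKS
p_no_repeat = counts[4] / total   # 4 distinct fruits: 1680 / 4096
p_one_pair = counts[3] / total    # 3 distinct fruits, i.e. exactly one pair: 2016 / 4096

print(f"no repeat: {p_no_repeat:.4f}, exactly one pair: {p_one_pair:.4f}")
print(f"expected draws: {1 / p_no_repeat:.2f} and {1 / p_one_pair:.2f}")
```

Unconstrained picks give no repeat about 41% of the time and exactly one pair about 49% of the time, so neither matches the 80%/20% target by itself. Each rejection loop accepts after roughly 2.4 draws (no repeat) or 2.0 draws (one pair) on average, so the method is cheap.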

Here is some Python code that generates 30 samples (rather than 30000, for demonstration purposes), 20% of which contain a repeated fruit:

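import random

N = 30  # demo size; use 30000 for the full list

fruits = 'Apple Banana Orange Grape Kiwi Pineapple Watermelon Dragonfruit'.split()

def sample(repeats):
    """Redraw 4 fruits (with replacement) until the draw has `repeats` matching pairs (0 or 1)."""
    while True:
        s = [random.choice(fruits) for _ in range(4)]
        # 4 distinct fruits means no repeat; 3 distinct means exactly one matching pair.
        if len(set(s)) == 4 - repeats:
            return s


population = list(range(N))
# Choose exactly 20% of the population to receive a matching pair.
twenty_percenters = set(random.sample(population, N // 5))

for p in population:
    in20 = p in twenty_percenters
    # in20 is a bool, so it acts as 0 or 1: it toggles the '*' marker and sets the repeat count.
    print(p, '*' * in20, sample(in20))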
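If the redraw loop is unwanted, the same two kinds of selection can also be built directly. This is a different technique from the rejection sampling used above, shown only as a sketch; `sample_direct` is a hypothetical helper name, and it should produce the same distribution because every valid sequence remains equally likely.

```python
import random

FRUITS = 'Apple Banana Orange Grape Kiwi Pineapple Watermelon Dragonfruit'.split()

def sample_direct(has_pair, rng=random):
    """Build a 4-fruit selection with no redraws (alternative to rejection sampling)."""
    if not has_pair:
        # 4 distinct fruits in random order.
        return rng.sample(FRUITS, 4)
    picks = rng.sample(FRUITS, 3)   # 3 distinct fruits in random order
    picks.append(picks[0])          # duplicate one of them to form the pair
    rng.shuffle(picks)              # randomize where the pair ends up
    return picks

print(sample_direct(False))
print(sample_direct(True))
```

The rejection version is shorter and easier to adapt to other constraints; the direct version always finishes in a fixed number of steps.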

