Distribution of elements according to percentage frequency

Question

Is there a function in pandas, numpy, or plain Python that generates a frequency distribution from percentage values, the way we can with EnumeratedDistribution in Java?

Input:

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 10

Output:

[0, 0, 0, 0, 0, 1, 1, 1, 2, 2]

Of the 10 elements in total, 50% are 0s, 30% are 1s, and 20% are 2s.

Answer 1

You can use numpy's repeat() function to repeat each element of values a given number of times (percentage * total):

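import numpy as np

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 11

# Scale each percentage by the total and round to whole repeat counts.
# Use a plain int dtype: int8 overflows as soon as a count exceeds 127.
repeats = np.around(np.array(percentage) * total).astype(int)  # [6, 3, 2]

np.repeat(values, repeats)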

Output:

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2])

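I use np.around() to round the repeat counts in case they are not integers (for example, with a total of 11: 11*0.5 -> 6, 11*0.3 -> 3 and 11*0.2 -> 2). Note that np.around rounds halves to the nearest even number (5.5 -> 6 but 2.5 -> 2), and because each count is rounded independently, the counts are not guaranteed to add up to total.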
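If the counts must add up to total exactly, one option (not part of the original answer) is largest-remainder allocation: floor every scaled count, then hand the leftover units to the entries with the largest fractional parts. This matters for inputs like percentage = [0.25, 0.25, 0.5] with total = 10, where np.around gives [2, 2, 5] (2.5 rounds to the even 2), i.e. only 9 elements. The sketch below assumes the percentages sum to 1; allocate_counts is a hypothetical helper name.

```python
import numpy as np


def allocate_counts(percentage, total):
    """Hypothetical helper: integer counts proportional to `percentage`
    that always sum to `total` (largest-remainder method).
    Assumes `percentage` sums to 1."""
    exact = np.asarray(percentage, dtype=float) * total  # ideal fractional counts
    counts = np.floor(exact).astype(int)                  # round every count down
    shortfall = int(total - counts.sum())                 # units lost by flooring
    # Give one extra unit to each of the `shortfall` entries with the largest
    # fractional parts; the stable sort breaks ties in favour of earlier entries.
    order = np.argsort(-(exact - counts), kind="stable")
    counts[order[:shortfall]] += 1
    return counts


values = [0, 1, 2]
print(np.repeat(values, allocate_counts([0.5, 0.30, 0.20], 11)))
# [0 0 0 0 0 0 1 1 1 2 2]
print(allocate_counts([0.25, 0.25, 0.5], 10))
# [3 2 5]
```

Which entry wins a tie in the fractional parts is an arbitrary choice; here the stable sort favours the earlier one.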

Answer 2

Without numpy, using only a list comprehension:

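values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 10

# Repeat each value round(total * p) times and flatten into a single list.
# round() is safer than int(), which truncates: int(100 * 0.29) == 28.
output = [e for e, p in zip(values, percentage) for _ in range(round(total * p))]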
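Both answers produce exact counts. EnumeratedDistribution in Java, mentioned in the question, instead draws random samples according to the given probabilities. If random draws are what you need, a rough numpy counterpart is Generator.choice with its p argument. This is a sketch assuming NumPy 1.17 or later (where default_rng was introduced); the seed value is arbitrary.

```python
import numpy as np

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 10

# Seeded generator so the draw is reproducible; 42 is an arbitrary choice.
rng = np.random.default_rng(42)
# Draw `total` values independently, each with the probability in `percentage`.
sample = rng.choice(values, size=total, p=percentage)
print(sample)
```

Because each draw is independent, a sample of 10 will usually not contain exactly five 0s, three 1s and two 2s; the proportions only approach the percentages as the sample grows.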

Tags: python, statistics, numpy, frequency, pandas