Pandas Index example with repeated entries using hypothesis

Question

I would like to generate a pandas.Index that contains repeated entries, like this:

>>> pd.Index(np.random.choice(range(5), 10))
Int64Index([3, 0, 4, 1, 1, 3, 4, 3, 2, 0], dtype='int64')

So I wrote the following strategy:

from hypothesis.extra.pandas import indexes
from hypothesis.strategies import sampled_from

st_idx = indexes(
    elements=sampled_from(range(5)),
    min_size=10,
    max_size=10
)

However, when I try to draw from this strategy, I get the following error:

>>> st_idx.example()
[...]
Unsatisfiable: Unable to satisfy assumptions of condition.

During handling of the above exception, another exception occurred:
[...]
NoExamples: Could not find any valid examples in 100 tries

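After some experimenting, I realized that it only works if min_size is less than or equal to the number of choices (<= 5 in this case). However, that means I will never get examples with repeats!

What am I doing wrong?

EDIT: Apparently the indexes strategy simply sets unique to True by default. Setting it to False, as mentioned in the answer below, also works with my approach.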
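As a minimal sketch of what the edit describes, the original sampled_from strategy can be kept as it is with unique=False added. The test name test_index_has_repeats is made up for illustration. Because ten entries are drawn from only five candidate values, every generated index has to contain at least one repeat, which the property below checks.

```python
from hypothesis import given
from hypothesis.extra.pandas import indexes
from hypothesis.strategies import sampled_from

# unique=False lifts the default uniqueness constraint, so ten entries
# can be drawn from only five candidate values.
st_idx = indexes(
    elements=sampled_from(range(5)),
    min_size=10,
    max_size=10,
    unique=False,
)


@given(st_idx)
def test_index_has_repeats(idx):  # hypothetical test name
    assert len(idx) == 10
    # Every entry comes from the five candidate values.
    assert idx.isin(list(range(5))).all()
    # Ten entries over five values must repeat (pigeonhole principle).
    assert idx.has_duplicates
```

The test can be collected by pytest or run by calling test_index_has_repeats() directly.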

Answer 1

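If the resulting index does not need to follow any particular distribution, one way to get what you want is to use the integers strategy together with the unique parameter of the indexes strategy, so that duplicates are generated when needed: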

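import hypothesis.strategies as st
from hypothesis.extra.pandas import indexes

# elements is passed by keyword: recent Hypothesis releases
# no longer accept positional arguments to indexes().
st_idx = indexes(
    elements=st.integers(min_value=0, max_value=5),
    min_size=10, max_size=10,
    unique=False
)

st_idx.example()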

Which produces:

Int64Index([4, 1, 3, 4, 2, 5, 0, 5, 0, 0], dtype='int64')
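Note that the bounds of integers are inclusive, so max_value=5 allows the value 5 (as seen in the output above), whereas range(5) in the question stops at 4. A small sketch, assuming the goal is to match the question's value range exactly; the names st_idx_0_to_4 and test_values_stay_in_range are made up for illustration:

```python
import hypothesis.strategies as st
from hypothesis import given
from hypothesis.extra.pandas import indexes

# integers() bounds are inclusive, so max_value=4 gives the same
# candidate values as range(5): 0, 1, 2, 3, 4.
st_idx_0_to_4 = indexes(
    elements=st.integers(min_value=0, max_value=4),
    min_size=10,
    max_size=10,
    unique=False,
)


@given(st_idx_0_to_4)
def test_values_stay_in_range(idx):  # hypothetical test name
    assert len(idx) == 10
    assert idx.min() >= 0
    assert idx.max() <= 4
```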


python

pandas

python-hypothesis