Optimizing strategies usage (data generation)
I want to speed up the data generation in my unit tests. Strategies such as from_regex and dictionaries seem to take a long time to generate examples. I wrote the example below to try to benchmark example generation:
from hypothesis import given
from hypothesis.strategies import (
    booleans,
    composite,
    dictionaries,
    from_regex,
    integers,
    lists,
    one_of,
    text,
)

param_names = from_regex(r"[a-z][a-zA-Z0-9]*(_[a-zA-Z0-9]+)*", fullmatch=True)
param_values = one_of(booleans(), integers(), text(), lists(text()))


@composite
def composite_params_dicts(draw, min_size=0):
    """Provides a dictionary of parameters."""
    params = draw(
        dictionaries(keys=param_names, values=param_values, min_size=min_size)
    )
    return params


params_dicts = dictionaries(keys=param_names, values=param_values)


@given(params=params_dicts)
def test_standard(params):
    assert params is not None


@given(params=composite_params_dicts(min_size=1))
def test_composite(params):
    assert len(params) > 0


@given(integer=integers(min_value=1))
def test_integer(integer):
    assert integer > 0
The test_integer() test serves as a reference, since it uses a simple strategy.
Some long-running tests in one of my projects generate parameter names from a regular expression and use dictionaries to build those parameters, so I added two tests using those strategies:
test_composite() uses a composite strategy with an optional argument.
test_standard() uses a similar strategy, except that it is not composite.
The test results are as follows:
> pytest hypothesis-sandbox/test_dicts.py --hypothesis-show-statistics
============================ test session starts =============================
platform linux -- Python 3.7.3, pytest-5.0.1, py-1.8.0, pluggy-0.12.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/damien/Sandbox/hypothesis/.hypothesis/examples')
rootdir: /home/damien/Sandbox/hypothesis
plugins: hypothesis-4.28.2
collected 3 items
hypothesis-sandbox/test_dicts.py ... [100%]
=========================== Hypothesis Statistics ============================
hypothesis-sandbox/test_dicts.py::test_standard:
- 100 passing examples, 0 failing examples, 1 invalid examples
- Typical runtimes: 0-35 ms
- Fraction of time spent in data generation: ~ 98%
- Stopped because settings.max_examples=100
- Events:
* 2.97%, Retried draw from TupleStrategy((<hypothesis._strategies.CompositeStrategy object at 0x7f72108b9630>,
one_of(booleans(), integers(), text(), lists(elements=text()))))
.filter(lambda val: all(key(val) not in seen
for (key, seen) in zip(self.keys, seen_sets))) to satisfy filter
hypothesis-sandbox/test_dicts.py::test_composite:
- 100 passing examples, 0 failing examples, 1 invalid examples
- Typical runtimes: 0-47 ms
- Fraction of time spent in data generation: ~ 98%
- Stopped because settings.max_examples=100
hypothesis-sandbox/test_dicts.py::test_integer:
- 100 passing examples, 0 failing examples, 0 invalid examples
- Typical runtimes: < 1ms
- Fraction of time spent in data generation: ~ 57%
- Stopped because settings.max_examples=100
========================== 3 passed in 3.17 seconds ==========================
Are composite strategies slower?
How can custom strategies be optimized?
Composite strategies are as fast as any other way of generating the same data, but people tend to use them for large and complex inputs, which are slower to generate than small and simple ones.
Strategy optimization tips mostly reduce to "don't do slow things", because there is no way to make generation itself faster:
- Minimize the use of .filter(...), because retrying a draw is slower than not retrying it.
- Cap the sizes of your collections, especially nested ones.
So for your example, it would probably be faster if you bounded the size of the lists; otherwise it will be slow(ish), because you are generating a lot of data without doing much with it.
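As a sketch of that advice applied to the question's strategies (the max_size bounds and the even_integers example below are arbitrary illustrations, not values from the original code):

```python
from hypothesis import given
from hypothesis.strategies import (
    booleans,
    dictionaries,
    from_regex,
    integers,
    lists,
    one_of,
    text,
)

# Tip 1: prefer .map() over .filter() where possible -- a map never
# discards a draw, so Hypothesis never has to retry.
even_integers = integers().map(lambda n: n * 2)
# instead of: integers().filter(lambda n: n % 2 == 0)

# Tip 2: cap collection sizes, especially nested ones. The max_size
# values here are arbitrary examples.
param_names = from_regex(r"[a-z][a-zA-Z0-9]*(_[a-zA-Z0-9]+)*", fullmatch=True)
param_values = one_of(
    booleans(),
    integers(),
    text(max_size=20),                     # bounded strings
    lists(text(max_size=20), max_size=5),  # bounded nested lists
)
bounded_params = dictionaries(keys=param_names, values=param_values, max_size=5)


@given(params=bounded_params)
def test_bounded(params):
    assert len(params) <= 5
```

Bounding sizes this way trades some input diversity for generation speed, which is usually the right trade when most of the test's time is spent in data generation, as the statistics above suggest.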