NumPy seed sometimes not working for dask function
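When running function calls across dask distributed, seeding sometimes fails. I want to pass seed values to a set of MC simulation trials. This works most of the time, but not always. The problem boils down to the following example: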
from dask.distributed import Client
import numpy as np

def get_rand4seed(seedx):
    np.random.seed(seedx)
    rand1 = np.random.rand(1)[0]
    return seedx, rand1

seedrange = 100
seed_ids = np.arange(0, seedrange).tolist()

client = Client()
a = client.map(get_rand4seed, seed_ids)
results = client.gather(a)
client.close()

for result in results:
    # take seed packed in result and calculate correct 1st random number
    np.random.seed(result[0])
    correct_result = np.random.rand(1)[0]
    # comparing with 1st random number calculated in parallelized func
    comparison = 'seed=%s, dask=%s, correct=%s' % (result[0], result[1], correct_result)
    if result[1] != correct_result:
        print('DIFF: %s' % comparison)
    else:
        pass
        #print(comparison)
Typically 5% to 10% of the cases are incorrect, and errors seem more likely after roughly the 10th item. Sometimes, though, all 100 items are correct. Sample output:
DIFF: seed=10, dask=0.6503742417395917, correct=0.771320643266746
DIFF: seed=18, dask=0.5054533737348429, correct=0.6503742417395917
DIFF: seed=26, dask=0.038561680881409655, correct=0.30793495262497084
DIFF: seed=34, dask=0.780100460524675, correct=0.038561680881409655
DIFF: seed=69, dask=0.6063543377764754, correct=0.29624916167243354
DIFF: seed=77, dask=0.29624916167243354, correct=0.9191090317991818
DIFF: seed=85, dask=0.6575115686178157, correct=0.620373814553256
DIFF: seed=93, dask=0.3072410093435699, correct=0.6063543377764754
Python 3.6.9, dask 2.9.0
I couldn't run your code... it complained about not using if __name__ == '__main__':
Then it gave me this:
NameError: name 'results' is not defined
distributed.nanny - WARNING - Restarting worker
So I looked at dask.bag and rewrote your code like this:
import dask.bag as db
import numpy as np

def get_rand4seed(seedx):
    np.random.seed(seedx)
    rand1 = np.random.rand(1)[0]
    return seedx, rand1

seedrange = 100
b = db.from_sequence(np.arange(seedrange), npartitions=4)
results = b.map(get_rand4seed).compute()

for result in results:
    np.random.seed(result[0])
    correct_result = np.random.rand(1)[0]
    comparison = 'seed=%s, dask=%s, correct=%s' % (
        result[0], result[1], correct_result)
    if result[1] != correct_result:
        print('DIFF: %s' % comparison)
    else:
        pass
The code runs perfectly and prints nothing, which I take to mean everything is working correctly.
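A possible explanation for the difference between the two versions, offered as an assumption and not something confirmed in this thread: np.random.seed sets a single global generator per process, and Client() by default starts workers that can run several tasks in threads of the same process. One task could then reseed the generator between another task's seed and rand calls. That would fit the sample output in the question, where the dask value for seed=10 equals the correct value for seed=18. dask.bag defaults to a process-based scheduler, which may be why the rewrite shows no differences. Below is a minimal sketch that avoids the shared state by giving each call its own np.random.RandomState. The name get_rand4seed_local is hypothetical, and a ThreadPoolExecutor stands in for threaded workers so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def get_rand4seed_local(seedx):
    # Hypothetical variant of get_rand4seed: a private generator per call,
    # so nothing here reads or writes the global np.random state.
    rng = np.random.RandomState(seedx)
    return seedx, rng.rand(1)[0]


def first_draw_serial(seedx):
    # Reference value computed outside the thread pool.
    return np.random.RandomState(seedx).rand(1)[0]


seed_ids = list(range(100))

# Assumption: a thread pool approximates several tasks sharing one worker process.
with ThreadPoolExecutor(max_workers=8) as pool:
    threaded_results = list(pool.map(get_rand4seed_local, seed_ids))

# Collect every seed whose threaded draw differs from the serial reference.
mismatches = [s for s, r in threaded_results if r != first_draw_serial(s)]
print('mismatches:', mismatches)
```

If the assumption holds, passing get_rand4seed_local to client.map in place of get_rand4seed should give reproducible draws regardless of how tasks are scheduled, since no state is shared between calls.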