Ray only using two threads?
Here is what I wrote:
import csv
import sys

import ray

# shift() is not shown in the post; it is assumed to be defined elsewhere in the script.

@ray.remote
def check(word, words):
    # For each of the 26 shifts, keep the ciphertext only if it is itself a word.
    valid_ciphertexts = []
    for key in range(26):
        ciphertext = shift(word, key)
        if ciphertext in words:
            valid_ciphertexts.append(ciphertext)
        else:
            valid_ciphertexts.append(None)
    return valid_ciphertexts

if __name__ == '__main__':
    words = set()
    with open(sys.argv[1], 'r') as lexicon:
        for word in lexicon:
            words.add(word.strip())
    ray.init()
    # One task per word; the full set is passed by value to every call.
    results = ray.get([check.remote(word, words) for word in words])
    with open(sys.argv[2], 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([key for key in range(26)])
        writer.writerows(results)
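Since this parallelizes hundreds of thousands of calls to check, I expected to see high utilization on all cores, but the dashboard shows the following:
Why is that?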
Answered at https://discuss.ray.io/t/ray-only-using-two-threads/2085
Can you try this:
ray.init()
words_ref = ray.put(words)
results = ray.get([check.remote(word, words_ref) for word in words])
I’m wondering if this is because it’s taking too long to keep re-serializing words.
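To get a rough feel for the re-serialization cost suspected above, the sketch below uses the standard `pickle` module as a stand-in for Ray's serializer. This is only an approximation, since Ray's real path goes through its object store. The lexicon is synthetic, and `make_lexicon`, `bytes_if_passed_directly`, `bytes_if_shared_once`, and `ref_size` are hypothetical names and values chosen for illustration.

```python
import pickle


def make_lexicon(n):
    """Build a synthetic stand-in for the word set (not the poster's data)."""
    return {f"word{i}" for i in range(n)}


def bytes_if_passed_directly(words, n_tasks):
    """Bytes serialized if the full set is pickled again for every task."""
    payload = len(pickle.dumps(words))
    return payload * n_tasks


def bytes_if_shared_once(words, n_tasks, ref_size=64):
    """Bytes serialized if the set is stored once and tasks get a small handle.

    ref_size is an assumed, illustrative handle size, not a Ray constant.
    """
    payload = len(pickle.dumps(words))
    return payload + ref_size * n_tasks


words = make_lexicon(1_000)
direct = bytes_if_passed_directly(words, n_tasks=len(words))
shared = bytes_if_shared_once(words, n_tasks=len(words))
print(f"re-serialized per task: {direct:,} bytes")
print(f"stored once + handles:  {shared:,} bytes")
```

The first figure grows with the square of the lexicon size (one full copy per word), which is why a set of a few hundred thousand words could plausibly keep the driver busy serializing instead of feeding the workers.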
That worked for me.
To be honest, I’m confused as to why there’s a difference here. I thought Python was pass-by-reference anyway?
... The issue is that Ray doesn’t have a way of knowing if words has changed between calls, so it keeps re-putting it in the object store.
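On the pass-by-reference question: within one Python process a function does receive a reference to the caller's object, but a Ray task runs in a separate worker process, so the argument has to be serialized and rebuilt there. The sketch below imitates that boundary with a `pickle` round trip. That is an assumption made for illustration, not Ray's actual transport, and `add_marker`, `call_in_process`, and `call_across_boundary` are hypothetical helpers.

```python
import pickle


def add_marker(words):
    """Mutate the set it receives and return its new size."""
    words.add("__marker__")
    return len(words)


def call_in_process(func, arg):
    """Ordinary Python call: func sees the very same object as the caller."""
    return func(arg)


def call_across_boundary(func, arg):
    """Imitate a remote call: the argument is copied via serialization."""
    copied = pickle.loads(pickle.dumps(arg))
    return func(copied)


local_words = {"abc", "bcd"}
call_in_process(add_marker, local_words)
print("__marker__" in local_words)   # True: the caller's set was mutated

remote_words = {"abc", "bcd"}
call_across_boundary(add_marker, remote_words)
print("__marker__" in remote_words)  # False: only the copy was mutated
```

Since each remote call works on a copy, passing `words` directly means producing that copy again for every task. Calling `ray.put(words)` once and passing the returned reference, as suggested in the answer, should let all tasks share the single stored object instead.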