Ray only using two threads?

Here is what I wrote:

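import csv
import sys

import ray


# shift(word, key), the asker's Caesar-shift helper, is not shown in the post.
@ray.remote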
def check(word, words):
    valid_ciphertexts = []
    for key in range(26):
        ciphertext = shift(word, key)
        if ciphertext in words:
            valid_ciphertexts.append(ciphertext)
        else:
            valid_ciphertexts.append(None)
    return valid_ciphertexts


if __name__ == '__main__':
    words = set()
    with open(sys.argv[1], 'r') as lexicon:
        for word in lexicon:
            words.add(word.strip())
    ray.init()
    # Launch one check task per word, then block until every result is back.
    results = ray.get([check.remote(word, words) for word in words])
    with open(sys.argv[2], 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(list(range(26)))
        writer.writerows(results)

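Since I'm parallelizing hundreds of thousands of calls to check, I expected to see high utilization on all cores, but the dashboard shows the following: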

Why is this happening?
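The post never shows shift. For anyone who wants to reproduce this, here is a minimal sketch of a Caesar-shift helper, assuming lowercase ASCII words, plus a hypothetical check_local function that does the same work as one check task without Ray, which makes it easy to test the per-word logic on its own.

```python
import string


# Hypothetical Caesar-shift helper (the question calls shift() but never shows it).
# Assumes lowercase ASCII words; any other character is passed through unchanged.
def shift(word: str, key: int) -> str:
    out = []
    for ch in word:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


# Plain-Python equivalent of one check task: for each of the 26 keys, keep the
# shifted word if it is in the lexicon, otherwise record None.
def check_local(word: str, words: set) -> list:
    results = []
    for key in range(26):
        ciphertext = shift(word, key)
        results.append(ciphertext if ciphertext in words else None)
    return results


if __name__ == "__main__":
    lexicon = {"abc", "bcd", "cde"}
    print(check_local("abc", lexicon)[:4])  # ['abc', 'bcd', 'cde', None]
```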

Answered at https://discuss.ray.io/t/ray-only-using-two-threads/2085

Can you try this:

ray.init()
words_ref = ray.put(words)  # store words once; each task then receives only a reference
results = ray.get([check.remote(word, words_ref) for word in words])

I’m wondering if this is because it’s taking too long to keep re-serializing words for every call.
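To get a feel for why re-serializing words hurts, here is a rough cost model. It uses the standard library's pickle as a stand-in for Ray's serializer (Ray uses its own pickle-based serialization, so real byte counts will differ) and compares shipping the whole set with every task against storing it once, as ray.put does, and handing each task only a small reference. The string token standing in for an ObjectRef is purely hypothetical.

```python
import pickle

# Rough cost model, not Ray's actual accounting: pickle stands in for Ray's
# serializer, and a short string stands in for an ObjectRef (hypothetical).
REF_TOKEN = "objref-0001"


def bytes_if_passed_each_time(words: set, n_tasks: int) -> int:
    """Every task receives its own serialized copy of the whole set."""
    return n_tasks * len(pickle.dumps(words))


def bytes_if_put_once(words: set, n_tasks: int) -> int:
    """The set is serialized once (like ray.put); tasks only get a reference."""
    return len(pickle.dumps(words)) + n_tasks * len(pickle.dumps(REF_TOKEN))


if __name__ == "__main__":
    # Hypothetical lexicon; the real one has hundreds of thousands of words.
    words = {f"word{i}" for i in range(10_000)}
    n_tasks = len(words)  # one check task per word, as in the question
    each = bytes_if_passed_each_time(words, n_tasks)
    once = bytes_if_put_once(words, n_tasks)
    print(f"passed each time: {each:,} bytes")
    print(f"put once:         {once:,} bytes")
    print(f"ratio:            {each / once:.0f}x")
```

Because there is one task per word, the first total grows roughly with the square of the lexicon size, while the second grows only linearly.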

That worked for me.

To be honest, I’m confused about why this makes a difference. I thought Python was pass-by-reference anyway?

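... The issue is that Ray has no way of knowing whether words has changed between calls, so on every check.remote call it serializes words again and puts another copy in the object store. Python’s pass-by-reference semantics don’t help here, because each task runs in a separate worker process, and its arguments have to be serialized to get there.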
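The pass-by-reference point can be illustrated without Ray at all. In this sketch, simulate_remote_call is a hypothetical stand-in for what happens to a task argument: it is serialized on the caller's side and rebuilt on the worker's side, so the task ends up with an equal but independent copy. As above, pickle stands in for Ray's serializer.

```python
import pickle


# Hypothetical model of a task argument crossing a process boundary:
# serialize on the caller's side, deserialize on the worker's side.
def simulate_remote_call(arg):
    payload = pickle.dumps(arg)   # caller: turn the object into bytes
    return pickle.loads(payload)  # worker: rebuild a brand-new object


if __name__ == "__main__":
    words = {"apple", "banana"}
    received = simulate_remote_call(words)
    print(received == words)   # True: same contents
    print(received is words)   # False: a different object
    received.add("cherry")
    print("cherry" in words)   # False: the caller's set is unchanged
```

Since the copy is made on every call, passing the same large object to many tasks repeats this work each time, which is exactly what ray.put avoids.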