Does Spark maintain Hash Functions across its cluster?

The general contract of hashCode states:

This integer need not remain consistent from one execution of an application to another execution of the same application.

So for something like Spark, where each executor runs in a separate JVM, does it do anything to ensure that hash codes stay consistent across the cluster?

In my experience, I use things with deterministic hashes, so it hasn't been a problem.
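As a minimal sketch of what "deterministic hashes" means here: types like String and the boxed primitives have hashCode algorithms fixed by the Java specification, so they produce the same value in every JVM (the specific values below follow directly from that spec):

```java
public class StableHashDemo {
    public static void main(String[] args) {
        // String.hashCode is specified as s[0]*31^(n-1) + ... + s[n-1],
        // so it is identical in every JVM on every machine.
        System.out.println("foo".hashCode());               // 101574

        // Integer.hashCode is specified to be the int value itself.
        System.out.println(Integer.valueOf(42).hashCode()); // 42
    }
}
```

Keys built from such types can safely be hashed independently on different executors.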

That is indeed the way to go; Spark cannot overcome the use of objects with non-deterministic hash codes.

The use of Java enums is a particularly notorious example of where this can go wrong; see: http://dev.bizo.com/2014/02/beware-enums-in-spark.html. Quoting the post:

... the hashCode method on Java's enum type is based on the memory address of the object. So while yes, we're guaranteed that the same enum value have a stable hashCode inside a particular JVM (since the enum will be a static object) - we don't have this guarantee when you try to compare hashCodes of Java enums with identical values living in different JVMs
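A small sketch of the problem and a common workaround (the enum Color here is hypothetical): enums inherit the identity-based Object.hashCode, so keying by the enum constant itself is unsafe across JVMs, while keying by name() (or ordinal()) is stable because String.hashCode is spec-defined:

```java
public class EnumHashDemo {
    enum Color { RED, GREEN, BLUE }

    public static void main(String[] args) {
        // Default Object.hashCode is based on the object's identity:
        // stable within this JVM (enum constants are singletons), but a
        // different JVM -- e.g. another Spark executor -- may print a
        // different value for the same constant.
        System.out.println(Color.RED.hashCode());

        // Workaround: key by name() instead. Its hashCode follows the
        // String spec, so it is identical on every JVM in the cluster.
        System.out.println(Color.RED.name().hashCode()); // same as "RED".hashCode()
    }
}
```

So when an enum must appear in a shuffle key, convert it to its name() or ordinal() first, and map it back after the shuffle.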