哈希图中的简单哈希码误解？

Question

我正在实现我自己的专用 hashmap，它具有通用值类型，但键总是 long 类型。在这里和那里，我看到有人建议我应该将密钥乘以质数，然后对桶数取模：

int bucket = (key * prime) % numOfBuckets;

我不明白为什么？在我看来，它与 simple:

具有完全相同的分布

int bucket = key % numOfBuckets;

例如，如果 numOfBuckets 为 8，第二个 "algorithm" 我们会得到类似 {0, 1, 2, 3, 4, 5, 6, 7} 的桶，重复 key = 0 到无穷大。在针对相同密钥的第一个算法中，我们得到桶 {0, 3, 6, 1, 4, 7, 2, 5} （或类似的）也重复。基本上我们遇到了与使用身份哈希时相同的问题。

基本上，在这两种情况下我们都会遇到键冲突：

key = x + k*numOfBuckets (for k = 1 to infinity; and x = key % numOfBuckets)

因为当我们对 numOfBuckets 求模时，我们总是得到 x。那么，第一个算法是怎么回事，有人能赐教吗？

Answer 1

如果 numOfBuckets 是 2 的幂并且 prime 是奇数（这似乎是预期的用例），那么我们有 gcd(numOfBuckets, prime) == 1。这反过来意味着有一个数字 inverse 使得 inverse * numOfBuckets = 1 (mod numOfBuckets)，所以乘法是一个双射运算，它只是稍微打乱了桶。那当然是没有用的，所以你的结论是正确的。

或者更直观地说：在乘法中，信息只从最低位流向最高位，从不反向。因此，如果没有乘法，桶索引不依赖 的任何位仍然会被丢弃有乘法。

一些其他技术确实有帮助，例如 Java 的 HashMap 使用此：

/** * Applies a supplemental hash function to a given hashCode, which * defends against poor quality hash functions. This is critical * because HashMap uses power-of-two length hash tables, that * otherwise encounter collisions for hashCodes that do not differ * in lower bits. Note: Null keys always map to hash 0, thus index 0. */ static int hash(int h) { // This function ensures that hashCodes that differ only by // constant multiples at each bit position have a bounded // number of collisions (approximately 8 at default load factor). h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }

另一个可行的方法是乘以某个大常数，然后使用结果的 upper 位（其中包含它们下面的位的混合，因此所有位钥匙可以这样使用）。

哈希图中的简单哈希码误解？

Simple hashcode in hashmap misconception?

algorithm

hash-code-uniqueness

hashmap