Hash function for strings not working on some strings?

Question

Basically, my program reads a text file in the following format:

3
chairs
tables
refrigerators

The number on the first line is the number of items to read from the file.

Here is my hash function:

int hash(string& item, int n) {
    int hashVal = 0;
    int len = item.length();

    for(int i = 0; i < len; i++)
      hashVal = hashVal*37 + item[i];

    hashVal %= n;   

    if(hashVal < 0) hashVal += n;

    return hashVal;
}

When my program reads the text file above, it works. But when I try another one:

5
sabel
ziyarah
moustache
math
pedobear

the program freezes. There is no segmentation fault or any other error; it just stops.

Any ideas?

Edit:

int n, tableSize;
myFile >> n;

tableSize = generateTableSize(n); 

string item, hashTable[tableSize];

for(int i = 0; i < tableSize; i++)
    hashTable[i] = "--";

while(myFile >> item && n!=0) {
    int index = hash(item,tableSize);

    if(hashTable[index] == "--")
        hashTable[index] = item;

    else {
        int newIndex = rehash(item,tableSize);
        while(hashTable[newIndex] != "--") {
            newIndex = rehash(item,tableSize);
        }
        hashTable[newIndex] = item;
    }
    n--;
}

int rehash(string item, int n)  {
    return hash(item,n+1);
}

Answer 1

The code freezes because it ends up in an infinite loop:

int index = hash(item,tableSize);

if(hashTable[index] == "--")
    hashTable[index] = item;
else {
    int newIndex = rehash(item,tableSize);
    while(hashTable[newIndex] != "--") {
        newIndex = rehash(item,tableSize);
    }
    hashTable[newIndex] = item;
}

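You keep recalculating the index, but you never change the input parameters, so the output stays the same and the index is recalculated yet again.

In the code above, newIndex is computed from the same input as index but with a different function, so it will most likely differ from index the first time. If that slot is also occupied, however, newIndex is recalculated with the same function as before and exactly the same input, which gives exactly the same output. You look up the same index in the hash table, find the same value as last time, recalculate with the same input parameters, get the same output, look in the hash table again, and so on.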
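To make the point concrete, here is a minimal sketch that calls rehash repeatedly with unchanged arguments and counts how many different indices come back. The names hashString and distinctRehashResults are illustrative and not from the question, and the arithmetic is done on unsigned values (an assumption of this sketch) so that overflow wraps around instead of being undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Same arithmetic as the question's hash(), but on unsigned values so that
// overflow wraps around instead of being undefined behaviour.
// hashString is an illustrative name chosen to avoid clashing with std::hash.
unsigned hashString(const std::string& item, unsigned n) {
    assert(n > 0);  // a table needs at least one slot
    unsigned hashVal = 0;
    for (unsigned char c : item)
        hashVal = hashVal * 37 + c;
    return hashVal % n;
}

// The question's rehash(): the result depends only on item and n.
// Note that the modulus n + 1 also allows the result n, which is one past
// the last slot of a table with n entries.
unsigned rehash(const std::string& item, unsigned n) {
    return hashString(item, n + 1);
}

// Hypothetical helper: call rehash() repeatedly with unchanged arguments and
// count how many different indices come back.
std::size_t distinctRehashResults(const std::string& item, unsigned n, int attempts) {
    std::set<unsigned> seen;
    for (int i = 0; i < attempts; ++i)
        seen.insert(rehash(item, n));
    return seen.size();
}
```

However many attempts are made, distinctRehashResults reports a single index, so if that slot is occupied the while loop in the question has no way to move on.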

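The reason you did not see this with the first file (three items) is that you had no collisions (or at most a single collision, in which case the newIndex returned by the first call to rehash was usable).

The solution is not to increase the size of the table: a bigger table at best lowers the chance of a collision, which is good in itself but does not fix your problem. Instead, either change the input to the function so that you get a different output, or change the hash table structure.

I have always found Sedgewick's book on algorithms in C++ useful; it has a chapter on hashing.

Sadly, I don't have my copy of Algorithms in C++ at hand, so I can't tell you how Sedgewick solves it. For the simple educational purpose of solving your problem, I suggest starting by incrementing the index by 1 until you find a free slot in the hash table.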
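The "increment the index by 1" suggestion could look roughly like the sketch below (this technique is usually called linear probing). It is not the asker's code: insertLinearProbe and hashString are hypothetical names, a std::vector stands in for the array, and the table size is whatever the caller passes in, since generateTableSize is not shown in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const std::string EMPTY = "--";  // same sentinel the question uses for a free slot

// Unsigned variant of the question's hash(); the name is illustrative.
std::size_t hashString(const std::string& item, std::size_t n) {
    assert(n > 0);
    std::size_t hashVal = 0;
    for (unsigned char c : item)
        hashVal = hashVal * 37 + c;
    return hashVal % n;
}

// Insert with linear probing: start at the hashed slot and step forward by
// one, wrapping around, until a free slot turns up. Returns the slot used,
// or table.size() if every slot is taken, so the loop always terminates.
std::size_t insertLinearProbe(std::vector<std::string>& table, const std::string& item) {
    const std::size_t n = table.size();
    if (n == 0) return n;
    const std::size_t start = hashString(item, n);
    for (std::size_t step = 0; step < n; ++step) {
        const std::size_t index = (start + step) % n;  // the probed index changes on every pass
        if (table[index] == EMPTY) {
            table[index] = item;
            return index;
        }
    }
    return n;  // table is full
}
```

Because the probed index changes on every pass and the loop runs at most table.size() times, a collision can no longer cause a freeze; a full table is reported to the caller instead.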

Tags: c++, string, hash