C - 访问哈希表中的结构成员时出现段错误（插入函数）

Question

我是 C 的新手，在为我的哈希表实现插入函数时遇到问题。

这是我的结构：

typedef struct HashTableNode {
    char *url;                               // url previously seen
    struct HashTableNode *next;              // pointer to next node
} HashTableNode;

typedef struct HashTable {
    HashTableNode *table[MAX_HASH_SLOT];     // actual hashtable
} HashTable;

下面是我如何初始化 table:

HashTable *initTable(){
  HashTable* d = (HashTable*)malloc(sizeof(HashTable));
  int i;
  for (i = 0; i < MAX_HASH_SLOT; i++) {
    d->table[i] = NULL;
  }
  return d;
}

这是我的插入函数：

int HashTableInsert(HashTable *table, char *url){

    long int hashindex = JenkinsHash(url, MAX_HASH_SLOT);
    int uniqueBool = 2; // 0 for true, 1 for false, 2 for init

    HashTableNode* theNode = (HashTableNode*)malloc(sizeof(HashTableNode));
    theNode->url = url;

    if (table->table[hashindex] != NULL) { // if we have a collision

        HashTableNode* currentNode = (HashTableNode*)malloc(sizeof(HashTableNode)); 
        currentNode = table->table[hashindex]->next; // the next node in the list


        if (currentNode == NULL) { // only one node currently in list


           if (strcmp(table->table[hashindex]->url, theNode->url) != 0) { // unique node

                table->table[hashindex]->next = theNode; 
                return 0; 
           }
           else{
            printf("Repeated Node\n");
            return 1;
           }


        }
        else { // multiple nodes in this slot

            printf("There was more than one element in this slot to start with. \n");
            while (currentNode != NULL)
            {


            // SEGFAULT when accessing currentNode->url HERE
               if (strcmp(currentNode->url, table->table[hashindex]->url) == 0 ){ // same URL 

                    uniqueBool = 1; 
               }
               else{
                    uniqueBool = 0; 
               }

                currentNode = currentNode->next;

            }   
        }

        if (uniqueBool == 0) {

            printf("Unique URL\n");

            theNode->next = table->table[hashindex]->next; // splice current node in

            table->table[hashindex]->next = theNode; // needs to be a node for each slot
            return 0; 
            }

    }
    else{

        printf("simple placement into an empty slot\n");
        table->table[hashindex] = theNode; 

    }   

    return 0; 

}

每次尝试访问 currentNode->url（给定槽的链表中的下一个节点）时，我都会遇到 SegFault，如果节点本身不为 NULL，则其中应该有一个字符串.

我知道这段代码有点冒险，所以提前感谢任何愿意接受挑战的人。

芯片

更新：

这是调用所有ht函数的函数。通过我对散列 table.c 的 main() 中的常规字符串的测试，我得出结论，段错误是由于以下原因造成的：

void crawlPage(WebPage * page){

   char * new_url = NULL;  

   int pos= 0;

   pos = GetNextURL(page->html, pos, URL_PREFIX, &new_url);

   while (pos != -1){


        if (HashTableLookup(URLsVisited, new_url) == 1){ // url not in table

            printf("url is not in table......\n");

            hti(URLsVisited, new_url);

            WebPage * newPage = (WebPage*) calloc(1, sizeof(WebPage));
            newPage->url = new_url;


            printf("Adding to LIST...\n");
            add(&URLList, newPage); // added & to it.. no seg fault 


        }
        else{
            printf("skipping url cuz it is already in table\n");
        }


        new_url = NULL; 

        pos = GetNextURL(page->html, pos, URL_PREFIX, &new_url);

    }

    printf("freeing\n");
    free(new_url); // cleanup
    free(page); // free current page
}

Answer 1

您的哈希 table 插入逻辑违反了一些相当基本的规则。

在确定您实际上需要之前分配一个新节点。
您的 currentNode 分配中存在明显的内存泄漏
url 指针的所有权语义可疑。

除此之外，这个算法正在方式变得过于复杂而不是它真正应该的。

通过对 table 大小取模的散列值计算散列索引。
从散列索引的 table 槽开始，遍历节点指针，直到发生以下两种情况之一：
1. 您发现节点已经存在
2. 您到达了碰撞链的末端。

只有在上面的#2 中，您才真正分配了一个碰撞节点并将其链接到您现有的碰撞列表。当使用指针对指针的方法时，其中大部分是 微不足道的 ，我将在下面演示：

int HashTableInsert(HashTable *table, const char *url)
{
    // find collision list starting point
    long int hashindex = JenkinsHash(url, MAX_HASH_SLOT);
    HashTableNode **pp = table->table+hashindex;

    // walk the collision list looking for a match
    while (*pp && strcmp(url, (*pp)->url))
        pp = &(*pp)->next;

    if (!*pp)
    {
        // no matching node found. insert a new one.
        HashTableNode *pNew = malloc(sizeof *pNew);
        pNew->url = strdup(url);
        pNew->next = NULL;
        *pp = pNew;
    }
    else
    {   // url already in the table
        printf("url \"%s\" already present\n", url);
        return 1;
    }
    return 0;
}

仅此而已。

我之前提到的 url 所有权问题在上面通过使用 strdup() 的字符串复制得到了解决。虽然不是标准库函数，但它是 POSIX 兼容的，并且我在过去二十年中看到的每一个非尼安德特人的半生不熟的实现都提供了它。如果您没有 (a) 我想知道您使用的是什么，并且 (b) 使用 strlen 和 malloc 实现起来很简单。无论如何，当在删除值或 table 擦除期间释放节点时，请确保 free 节点的 url 在 free-ing 节点本身之前。

祝你好运。

C - 访问哈希表中的结构成员时出现段错误（插入函数）

C - Segfault when accessing struct member in a HashTable (insert function)

c

struct

pointers

segmentation-fault