(int, const char *) 作为 uthash 库的复合键

Question

我想将 uthash 库用于散列 table，其中一对 int 和 const char * 作为复合键：

typedef struct entry_s {
    // This field is needed by the uthash library
    UT_hash_handle hh;

    // Values
    /* ... */

    // Compound key
    int num;
    const char *str;
} entry;

具体来说，我希望 const char * 指向的字符串成为密钥的一部分。澄清一下：指针的不同值可能对应同一个字符串（在strcmp()的意义上）。

userguide 显示了如何使用 int 和 char[] 作为复合键来实现与我想要的键类似的键：

typedef struct another_entry_s {
    // This field is needed by the uthash library
    UT_hash_handle hh;

    // Values
    /* ... */

    int str_len;

    // Compound key
    int num;
    char str[];
} another_entry;

但是，第二种方法（即 (int, char[])）假设字符串被复制到 char[]，但我想避免复制。

此外，我不是在寻找连接 int 和 const char * 指向的字符串以利用 HASH_ADD_KEYPTR() 和 HASH_FIND_STR() 便利宏。

我不知道如何使用第一种方法（即 (int, const char *)）使用 HASH_ADD()、HASH_FIND() 和其他通用宏。看起来避免复制是不可能的，就像 uthash 库设计一样。我理解对吗？或者有没有我忽略的非复制方法？

Answer 1

这个库的设计是不可能的（任何通用 实现都不可能不复制）。

对于任何哈希表实现，您需要对某些数据应用一些哈希函数。因此，您当然可以编写 specific 实现，其中散列函数使用整数字段的字节 and 字符串的字节一些其他字段点到。但是，如果您的散列表实现是 generic，您对散列函数的唯一选择将与此类似：

unsigned int hash(void *data, size_t size);

原型不必看起来完全像这样，但无论如何，输入是指向一些数据的指针（任何类型的）和该数据的大小。所以，很明显，您不能同时从两个不同的位置读取这样的函数。

根据the uthash documentation，uthash通过要求它们由相邻的结构成员组成来解决复合键的问题。然后从这些成员中的第一个读取数据，其大小包括所有成员 和填充 。库的文档知道这个问题，并要求必须将结构初始化为全零，例如使用 memset()，因此填充位具有定义的值。如果你想使用它，你必须使你的字符串成为结构的成员（而不是指向它的指针）。

虽然这在大多数实现中都可以正常工作，但我个人根本不会依赖该功能，因为 C 标准不保证定义后的填充值设置一些成员，见

C11（N1570 草案），§6.2.6.1 p6:

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. [...]

因此，在这个库中使用复合密钥的真正安全且便携的方法是：获取数据的串联副本。你可以做这样的事情，给你上面的结构添加一个字段 char *hashKey:

#define ENTRY_KEYLEN(str) (sizeof(int) + strlen(str))
#define ENTRY_GETKEY(key, e) (getEntryKey((key), (e)->num, (e)->str))

static void getEntryKey(char *key, int num, const char *str)
{
    memcpy(key, &num, sizeof num);
    memcpy(key + sizeof num, str);
}

然后你可以像这样使用 uthash 宏：

entry *entries = 0;

entry *myent;
// allocate space, fill data in myent

// store in hashtable:
char *key = malloc(ENTRY_KEYLEN(myent->str));
// check key for NULL
ENTRY_GETKEY(key, myent);
myent->hashKey = key;
HASH_ADD_KEYPTR(hh, entries, key, ENTRY_KEYLEN(myent->str), myent);

// [...]

// find in hashtable
const char *str = "foo";
int id = 42;
key = malloc(ENTRY_KEYLEN(str));
// check key for NULL
getEntryKey(key, id, str);
entry *found;
HASH_FIND(hh, entries, key, ENTRY_KEYLEN(str), found);
free(key);

您可能使用不同的通用哈希表实现会更好，这会使您的用例更容易一些，例如通过使用某些 回调函数 来检索哈希键数据。

(int, const char *) 作为 uthash 库的复合键

(int, const char *) as a compound key with the uthash library

c

string

pointers

hashtable

uthash