C hsearch 查找之前未输入的值

Question

这是的后续问题。

因为我的一部分感觉剩下的问题与第一个问题无关，所以我决定将这两部分分成两个问题。

我已经按照建议 here.

使用 POSIX hcreate/hsearch 实现了关联数组

为了完整起见，这里是除了上一个问题的 main() 函数之外的所有代码：

#include <inttypes.h> /* intptr_t             */
#include <search.h>   /* hcreate(), hsearch() */
#include <stdio.h>    /* perror()             */
#include <stdlib.h>   /* exit()               */
#include <string.h>   /* strcpy()             */

void exit_with_error(const char* error_message){
  perror(error_message);
  exit(EXIT_FAILURE);
}
int fetch(const char* key, intptr_t* value){
  ENTRY e,*p;
  e.key=(char*)key;
  p=hsearch(e, FIND);
  if(!p) return 0;
  *value=(intptr_t)p->data;
  return 1;
}

void store(const char *key, intptr_t value){
  ENTRY e,*p;
  e.key=(char*)key;
  p = hsearch(e, ENTER);
  if(!p) exit_with_error("hash full");
  p->data = (void *)value;
}

这里是一个稍微扩展的 main() 函数：

int main(){
  char a[4]="foo";
  char b[4]="bar";
  char c[4]="baz";
  char t[4]="";
  char y='[=11=]';
  const char l[6]={'a','b','f','o','r','z'};
  intptr_t x=NULL;
  size_t i=0,j=0;

  if(!hcreate(50)) exit_with_error("no hash");

  strcpy(t,b);
  store(t,0);
  if(fetch(t,&x)) printf("stored %s-->%d\n",t,(int)x);
  else printf("%s not stored\n",t);

  for(i=0;i<3;++i){
    y=t[i];
    for(j=0;j<6;++j){
      if(l[j]==y) continue;
      t[i]=l[j];
      if(fetch(t,&x)) store(t,-1);
      else store(t,1);
      if(fetch(t,&x)) printf("stored %s-->%d\n",t,(int)x);
      else printf("%s not stored\n",t);
    }   
    t[i]=y;
  }

  strcpy(t,a); if(fetch(t,&x)) printf("read %s-->%d\n",t,(int)x); else printf("%s not found\n",t);
  strcpy(t,b); if(fetch(t,&x)) printf("read %s-->%d\n",t,(int)x); else printf("%s not found\n",t);
  strcpy(t,c); if(fetch(t,&x)) printf("read %s-->%d\n",t,(int)x); else printf("%s not found\n",t);

  exit(EXIT_SUCCESS);
}

如您所见，我将字符串 bar 与 0 相关联。然后，我将所有与 bar 恰好有一个字母不同的单词关联到 1，如果它们之前没有添加，否则关联到 -1。最后，我查找 foo、bar 和 baz，得到预期值：

stored bar-->0
stored aar-->1
stored far-->1
stored oar-->1
stored rar-->1
stored zar-->1
stored bbr-->-1
stored bfr-->1
stored bor-->1
stored brr-->1
stored bzr-->1
stored baa-->1
stored bab-->1
stored baf-->1
stored bao-->1
stored baz-->1
foo not found
read bar-->0
read baz-->1

现在我稍微修改一下函数，用CCCTTCTTATCG & CCCTTCATTGCG:

替换foo, bar & baz

int main(){
  char a[13]="CCCTTCTTATCG"
            /*|||||| |  ||*/
  char b[13]="CCCTTCATTGCG";
  char t[13]="";
  char y='[=13=]';
  const char l[4]={'A','C','G','T'};
  intptr_t x=NULL;
  size_t i=0,j=0;

  if(!hcreate(150)) exit_with_error("no hash");

  strcpy(t,a);
  store(t,0); 
  if(fetch(t,&x)) printf("stored %s-->%d\n",t,(int)x);
  else printf("%s not stored\n",t);

  for(i=0;i<12;++i){
    y=t[i];
    for(j=0;j<4;++j){
      if(l[j]==y) continue;
      t[i]=l[j];
      if(fetch(t,&x)) store(t,-1);
      else store(t,1);
      if(fetch(t,&x)) printf("stored %s-->%d\n",t,(int)x);
      else printf("%s not stored\n",t);
    }   
    t[i]=y;
  }

  strcpy(t,a); if(fetch(t,&x)) printf("read %s-->%d\n",t,(int)x); else printf("%s not found\n",t);
  strcpy(t,b); if(fetch(t,&x)) printf("read %s-->%d\n",t,(int)x); else printf("%s not found\n",t);

  exit(EXIT_SUCCESS);
}

为清楚起见，这里是对应的 diff:

29,32c29,32
<   char a[4]="foo";
<   char b[4]="bar";
<   char c[4]="baz";
<   char t[4]="";
---
>   char a[13]="CCCTTCTTATCG";
>             /*|||||| |  ||*/
>   char b[13]="CCCTTCATTGCG";
>   char t[13]="";
34c34
<   const char l[6]={'a','b','f','o','r','z'};
---
>   const char l[4]={'A','C','G','T'};
38c38
<   if(!hcreate(50)) exit_with_error("no hash");
---
>   if(!hcreate(150)) exit_with_error("no hash");
40c40
<   strcpy(t,b);
---
>   strcpy(t,a);
45c45
<   for(i=0;i<3;++i){
---
>   for(i=0;i<12;++i){
47c47
<     for(j=0;j<6;++j){
---
>     for(j=0;j<4;++j){
60d59
<   strcpy(t,c); if(fetch(t,&x)) printf("read %s-->%d\n",t,(int)x); else printf("%s not found\n",t);

如您所见，CCCTTCATTGCG 有 3 个来自 CCCTTCTTATCG 的编辑（就像 foo 来自 bar），因此不应在散列 [=81] 中找到=].

然而，这是对应的输出：

stored CCCTTCTTATCG-->0
stored ACCTTCTTATCG-->1
stored GCCTTCTTATCG-->1
stored TCCTTCTTATCG-->1
stored CACTTCTTATCG-->1
stored CGCTTCTTATCG-->1
stored CTCTTCTTATCG-->1
stored CCATTCTTATCG-->1
stored CCGTTCTTATCG-->1
stored CCTTTCTTATCG-->1
stored CCCATCTTATCG-->1
stored CCCCTCTTATCG-->1
stored CCCGTCTTATCG-->1
stored CCCTACTTATCG-->1
stored CCCTCCTTATCG-->1
stored CCCTGCTTATCG-->1
stored CCCTTATTATCG-->1
stored CCCTTGTTATCG-->1
stored CCCTTTTTATCG-->1
stored CCCTTCATATCG-->1
stored CCCTTCCTATCG-->1
stored CCCTTCGTATCG-->1
stored CCCTTCTAATCG-->1
stored CCCTTCTCATCG-->1
stored CCCTTCTGATCG-->1
stored CCCTTCTTCTCG-->-1
stored CCCTTCTTGTCG-->-1
stored CCCTTCTTTTCG-->-1
stored CCCTTCTTAACG-->-1
stored CCCTTCTTACCG-->-1
stored CCCTTCTTAGCG-->-1
stored CCCTTCTTATAG-->-1
stored CCCTTCTTATGG-->-1
stored CCCTTCTTATTG-->-1
stored CCCTTCTTATCA-->-1
stored CCCTTCTTATCC-->-1
stored CCCTTCTTATCT-->-1
read CCCTTCTTATCG-->-1
read CCCTTCATTGCG-->1

如您所见，CCCTTCTTATCG 与 -1 相关联，即使它从未存储过。

对于在存储中分配 -1 的所有字符串也是如此，因为它们在即将被插入时已经在哈希中。

上面较小示例中的 bbr 也是如此。

为什么这些键从未被插入过hsearchreturns1？

Answer 1

手册页指出 key 应该是分配有 malloc 的字符串。以下是手册页中的一些相关引述

The hdestroy() function calls free(3) for each comparison key in the search table but not the data item associated with the key.

和

The comparison key (passed to hsearch() as item.key) must be allocated using malloc(3) if action is ENTER and hdestroy() is called.

因此调用 hsearch 和操作 ENTER 会在散列中存储一个指针，并且该指针应指向一个稍后可以释放的字符串。在您的代码中，您有一个单独的字符串缓冲区，每次向散列 table 添加一个条目时，您都会传递相同的指针（指向您的字符串缓冲区）。这会使散列变得一团糟 table.

解决问题很简单。在 store 函数中，在将条目添加到 table 之前，使用 strdup 复制 key。换句话说，替换

e.key=(char*)key;

和

e.key=strdup(key);

C hsearch 查找之前未输入的值

C hsearch finds values not entered before

c

posix

associative-array