Segmentation fault in C, gdb gives the wrong line

Here is the part of the code that triggers the error (it is part of a larger file of about 1000 lines that implements the HyperCuts algorithm):

   void get_combination_cuts_characteristics(uint32_t* cuts, 
   uint32_t nb_dim_cut, 
   struct hypercuts_dimension** dimensions,
   uint32_t* return_cuts, 
   uint32_t sum_cuts, 
   struct classifier_rule** rules,
   uint32_t nb_rules,
   uint32_t* children_rules_sum, 
   uint32_t* max_rules)
{
   // Array of children
   uint32_t nb_children = (uint32_t) 0x1 << sum_cuts;
   uint32_t* min_index = chkmalloc(sizeof(*min_index) * nb_dim_cut);
   uint32_t* max_index = chkmalloc(sizeof(*max_index) * nb_dim_cut);
   uint32_t* current_index = chkmalloc(sizeof(*current_index) * nb_dim_cut);
   uint32_t children_array[nb_children];
   for (uint32_t i = 0; i < nb_children; ++i)
      children_array[i] = 0;


   // For each rules we compute the number of rule each child get.
   uint32_t min_value;
   uint32_t max_value;
   uint32_t nb_cuts;
   uint32_t subregion_size;
   uint32_t index;
   for (uint32_t i = 0; i < nb_rules; ++i)
   {
      for (uint32_t j = 0; j < nb_dim_cut; ++j)
      {
         min_value = rules[i]->statements[dimensions[j]->id]->value;
         max_value = rules[i]->statements[dimensions[j]->id]->value | rules[i]->statements[dimensions[j]->id]->mask;
         nb_cuts = (uint32_t)0x1 << cuts[j];
         subregion_size = (dimensions[j]->max_dim - dimensions[j]->min_dim) + 1;
         subregion_size = subregion_size / nb_cuts;

         if(subregion_size == 0)
            continue;

         // Fit the interval in the region of the dimension
         if(min_value < dimensions[j]->min_dim)
            min_value = dimensions[j]->min_dim;

         if(max_value > dimensions[j]->max_dim)
            max_value = dimensions[j]->max_dim;

         // Compute the minimal and maximal index of the rule in this dimension
         min_index[j] = (min_value - dimensions[j]->min_dim) / subregion_size;
         max_index[j] = (max_value - dimensions[j]->min_dim) / subregion_size;
         current_index[j] = min_index[j];
      }

      // Locate the first child
      index = get_multi_dimension_index(min_index, nb_dim_cut, cuts);
      children_array[index] ++;

      // Locate all the other children that the rule span
      while(get_next_dimension_index(current_index, min_index, max_index, nb_dim_cut))
      {
         index = get_multi_dimension_index(current_index, nb_dim_cut, cuts);
         children_array[index]++;
      }
   }

   // Set the return variables
   uint32_t rules_sum = 0;
   uint32_t max_rule_child = 0;

   for (uint32_t i = 0; i < nb_children; ++i)
   {
      rules_sum += children_array[i];
      if(max_rule_child < children_array[i])
         max_rule_child = children_array[i];
   }

   if(max_rule_child < *max_rules || ((max_rule_child == *max_rules) && (rules_sum < *children_rules_sum)))
   {
      *max_rules = max_rule_child;
      *children_rules_sum = rules_sum;
      for (uint32_t i = 0; i < nb_dim_cut; ++i)
         return_cuts[i] = cuts[i];
   }

   free(min_index);
   free(max_index);
   free(current_index);
}

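gdb tells me the segfault happens on the line rules_sum += children_array[i];, so it seemed I was going past the end of the array, and I checked my code. The problem is that when I print the cell gdb is trying to access, it is fine (it gives me the value I expect). Then I tried to find out whether a pointer could be the cause, but they all print fine in gdb too.

I ran the program under valgrind, and then it reports a segfault on the line if(max_rule_child < *max_rules || ((max_rule_child == *max_rules) && (rules_sum < *children_rules_sum))). I also checked the variables/pointers in that statement, and they print fine as well. So I wondered whether I had a stack overflow: I gave valgrind a 2 GB stack and moved the function's arrays to the heap, but I got the same problem.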
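For reference, moving children_array from a variable-length array to the heap could look like the sketch below. alloc_children is a hypothetical helper, not part of the original code. It also guards the shift that computes nb_children: in C, shifting a 32-bit unsigned value by 32 or more bits is undefined behaviour, so (uint32_t) 0x1 << sum_cuts is only well defined for sum_cuts < 32.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate one zeroed counter per child on the heap
   instead of a VLA. calloc zero-fills, replacing the manual init loop.
   Returns NULL if sum_cuts is too large for a 32-bit shift or if the
   allocation fails. */
uint32_t *alloc_children(uint32_t sum_cuts, uint32_t *nb_children)
{
    if (sum_cuts >= 32)          /* (uint32_t)1 << 32 would be undefined */
        return NULL;
    *nb_children = (uint32_t)1 << sum_cuts;
    return calloc(*nb_children, sizeof(uint32_t));
}
```

Moving the array to the heap does not fix an out-of-range index by itself; it only changes what gets corrupted. One benefit is that valgrind's Memcheck can usually report an invalid write just past a heap block, which it generally cannot do for a stack array.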

Another tricky thing: if I put an fprintf before the for loop, one after it, and one inside it, the program runs fine...

Here is what valgrind gives me:

==8397== Invalid read of size 4
==8397==    at 0x4017EE: get_combination_cuts_characteristics (hypercuts.c:775)
==8397==    by 0x401947: get_optimal_cut_combination (hypercuts.c:663)
==8397==    by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397==    by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397==    by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397==    by 0x401B78: set_nb_cuts (hypercuts.c:485)
==8397==    by 0x4025B2: build_node (hypercuts.c:219)
==8397==    by 0x40273D: build_node (hypercuts.c:285)
==8397==    by 0x4029B2: new_hypercuts_classifier (hypercuts.c:143)
==8397==    by 0x403B02: main (hypercuts_test.c:277)
==8397==  Address 0x11fefff77a is not stack'd, malloc'd or (recently) free'd
==8397== 
==8397== 
==8397== Process terminating with default action of signal 11 (SIGSEGV)
==8397==  Access not within mapped region at address 0x11FEFFF77A
==8397==    at 0x4017EE: get_combination_cuts_characteristics (hypercuts.c:775)
==8397==    by 0x401947: get_optimal_cut_combination (hypercuts.c:663)
==8397==    by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397==    by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397==    by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397==    by 0x401B78: set_nb_cuts (hypercuts.c:485)
==8397==    by 0x4025B2: build_node (hypercuts.c:219)
==8397==    by 0x40273D: build_node (hypercuts.c:285)
==8397==    by 0x4029B2: new_hypercuts_classifier (hypercuts.c:143)
==8397==    by 0x403B02: main (hypercuts_test.c:277)

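I'm out of ideas, so I'm asking here for hints or new ideas. This function is called by another recursive function (build_node; in the failing case there are 4 recursive calls, so not very many), and it runs 3 times before it crashes. That makes me think the stack (pointers or arrays) is getting messed up, but I haven't found a tool to inspect the stack, and I have checked that part of the code many times.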

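Some details about that part of the code: it performs a linear optimization of the number of cuts to make in a multidimensional space. This particular function computes the characteristics of the cuts performed and is executed at the end of each optimization step.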
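The helpers get_multi_dimension_index and get_next_dimension_index are not shown in the question, so the sketch below is only a guess at how such a child enumeration could work, using the hypothetical names flat_child_index and next_child_index. It assumes dimension j is split into 2^cuts[j] parts and that the flat child index concatenates the per-dimension indices, cuts[j] bits each, which would match nb_children = 1 << sum_cuts if sum_cuts is the sum of the cuts[j]. Under that assumption the flat index is only in range when every per-dimension index is below 2^cuts[j].

```c
#include <stdint.h>
#include <assert.h>

/* HYPOTHETICAL stand-in for get_multi_dimension_index: concatenate the
   per-dimension indices, dimension j contributing cuts[j] bits.
   Assumes every cuts[j] < 32. */
uint32_t flat_child_index(const uint32_t *idx, uint32_t nb_dim_cut,
                          const uint32_t *cuts)
{
    uint32_t index = 0;
    for (uint32_t j = 0; j < nb_dim_cut; ++j) {
        /* A per-dimension index this large would spill into the next
           dimension's bits or past the end of the children array. */
        assert(idx[j] < ((uint32_t)1 << cuts[j]));
        index = (index << cuts[j]) | idx[j];
    }
    return index;
}

/* HYPOTHETICAL stand-in for get_next_dimension_index: an odometer that
   steps cur[] through every combination with min[j] <= cur[j] <= max[j].
   Returns 0 (and leaves cur == min) once all combinations were visited. */
int next_child_index(uint32_t *cur, const uint32_t *min,
                     const uint32_t *max, uint32_t nb_dim_cut)
{
    for (uint32_t j = nb_dim_cut; j-- > 0;) {
        if (cur[j] < max[j]) {
            cur[j]++;
            return 1;
        }
        cur[j] = min[j];   /* wrap this digit and carry to the next one */
    }
    return 0;
}
```

With cuts = {2, 1}, the per-dimension indices {3, 1} map to (3 << 1) | 1 = 7, the last of the 8 children. A rule spanning indices 1 to 2 in the first dimension and 0 to 1 in the second is counted in 4 children: the first one directly, then 3 more from the odometer loop.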

Thanks in advance!

This should be easy to debug.

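I would start with the valgrind crash, because valgrind tends to be more precise. The line if(max_rule_child < *max_rules || ((max_rule_child == *max_rules) && (rules_sum < *children_rules_sum))) dereferences two pointers, and at least one of them must be garbage. Double-check that max_rules and children_rules_sum point to valid addresses. Add debug statements and see whether the values change.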
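A minimal sketch of such debug statements, assuming you can recompile: watch_u32 and watch_changed are hypothetical helpers that record a pointer and the value it points to at the top of get_combination_cuts_characteristics, then report whether either changed just before the final if. The pointer is compared first so that an overwritten pointer is reported without being dereferenced. Since the question notes that adding fprintf calls can make the crash disappear, gdb's watch command (for example watch max_rules or watch *max_rules) is an alternative that leaves the code unchanged.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debug record: where a pointer pointed and what it held. */
struct u32_watch {
    const char *name;
    const uint32_t *addr;
    uint32_t value;
};

/* Snapshot a pointer and its target; call at function entry. */
struct u32_watch watch_u32(const char *name, const uint32_t *addr)
{
    struct u32_watch w = { name, addr, *addr };
    fprintf(stderr, "%s: %p -> %u\n", name, (const void *)addr,
            (unsigned)*addr);
    return w;
}

/* Returns 1 (and prints) if the pointer or the value it points to changed
   since the snapshot; call just before the pointers are dereferenced. */
int watch_changed(const struct u32_watch *w, const uint32_t *addr_now)
{
    if (addr_now != w->addr) {
        /* The pointer itself was overwritten: do not dereference it. */
        fprintf(stderr, "%s: pointer changed from %p to %p\n", w->name,
                (const void *)w->addr, (const void *)addr_now);
        return 1;
    }
    if (*addr_now != w->value) {
        fprintf(stderr, "%s: value changed from %u to %u\n", w->name,
                (unsigned)w->value, (unsigned)*addr_now);
        return 1;
    }
    return 0;
}
```

In the question's function, one would take a snapshot of max_rules and children_rules_sum on entry and call watch_changed on each right before the final if; neither is legitimately modified before that point.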

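The other line, rules_sum += children_array[i];, is also a plausible culprit. There seems to be no check that index is less than nb_children. Use the same strategy and add debug statements that print both values. Writing past the end of the array will corrupt the stack, and that corruption could overwrite children_rules_sum or max_rules, which would explain the valgrind crash.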
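Two places in the question's code look like they could produce such an out-of-range index. First, subregion_size is computed with integer division, so when a dimension's range is not a multiple of nb_cuts, values near max_dim can map to a slot of nb_cuts or more: with min_dim = 0, max_dim = 9 and cuts[j] = 2, subregion_size is 10 / 4 = 2, and max_value = 9 gives max_index[j] = 9 / 2 = 4, although only slots 0 to 3 exist. Second, when subregion_size == 0 the inner loop hits continue before setting min_index[j], max_index[j] and current_index[j], so those entries keep whatever the allocation returned (uninitialized, assuming chkmalloc wraps malloc). The sketch below shows one way to guard against the first case and catch any remaining bad index before the write. dimension_slot and count_child are hypothetical helpers; clamping to the last slot is only one possible choice, and the right mapping for uneven ranges depends on how the HyperCuts implementation is meant to split them.

```c
#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Hypothetical: per-dimension slot of `value`, computed as in the question
   ((value - min_dim) / subregion_size) but clamped to the last slot.
   Preconditions: cut < 32, min_dim <= value <= max_dim (the question clips
   value first), and a non-zero subregion size. */
uint32_t dimension_slot(uint32_t value, uint32_t min_dim, uint32_t max_dim,
                        uint32_t cut)
{
    uint32_t nb_cuts = (uint32_t)1 << cut;
    /* Note: max_dim - min_dim + 1 wraps to 0 for the full 32-bit range. */
    uint32_t subregion_size = (max_dim - min_dim + 1) / nb_cuts;
    assert(subregion_size > 0);
    uint32_t slot = (value - min_dim) / subregion_size;
    return slot < nb_cuts ? slot : nb_cuts - 1;   /* leftovers -> last slot */
}

/* Hypothetical: increment children[index] only if index is in range;
   otherwise report it and return 0 instead of writing out of bounds. */
int count_child(uint32_t *children, uint32_t nb_children, uint32_t index)
{
    if (index >= nb_children) {
        fprintf(stderr, "child index %u out of range (nb_children = %u)\n",
                (unsigned)index, (unsigned)nb_children);
        return 0;
    }
    children[index]++;
    return 1;
}
```

Replacing children_array[index]++ with a call like count_child(children_array, nb_children, index) turns a silent stack overwrite into a message that names the bad index at the point where it is produced.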