Invalid write and use-of-uninitialised-value errors in Valgrind

Question

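Basically, I have written a program that, given a set of outputs, uses genetic programming to find a formula that produces those outputs. In the program I have a function that, given a set of exemplars (fitness data and target data), randomly splits the data (inputs and outputs) into training data and test data. It does this by splitting the data into four separate arrays: training_cases, test_cases, training_targets and test_targets. training_cases and test_cases are 2D arrays (double **) holding the inputs, while training_targets and test_targets are 1D arrays (double *) holding the outputs. Here is the function: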

struct csv_data *get_test_and_train_data(char *file_name, double split) {
    double ***exemplars = parse_exemplars(file_name);
    double **fitness = exemplars[0];
    double *targs = *exemplars[1];

    // Get lengths of the arrays.
    int fitness_len = get_2d_arr_length(fitness);
    int targs_len =  get_double_arr_length(targs);
    int col_size = get_double_arr_length(fitness[0]);

    // randomize the index order
    int fits_split_i = (int)(floor(fitness_len * split));
    int *fits_rand_idxs = random_indexes(fitness_len);

    // Split the cases and targets up according to the index at which to split.
    // Leave space for NULL/NAN at the end.
    double **training_cases = malloc((sizeof(double *) * fits_split_i) + 1);
    double **test_cases = malloc((sizeof(double *) * (fitness_len - fits_split_i)) + 1);
    double *training_targets = malloc((sizeof(double) * fits_split_i) + 1);
    double *test_targets = malloc(sizeof(double) * (targs_len - fits_split_i) + 1);

    // Allocate the inner arrays.
    for (int i = 0; i < fits_split_i; i++) {
        training_cases[i] = malloc(sizeof(double) * col_size);

        if (i >= fitness_len) {
            test_cases[i - fits_split_i] = malloc(sizeof(double) * col_size);
        }
    }

    int rand_i;

    // Split the fitness and target data into training and test cases.
    for (int i = 0; i < fitness_len; i++) {
        rand_i = fits_rand_idxs[i];

        if (i >= fits_split_i) {
            test_cases[i - fits_split_i] = fitness[rand_i];
            test_targets[i - fits_split_i] = targs[rand_i]; // line 636
        } else {
            training_cases[i] = fitness[rand_i];
            training_targets[i] = targs[rand_i]; // line 639

        }
    }

    // Set last index to NULL/NAN to allow for easier looping of arrays
    training_cases[fits_split_i] = NULL; // line 645
    test_cases[fitness_len - fits_split_i] = NULL; // line 646
    training_targets[fits_split_i] = NAN; // line 647
    test_targets[targs_len - fits_split_i] = NAN; // line 648

The problem is that I get several errors (invalid writes and uses of uninitialised values). Here is the valgrind output:

==5049== Use of uninitialised value of size 8
==5049==    at 0x4053A4: get_test_and_train_data (util.c:639)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Uninitialised value was created by a stack allocation
==5049==    at 0x405161: get_test_and_train_data (util.c:599)
==5049== 
==5049== Use of uninitialised value of size 8
==5049==    at 0x405343: get_test_and_train_data (util.c:636)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Uninitialised value was created by a stack allocation
==5049==    at 0x405161: get_test_and_train_data (util.c:599)
==5049== 
==5049== Invalid write of size 8
==5049==    at 0x4053D0: get_test_and_train_data (util.c:645)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Address 0x5593eb0 is 672 bytes inside a block of size 673 alloc'd
==5049==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049==    by 0x4051F1: get_test_and_train_data (util.c:614)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049== 
==5049== Invalid write of size 8
==5049==    at 0x4053EE: get_test_and_train_data (util.c:646)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Address 0x5594028 is 296 bytes inside a block of size 297 alloc'd
==5049==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049==    by 0x40520D: get_test_and_train_data (util.c:615)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049== 
==5049== Invalid write of size 8
==5049==    at 0x405411: get_test_and_train_data (util.c:647)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Address 0x5594310 is 672 bytes inside a block of size 673 alloc'd
==5049==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049==    by 0x405226: get_test_and_train_data (util.c:616)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049== 
==5049== Invalid write of size 8
==5049==    at 0x405434: get_test_and_train_data (util.c:648)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Address 0x5594488 is 296 bytes inside a block of size 297 alloc'd
==5049==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049==    by 0x405242: get_test_and_train_data (util.c:617)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==

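I suspect most of these errors are caused by incorrect allocations at the start of the function. I have tested all the other functions it calls to make sure they return the correct values.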

Any help would be appreciated.

Edit 1

Christoph Freundl's answer got rid of all the invalid write errors, so now I am working on the uninitialised value errors. I suspect parse_exemplars() is the cause, so here it is:

/**
* Parse a CSV file. Parse the fitness case and split the data into
* test and train data. in the fitness case file each row is an exemplar
* and each dimension is in a column. The last column is the target value
* of the exemplar. The function returns a third degree pointer with the
* fitness data as the first element and the targets as the second element.
* The fitness data is structured as a 2D array and the target data is
* represented as a one dimensional array.
*    file_name: Name of CSV file with a header.
*/
double ***parse_exemplars(char *file_name) {
    csv_reader *reader = init_csv(file_name, ',');

    double **fitness_cases, *targets;
    int num_columns = get_num_column(reader);
    int num_lines = get_num_lines(reader);

    // leave space for NULL
    fitness_cases = malloc(sizeof(double *) * num_lines);

    for (int i = 0; i < num_lines; i++) {
        fitness_cases[i] = malloc(sizeof(double) * num_columns);
    }

    // leave space for NAN
    targets = malloc(sizeof(double) * (num_lines));

    csv_line *row;
    int f_i = 0;
    int t_i = 0;

    // Ignore the header
    next_line(reader);

    // Loop through to get target and fitness values.
    while ((row = readline(reader))) {
        int i;
        for (i = 0; i < num_columns; i++) {
            if (i == num_columns - 1) { // Last element of array is the target/desired output.
                targets[t_i++] = atof(row->content[i]);
            }
            else {
                // The arguments/inputs.
                fitness_cases[f_i][i] = atof(row->content[i]);
            }
        }

        // take the [i-1]th index because fitness cases has [num_columns-1] elements.
        fitness_cases[f_i][i-1] = (double)NAN;
        f_i++;
    }

    // Set last index to NULL/NAN for easier looping.
    fitness_cases[f_i] = NULL;
    targets[t_i] = (double)NAN;

    // Wrap the fitness cases and targets in a 3rd degree pointer
    double ***results = malloc(sizeof(double **) * 2);
    double *tmp[] = { targets };
    results[0] = fitness_cases;
    results[1] = tmp;

    free(row);
    free(reader);

    return results;
}

Answer 1

Invalid write

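Only one byte is allocated for the last element of each of the arrays training_cases, test_cases, training_targets and test_targets. However, that element is accessed as a double (8 bytes) or as a double * (also 8 bytes on a 64-bit architecture): in the assignments on lines 645-648, the NULL and NAN values are implicitly converted to those types, so each assignment writes 8 bytes. Only the first of those bytes lies inside the allocated block, which is why valgrind reports an "invalid write" for each of them.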
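The valgrind output fits this explanation: the block sizes 673 and 297 bytes equal 84 and 37 eight-byte slots plus one byte (84 * 8 + 1 = 673, 37 * 8 + 1 = 297), and the failing write starts at byte 672, which is slot [84]. The small sketch below compares the question's size expression with the corrected one and checks whether an 8-byte sentinel written into slot [n] fits. The helper names are hypothetical, and it uses sizeof(double *) for both arrays, assuming (as on x86-64) that double and double * are both 8 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes the question's pattern requests: n pointers plus ONE extra byte. */
size_t buggy_block_size(size_t n) {
    return sizeof(double *) * n + 1;
}

/* Bytes actually needed: n pointers plus one whole pointer-sized sentinel slot. */
size_t fixed_block_size(size_t n) {
    return sizeof(double *) * (n + 1);
}

/* True if writing a pointer into slot [n] stays inside a block of block_size bytes. */
bool sentinel_fits(size_t block_size, size_t n) {
    size_t offset = sizeof(double *) * n;   /* byte offset of slot [n] */
    assert(offset <= block_size);           /* precondition: slot [n] starts within the block */
    return offset + sizeof(double *) <= block_size;
}
```

For any n, sentinel_fits(buggy_block_size(n), n) is false as long as a pointer is wider than one byte, while sentinel_fits(fixed_block_size(n), n) is always true.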

Change the allocation of, for example, training_cases to

double **training_cases = malloc(sizeof(double *) * (fits_split_i + 1));

and change the other three allocations in the same way, and you should be fine.
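Here is a sketch of the corrected allocations in context. It is not the asker's full function: split_rows() and struct split_data are hypothetical names, the permutation idx is assumed to come from something like the question's random_indexes(), and targs is assumed to have the same length as fitness. Like the question's second loop, it stores the row pointers from fitness rather than copying the rows, so it does not allocate inner arrays.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical container for the four arrays; each is terminated by NULL or NAN. */
struct split_data {
    double **training_cases;
    double **test_cases;
    double *training_targets;
    double *test_targets;
};

/*
 * Split n rows into training and test sets in the order given by idx
 * (a permutation of 0..n-1). Row pointers are shared with fitness, not copied.
 * Returns 0 on success, -1 if an allocation fails.
 */
int split_rows(double **fitness, const double *targs, const int *idx,
               int n, double split, struct split_data *out) {
    int split_at = (int)(n * split);   /* truncation == floor for non-negative values */
    assert(split_at >= 0 && split_at <= n);
    int n_test = n - split_at;

    /* Each array gets one extra WHOLE element for the sentinel, not one extra byte. */
    out->training_cases   = malloc(sizeof(double *) * (split_at + 1));
    out->test_cases       = malloc(sizeof(double *) * (n_test + 1));
    out->training_targets = malloc(sizeof(double) * (split_at + 1));
    out->test_targets     = malloc(sizeof(double) * (n_test + 1));
    if (!out->training_cases || !out->test_cases ||
        !out->training_targets || !out->test_targets) {
        free(out->training_cases);
        free(out->test_cases);
        free(out->training_targets);
        free(out->test_targets);
        out->training_cases = out->test_cases = NULL;
        out->training_targets = out->test_targets = NULL;
        return -1;
    }

    for (int i = 0; i < n; i++) {
        int r = idx[i];
        if (i < split_at) {
            out->training_cases[i] = fitness[r];
            out->training_targets[i] = targs[r];
        } else {
            out->test_cases[i - split_at] = fitness[r];
            out->test_targets[i - split_at] = targs[r];
        }
    }

    /* The sentinel slots are now inside the allocated blocks. */
    out->training_cases[split_at] = NULL;
    out->test_cases[n_test] = NULL;
    out->training_targets[split_at] = NAN;
    out->test_targets[n_test] = NAN;
    return 0;
}
```

The caller frees the four arrays in out, but not the rows they point to, since those still belong to the fitness data.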

Uninitialised values

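There is an error in parse_exemplars(): results[1] is set to the locally declared array tmp, which becomes invalid as soon as the function returns. The targs pointer that get_test_and_train_data() reads through *exemplars[1] therefore comes from dead stack memory, and dereferencing it on lines 636 and 639 produces the "use of uninitialised value" reports.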
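If you want to keep the double *** return type, a minimal alternative is to put the one-element wrapper array on the heap instead of the stack. The sketch below moves the end of parse_exemplars() into a hypothetical helper, wrap_exemplars(), assuming fitness_cases and targets have already been allocated with malloc as in the question. The caller's *exemplars[1] then still points at valid memory after the function returns.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Keeps the double *** return type, but the one-element array that holds
 * targets is allocated on the heap instead of the stack, so it outlives this
 * call. (The original used double *tmp[] = { targets };, which is destroyed
 * together with the stack frame.)
 */
double ***wrap_exemplars(double **fitness_cases, double *targets) {
    assert(fitness_cases != NULL && targets != NULL);   /* both assumed heap-allocated */
    double ***results = malloc(sizeof(double **) * 2);
    double **targets_holder = malloc(sizeof(double *));  /* heap, not stack */
    if (!results || !targets_holder) {
        free(results);
        free(targets_holder);
        return NULL;
    }
    targets_holder[0] = targets;
    results[0] = fitness_cases;
    results[1] = targets_holder;   /* the caller reads the targets as *results[1] */
    return results;
}

/* Frees only the wrapper memory, not the fitness rows or the targets array. */
void free_exemplars_wrapper(double ***results) {
    if (results) {
        free(results[1]);
        free(results);
    }
}
```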

My suggestion: define a

struct exemplars {
    double** fitness_cases;
    double* targets;
};

and use a variable of this type instead of double ***results.
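A sketch of that approach follows. The question's csv_reader functions are not shown, so the CSV reading is replaced by a hypothetical make_exemplars() that fills the struct from an in-memory table whose last column is the target, mirroring the question's layout. The struct is returned by value and both arrays inside it are on the heap, so nothing refers to the callee's stack frame after it returns. Each array also gets one extra slot for its NULL or NAN terminator.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

struct exemplars {
    double **fitness_cases;   /* NULL-terminated array of rows */
    double *targets;          /* NAN-terminated array of target values */
};

/*
 * Hypothetical stand-in for parse_exemplars(): copies n rows of a flat table
 * with cols columns, where the last column of each row is the target.
 */
struct exemplars make_exemplars(const double *table, int n, int cols) {
    assert(n >= 0 && cols >= 1);
    struct exemplars ex;
    ex.fitness_cases = malloc(sizeof(double *) * (n + 1));  /* +1 for NULL */
    ex.targets = malloc(sizeof(double) * (n + 1));          /* +1 for NAN */
    if (!ex.fitness_cases || !ex.targets)
        abort();   /* out of memory; a real version would clean up and report */

    for (int r = 0; r < n; r++) {
        /* cols - 1 inputs plus one NAN terminator = cols doubles per row */
        ex.fitness_cases[r] = malloc(sizeof(double) * cols);
        if (!ex.fitness_cases[r])
            abort();
        for (int c = 0; c < cols - 1; c++)
            ex.fitness_cases[r][c] = table[r * cols + c];
        ex.fitness_cases[r][cols - 1] = NAN;
        ex.targets[r] = table[r * cols + cols - 1];
    }
    ex.fitness_cases[n] = NULL;
    ex.targets[n] = NAN;
    return ex;   /* returned by value; the pointers inside refer to heap memory */
}

/* Frees every row, the row array and the targets array. */
void free_exemplars(struct exemplars *ex) {
    for (int r = 0; ex->fitness_cases[r] != NULL; r++)
        free(ex->fitness_cases[r]);
    free(ex->fitness_cases);
    free(ex->targets);
}
```

If parse_exemplars() is changed to return struct exemplars, the start of get_test_and_train_data() would become struct exemplars ex = parse_exemplars(file_name); followed by double **fitness = ex.fitness_cases; and double *targs = ex.targets;.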
