Is it undefined behavior to access the inner array out of bounds on a 2D array in C?

Question

I was playing around with some arrays and pointers in C and started wondering whether doing this is undefined behavior.

int (*arr)[5] = malloc(sizeof(int[5][5]));

// Is this undefined behavior?
int val0 = arr[0][5];

// Rephrased, is it guaranteed it'll always have the same effect as this line?
int val1 = arr[1][0];

Thanks for any insight.

Answer 1

In C, what you're doing is undefined behavior.

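The expression arr[0] has type int [5]. The expression arr[0][5] therefore dereferences one element past the end of the array arr[0], and dereferencing one element past the end of an array is undefined behavior.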

Section 6.5.2.1p2 of the C standard, on array subscripting, states:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
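To make that equivalence concrete, here is a minimal sketch that reads an element using only explicit pointer arithmetic. The helper name read_via_pointer is hypothetical and exists purely for illustration; the column count of 5 is taken from the question.

```c
#include <assert.h>

/* Hypothetical helper (not from the question): reads element (r, c) of a
   5-column array with explicit pointer arithmetic instead of []. */
int read_via_pointer(int (*arr)[5], int r, int c)
{
    assert(c >= 0 && c < 5);   /* stay inside the inner array */

    /* arr + r    : pointer to row r, type int (*)[5]
       *(arr + r) : row r itself, type int [5], which decays to int *
       ... + c    : pointer to element c of that row */
    return *(*(arr + r) + c);
}
```

For any in-bounds r and c, read_via_pointer(arr, r, c) yields the same value as arr[r][c], since the two expressions are defined to be identical.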

Section 6.5.6p8 of the C standard, on additive operators, states:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

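The last two sentences of that passage specify that the addition implicit in an array subscript may not produce a pointer more than one element past the end of an array, and that a pointer to one element past the end of an array may not be dereferenced.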
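As an illustration of what remains allowed, the sketch below forms a one-past-the-end pointer for a row and uses it only as a loop bound. The names sum_row and COLS are assumptions made for this example and do not appear in the question.

```c
#include <assert.h>

enum { COLS = 5 };   /* column count taken from the question's int[5][5] */

/* Hypothetical helper: sums row r of a COLS-column array. */
int sum_row(int (*arr)[COLS], int r)
{
    int *p = arr[r];            /* first element of row r */
    int *end = arr[r] + COLS;   /* one past the last element: fine to form */
    int sum = 0;

    assert(r >= 0);
    while (p != end)            /* fine to compare against */
        sum += *p++;            /* only in-bounds elements are dereferenced */
    return sum;
}
```

The pointer end is never passed to unary *, so the function stays within the rules quoted above. Writing *end, which is what arr[r][COLS] amounts to, is the step the standard leaves undefined.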

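The fact that the array in question is itself a member of an array, which means the elements of each subarray are contiguous in memory, doesn't change this. Aggressive optimization settings in a compiler may note that accessing past the end of the array is undefined behavior and optimize based on that fact.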
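If the goal is to reach "the element after arr[0][4]" by a running index, one option that keeps every subscript inside its own array is to split the index into a row and a column. This is only a sketch; get_flat, ROWS, and COLS are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 5, COLS = 5 };   /* dimensions taken from the question */

/* Hypothetical helper: returns the k-th element in row-major order while
   keeping each subscript within the bounds of the array it indexes. */
int get_flat(int (*arr)[COLS], size_t k)
{
    assert(k < (size_t)ROWS * COLS);
    return arr[k / COLS][k % COLS];
}
```

With this helper, get_flat(arr, 5) reads arr[1][0] through a well-defined expression, rather than relying on arr[0][5] happening to land on the same address.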

Answer 2

The Standard was clearly intended to avoid requiring that a compiler given something like:

int foo[5][10];
int test(int i)
{
  foo[1][0] = 1;
  foo[0][i] = 2;
  return foo[1][0];
}

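must reload the value of foo[1][0] to accommodate the possibility that the write to foo[0][i] might affect foo[1][0]. On the other hand, before the Standard was written, it was idiomatic to write something like: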

#include <stdio.h>

void dump_array(int *p, int rows, int cols)
{
  int i,j;
  for (i=0; i<rows; i++)
  {
    for (j=0; j<cols; j++)
      printf("%6d", *p++);
    printf("\n");
  }
}
int foo[5][10];
...
  dump_array(foo[0], 5, 10);

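Nothing in the published Rationale suggests that the authors had any intention of forbidding such constructs or breaking code that used them. In fact, the main benefit of requiring that the rows of an array be placed consecutively, even in cases where adding padding would improve efficiency, is that it allows such code to work.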
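For comparison, here is a sketch of a variant that never steps a pointer across a row boundary, so it does not depend on how a compiler treats the idiom above. The name dump_array_2d, the FILE * parameter, and the returned count are assumptions added for illustration; the variably modified parameter type int a[rows][cols] requires C99 or later (it is an optional feature in C11 and C17).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of dump_array: receives the array with its row type
   intact, so each access a[i][j] stays within row i.
   Returns the number of values printed. */
int dump_array_2d(FILE *out, int rows, int cols, int a[rows][cols])
{
    int count = 0;

    assert(rows > 0 && cols > 0);   /* array dimensions must be positive */
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            fprintf(out, "%6d", a[i][j]);
            count++;
        }
        fprintf(out, "\n");
    }
    return count;
}
```

A call for the array above would look like dump_array_2d(stdout, 5, 10, foo). The trade-off is that the function now needs to know the row length as part of the type, which is exactly the information the older idiom discarded.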

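At the time the Standard was written, when generating code for a function that received a pointer, compilers would treat the pointer as though it might identify an arbitrary part of some arbitrary larger object, without making any effort to know or care what the enclosing object might be. As a very popular form of "conforming language extension", they would thus support constructs like dump_array regardless of whether the Standard required them to, so the authors of the Standard saw no reason to worry about when the Standard mandated such support. Instead, they left such matters as quality-of-implementation issues over which the Standard could waive jurisdiction.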

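Unfortunately, because the authors of the Standard expected that compilers would treat the act of passing a pointer to a function as implicitly "laundering" it, they saw no need to define any explicit means of laundering information about a pointer's enclosing object in cases where a function needs to treat a pointer as identifying "raw" storage. Given the state of compiler technology in the 1980s, such distinctions would not have mattered, but they can be very relevant if, for example, code does something like: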

int matrix[10][10];
void test2(int c)
{
  matrix[4][0] = 1;
  dump_array(matrix[0], 1, c);
  matrix[4][0] = 2;
}

or

void test3(int r)
{
  matrix[4][0] = 1;
  dump_array((int*)matrix, r, 10);
  matrix[4][0] = 2;
}

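Depending on what a function is intended to do, having a compiler optimize out the first write to matrix[4][0] in one or both of those might improve efficiency, or might render the generated code useless. Treating explicit pointer casts as erasing type information, but treating array-to-pointer decay as retaining it, would allow programmers to achieve the required semantics by writing code as in the second example, while allowing compilers to perform the relevant optimizations when source code is written as in the first example. Unfortunately, the Standard makes no such distinction, and the maintainers of free compilers are unwilling to forgo any "optimizations" they believe the Standard grants them. As a result, outside of implementations that avoid cross-procedural optimizations or document what must be done to block them, the language offers nothing but "hope for the best" semantics.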
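One way to sidestep the question in new code, offered here as a sketch rather than as something this answer proposes, is to declare the storage as a single one-dimensional array and compute the row-major index by hand. Every element then belongs to the same array object, so walking the whole block with one pointer stays within the pointer arithmetic rules. The names flat_matrix, cell, and sum_first are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 10, COLS = 10 };

/* One array object of ROWS * COLS ints; there are no inner arrays. */
static int flat_matrix[ROWS * COLS];

/* Hypothetical accessor: pointer to the element at (r, c). */
int *cell(int r, int c)
{
    assert(r >= 0 && r < ROWS && c >= 0 && c < COLS);
    return &flat_matrix[(size_t)r * COLS + c];
}

/* Sums the first n elements in row-major order with one running pointer.
   The pointer never leaves flat_matrix (or its one-past-the-end position). */
long sum_first(size_t n)
{
    long sum = 0;

    assert(n <= (size_t)ROWS * COLS);
    for (const int *p = flat_matrix; p != flat_matrix + n; p++)
        sum += *p;
    return sum;
}
```

The cost is losing the matrix[r][c] syntax, which is why this is a design choice for new code and not a fix for existing code that already relies on the dump_array idiom.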

c

pointers

multidimensional-array

undefined-behavior

language-lawyer