Detecting access to out-of-scope variables

Code like this is undefined behavior, because it accesses a local variable that is no longer in scope (its lifetime has ended).

int main() {
    int *a;
    {
        int b = 42;
        a = &b;
    }
    printf("%d", *a); // UB!
    return 0;
}

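My question: is there a good way to detect this kind of error automatically? It seems like it should be detectable (mark the variable's part of the stack as unusable when the variable goes out of scope, then complain if that space is accessed), but Valgrind 3.10, Clang 4's AddressSanitizer and UndefinedBehaviorSanitizer, and GCC 6's AddressSanitizer and UndefinedBehaviorSanitizer all fail to complain.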

Yes. Lint is designed for exactly this. We use it heavily in embedded and automotive systems. You can use the online demo to test out how well it would work for you. In your specific case, the relevant rule is MISRA 2012 Rule 18.6.

Sample run


FlexeLint for C/C++ (Unix) Vers. 9.00L, Copyright Gimpel Software 1985-2014
--- Module: misra3.c (C)
        _
     1  int main() {
misra3.c  1  Note 970:  Use of modifier or type 'int' outside of a typedef [MISRA 2012 Directive 4.6, advisory]
misra3.c  1  Note 9075:  external symbol 'main(void)' defined without a prior declaration [MISRA 2012 Rule 8.4, required]
            _
     2      int *a;
misra3.c  2  Note 970:  Use of modifier or type 'int' outside of a typedef [MISRA 2012 Directive 4.6, advisory]
     3      {
                _
     4          int b = 42;
misra3.c  4  Note 970:  Use of modifier or type 'int' outside of a typedef [MISRA 2012 Directive 4.6, advisory]
                      _
     5          a = &b;
misra3.c  5  Info 733:  Assigning address of auto variable 'b' to outer scope symbol 'a' [MISRA 2012 Rule 18.6, required]
     6      }
            _
     7      printf("%d", *a); // UB!
misra3.c  7  Info 718:  Symbol 'printf' undeclared, assumed to return int [MISRA 2012 Rule 17.3, mandatory]
misra3.c  7  Warning 586:  function 'printf' is deprecated. [MISRA 2012 Rule 21.6, required]
misra3.c  7  Info 746:  call to function 'printf()' not made in the presence of a prototype
     8      return 0;
                    _
     9  }

misra3.c  9  Info 783:  Line does not end with new-line
misra3.c  9  Note 954:  Pointer variable 'a' (line 2) could be declared as pointing to const [MISRA 2012 Rule 8.13, advisory]

/// Start of Pass 2 ///

--- Module: misra3.c (C)
     1  int main() {
     2      int *a;
     3      {
     4          int b = 42;
     5          a = &b;
     6      }
     7      printf("%d", *a); // UB!
     8      return 0;
     9  }

--- Global Wrap-up

Warning 526:  Symbol 'printf()' (line 7, file misra3.c) not defined
Warning 628:  no argument information provided for function 'printf()' (line 7, file misra3.c)

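Without special compiler support, a non-intrusive memory debugger (such as Valgrind) can detect access to stack frames that have gone out of scope, but not to scopes within a function. This is because compilers (usually) allocate all the memory for a stack frame in a single pass*. So, to detect access to out-of-scope variables within the same function, we need specific compiler instrumentation to "poison" variables that have gone out of scope while their enclosing frame is still live.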
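To make the "poison" idea concrete, here is a toy model in C. It is only a sketch under stated assumptions: the names (shadow_poison, shadow_unpoison, shadow_access_ok, FRAME_SIZE) are hypothetical, the "frame" is simulated with one flag per byte, and the calls that a compiler would insert at block entry, block exit, and before each load are made by hand. It is not how any real sanitizer is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one shadow flag per byte of a simulated frame. */
enum { FRAME_SIZE = 64 };

/* true = this byte of the frame must not be accessed right now */
static bool poisoned[FRAME_SIZE];

/* Returns true if [offset, offset + size) lies inside the simulated frame. */
static bool in_frame(size_t offset, size_t size) {
    return size <= FRAME_SIZE && offset <= FRAME_SIZE - size;
}

/* What instrumentation would emit at a block's closing brace. */
void shadow_poison(size_t offset, size_t size) {
    assert(in_frame(offset, size));
    for (size_t i = 0; i < size; i++) {
        poisoned[offset + i] = true;
    }
}

/* What instrumentation would emit when the block is entered. */
void shadow_unpoison(size_t offset, size_t size) {
    assert(in_frame(offset, size));
    for (size_t i = 0; i < size; i++) {
        poisoned[offset + i] = false;
    }
}

/* The check placed before each load or store: true if the access is allowed. */
bool shadow_access_ok(size_t offset, size_t size) {
    if (!in_frame(offset, size)) {
        return false;
    }
    for (size_t i = 0; i < size; i++) {
        if (poisoned[offset + i]) {
            return false;
        }
    }
    return true;
}
```

In terms of the example from the question, entering the inner block would correspond to shadow_unpoison(32, 4) for 'b', the closing brace to shadow_poison(32, 4), and the later *a to a failing shadow_access_ok(32, 4). The offset 32 is just an illustrative slot within the simulated frame.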

The technique used by AddressSanitizer, available in recent versions of clang and gcc, is to replace stack access with access to specially allocated memory:

In order to implement quarantine for the stack memory we need to promote stack to heap. [...] __asan_stack_malloc(real_stack, frame_size) allocates a fake frame (frame_size bytes) from a thread-local heap-like structure (fake stack). Every fake frame comes unpoisoned and then the redzones are poisoned in the instrumented function code. __asan_stack_free(fake_stack, real_stack, frame_size) poisons the entire fake frame and deallocates it.
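The quoted description can be sketched as a small toy in C. This is an illustration only, not the AddressSanitizer runtime: toy_stack_malloc, toy_stack_free, toy_access_ok and toy_quarantine_release are hypothetical names, a single per-frame flag stands in for the real poisoning, redzones are omitted, and plain malloc stands in for the thread-local fake stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fake frame: a small header followed by the frame's bytes. */
typedef struct fake_frame {
    size_t size;          /* number of payload bytes */
    bool poisoned;        /* whole-frame flag; a simplification for this sketch */
    unsigned char data[]; /* flexible array member holding the locals */
} fake_frame;

/* Function entry: allocate an unpoisoned fake frame on the heap. */
fake_frame *toy_stack_malloc(size_t frame_size) {
    fake_frame *f = malloc(sizeof *f + frame_size);
    if (f == NULL) {
        return NULL;
    }
    f->size = frame_size;
    f->poisoned = false;
    memset(f->data, 0, frame_size);
    return f;
}

/* Function exit: poison the frame but keep the memory around (quarantine),
 * so that a later access through a stale pointer can still be recognized. */
void toy_stack_free(fake_frame *f) {
    f->poisoned = true;
}

/* The check placed before each access: true if the access is allowed. */
bool toy_access_ok(const fake_frame *f, size_t offset, size_t size) {
    return !f->poisoned && size <= f->size && offset <= f->size - size;
}

/* Called once the frame leaves quarantine and its memory may be reused. */
void toy_quarantine_release(fake_frame *f) {
    free(f);
}
```

The design point the sketch tries to show is the quarantine: because the frame is not handed back for reuse immediately, a read after toy_stack_free still hits memory that is known to be poisoned, rather than some unrelated live frame.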

Usage and example output with AddressSanitizer:

$ g++ -std=c++11 a.cpp -fsanitize=address && env ASAN_OPTIONS='detect_stack_use_after_return=1' ./a.out 
ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fd0e8300020 at pc 0x000000400c1b bp 0x7fff5b45ecf0 sp 0x7fff5b45ece8
READ of size 4 at 0x7fd0e8300020 thread T0
    #0 0x400c1a in main (a.out+0x400c1a)
    #1 0x7fd0ebe18d5c in __libc_start_main (/lib64/libc.so.6+0x1ed5c)
    #2 0x400a48  (a.out+0x400a48)

Address 0x7fd0e8300020 is located in stack of thread T0 at offset 32 in frame
    #0 0x400b26 in main (a.out+0x400b26)

  This frame has 1 object(s):
    [32, 36) 'b' <== Memory access at offset 32 is inside this variable

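Note that because it is expensive, it must be requested both at compile time (-fsanitize=address) and at run time (ASAN_OPTIONS='detect_stack_use_after_return=1'). Regarding minimum versions: it works with gcc 7.1.0 and clang trunk, but apparently not with any released version of clang, so if you want to use a released compiler you will have to use gcc.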


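* Consider that these two functions compile (e.g. with gcc at -O0) to the same machine code, so a non-intrusive memory debugger cannot** distinguish between them: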

int f() {
    int* a;
    {
        int b = 42;
        a = &b;
    }
    return *a;
}

int g() {
    int* a;
    int b = 42;
    a = &b;
    return *a;
}

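** Strictly speaking, if debug symbols are available, a debugger could track variables going in and out of scope. But in general, if you have debug symbols available, you also have the source code, and so can recompile the program with instrumentation.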