20 bytes are reserved on the stack for no apparent reason when C code is compiled into machine code

Question

I am following section 5.1.3 of Nick Blundell's OS development book. I am investigating how the following C code is compiled into machine code:

void caller_fun(){
        callee_fun(0xdede);
}

int callee_fun(int arg){
        return arg;
}

The machine code I ended up with, disassembled by ndisasm, is this:

00000000  55                push ebp
00000001  89E5              mov ebp,esp
00000003  83EC08            sub esp,byte +0x8
00000006  83EC0C            sub esp,byte +0xc
00000009  68DEDE0000        push dword 0xdede
0000000E  E806000000        call dword 0x19
00000013  83C410            add esp,byte +0x10
00000016  90                nop
00000017  C9                leave
00000018  C3                ret
00000019  55                push ebp
0000001A  89E5              mov ebp,esp
0000001C  8B4508            mov eax,[ebp+0x8]
0000001F  5D                pop ebp
00000020  C3                ret

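While studying how the stack pointer and base pointer work, I made the following diagram, which shows the state of the stack when the processor is executing the instruction at offset 0x1C: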

            Stack situation when the processor is
          running `mov eax,[ebp+0x8]` at offset 0x1C

    +----------------------------------+
    | 4 bytes for                      |
    | `push ebp` at offset 0x00        |
    +----------------------------------+
    | 20 (8+12) bytes for              |
    | `sub esp,byte +0x8`              |
    | and `sub esp,byte +0xc`          |
    | at offsets 0x03 and 0x06         |
    +----------------------------------+
    | 4 bytes for                      |
    | `push dword 0xdede`              |
    | at offset 0x09                   |
    +----------------------------------+
    | 4 bytes for instruction pointer  |
    | by `call dword 0x19`             |
    | at offset 0x0E                   |
    +----------------------------------+
    | 4 bytes for                      |
    | `push ebp` at offset 0x19        |
    +----------------------------------+ --> ebp and esp are both here,
                                             set by `mov ebp,esp`
                                             at offset 0x1A

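Now I have some questions that I could not answer through my own research:

Is my diagram of the stack correct?
Why do sub esp,byte +0x8 and sub esp,byte +0xc at offsets 0x03 and 0x06 reserve 20 bytes on the stack?
Even if 20 bytes of stack memory are needed, why are they not allocated by a single instruction such as sub esp,byte +0x14, since 0x14 = 0x8 + 0xc?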

I am compiling the C code with this makefile:

all: call_fun.o call_fun.bin call_fun.dis

call_fun.o: call_fun.c
    gcc -ffreestanding -c call_fun.c -o call_fun.o

call_fun.bin: call_fun.o
    ld -o call_fun.bin -Ttext 0x0 --oformat binary call_fun.o

call_fun.dis: call_fun.bin
    ndisasm -b 32 call_fun.bin > call_fun.dis

Answer 1

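Without optimization, the stack is used to save and restore the base pointer. In the x86_64 calling convention (https://en.wikipedia.org/wiki/X86_calling_conventions), the stack has to be aligned on a 16-byte boundary when a function is called, so this is likely what happens in your case. At least, that is what I see when I compile your code on my system (a 64-bit target, unlike your 32-bit listing). Here is the ASM for that: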

callee_fun(int): # @callee_fun(int)
  pushq %rbp
  movq %rsp, %rbp
  movl %edi, -4(%rbp)
  movl -4(%rbp), %eax
  popq %rbp
  retq
caller_fun(): # @caller_fun()
  pushq %rbp
  movq %rsp, %rbp
  subq $16, %rsp
  movl $57054, %edi # imm = 0xDEDE
  callq callee_fun(int)
  movl %eax, -4(%rbp) # 4-byte Spill
  addq $16, %rsp
  popq %rbp
  retq
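
The alignment explanation can be checked against the 32-bit listing in the question with plain arithmetic. The following is a minimal sketch, not compiler output: it assumes esp is 16-byte aligned just before caller_fun is called (which GCC's default stack alignment on x86 would give, but the question does not state it), and the starting address and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical starting value: any 16-byte-aligned address would do. */
#define ESP_BEFORE_CALL_TO_CALLER 0x1000u

/* esp right after `push dword 0xdede`, i.e. just before `call dword 0x19`. */
uint32_t esp_at_inner_call(void) {
    uint32_t esp = ESP_BEFORE_CALL_TO_CALLER;
    esp -= 4;    /* return address pushed by whoever called caller_fun */
    esp -= 4;    /* 0x00: push ebp */
    esp -= 0x8;  /* 0x03: sub esp,byte +0x8 */
    esp -= 0xc;  /* 0x06: sub esp,byte +0xc */
    esp -= 4;    /* 0x09: push dword 0xdede (the argument) */
    return esp;
}

/* ebp inside callee_fun after `push ebp` / `mov ebp,esp` at 0x19 and 0x1A. */
uint32_t callee_ebp(void) {
    uint32_t esp = esp_at_inner_call();
    esp -= 4;    /* 0x0E: call pushes the return address */
    esp -= 4;    /* 0x19: push ebp */
    return esp;  /* 0x1A: mov ebp,esp */
}

/* Returns 1 when addr is a multiple of 16, 0 otherwise. */
int is_16_byte_aligned(uint32_t addr) {
    return (addr & 0xFu) == 0;
}
```

Under these assumptions esp_at_inner_call() is 0xFE0, which is 16-byte aligned: the return address, the saved ebp, the 8 and 12 reserved bytes, and the 4-byte argument add up to 32 bytes. callee_ebp() + 8 equals the address of the argument, which matches `mov eax,[ebp+0x8]` in the question's diagram. The same bookkeeping would explain `add esp,byte +0x10` at offset 0x13, which removes the 12 bytes of padding together with the 4-byte argument.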

Notably, with optimization turned on, the stack is not used or modified at all:

callee_fun(int): # @callee_fun(int)
  movl %edi, %eax
  retq
caller_fun(): # @caller_fun()
  retq

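Last but not least, when working with ASM listings, do not disassemble object files or executables. Instead, instruct your compiler to generate an assembly listing. It will give you more context.

If you are using gcc, a good command is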

gcc -fverbose-asm -S -O
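
As a concrete version of that advice, here is a sketch of the question's source file with the commands written as comments. The prototype is an addition (the original calls callee_fun before declaring it), the output file name call_fun.s is an arbitrary choice, and -m32 assumes a GCC installation with 32-bit support.

```c
/*
 * call_fun.c
 *
 * To get an annotated assembly listing instead of disassembling a binary:
 *   gcc -m32 -ffreestanding -O0 -fverbose-asm -S call_fun.c -o call_fun.s
 *
 * Assumption to verify on your own toolchain: adding
 *   -mpreferred-stack-boundary=2
 * lowers the 32-bit stack alignment to 4 bytes, which may make the
 * padding-only `sub esp` instructions disappear.
 */

int callee_fun(int arg);   /* prototype added so the call below is declared */

void caller_fun(void) {
    callee_fun(0xdede);    /* return value is unused, as in the question */
}

int callee_fun(int arg) {
    return arg;
}
```

Comparing the listing with and without the alignment option is one way to see which of the reserved bytes are padding.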

Tags: c, x86, assembly, nasm