What is this pattern where the EBX register is used for memory access?

Question

I'm learning the basics of reverse engineering. While reversing a crackme, I noticed this pattern at the beginning of almost every function:

pushl %ebp
movl  %esp, %ebp
pushl %ebx              # because ebx is a callee-saved register
subl  $0x14, %esp       # of course $0x14 changes depending on the function
calll 0x08048766
addl  $0x1a5f, %ebx     # this value also sometimes changes depending on the function

At 0x08048766 there is a function that does this:

movl 0(%esp), %ebx         
retl

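So basically, as usual, each function first sets up the ebp and esp registers. Then ebx is pushed onto the stack, which is also perfectly understandable: ebx is a callee-saved register, and it is used later in the function to reference some static data (from .rodata), for example: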

leal  -0x17b7(%ebx), %eax
movl  %eax, 0(%esp) 
calll printf

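Now the most interesting (and, to me, obscure) part: if I understand correctly, ebx is first initialized with the value pointed to by esp (via the function at 0x08048766). Why? What is in there? Isn't that just an uninitialized stack slot?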

Then another value is added to ebx. What does this value represent?

I'd like to better understand how the ebx register is used in this case, and how the address it points to is computed.

You can look at the full program here, but unfortunately no C source code is available.

Answer 1

This code seems to have been compiled with -fPIC. PIC stands for "position-independent code", which means the code can be loaded at any address and still access its global variables.

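In this case ebx is called the PIC register, and it is used to point to the end of the GOT (global offset table). The GOT holds the address of each global variable being used; those addresses are filled in at link or load time, so the code itself never needs to contain an absolute address.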

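Often the best way to understand this kind of thing is to compile some code yourself and look at the output. It's especially easy when you have symbols to look at.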

Let's do an experiment:

pic.c

int global;

int main(void)
{
    global = 4;
    return 0;
}

Compile

$ gcc -v
...
gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)

$ gcc -m32 -Wall -Werror -fPIC -o pic pic.c

Sections (abbreviated)

$ readelf -S pic
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [13] .text             PROGBITS        080482f0 0002f0 000182 00  AX  0   0 16
  [15] .rodata           PROGBITS        08048488 000488 00000c 00   A  0   0  4
  [22] .got              PROGBITS        08049ffc 000ffc 000004 04  WA  0   0  4
  [23] .got.plt          PROGBITS        0804a000 001000 000014 04  WA  0   0  4
  [24] .data             PROGBITS        0804a014 001014 000004 00  WA  0   0  1
  [25] .bss              NOBITS          0804a018 001018 000008 00  WA  0   0  4

Disassembly (Intel syntax, because AT&T drives me crazy)

$ objdump -Mintel -d --no-show-raw-insn pic

080483eb <main>:
 80483eb:   push   ebp
 80483ec:   mov    ebp,esp
 80483ee:   call   804840b <__x86.get_pc_thunk.ax> ; EAX = 0x80483f3 (return address = next insn)
 80483f3:   add    eax,0x1c0d            ; EAX = 0x804a000 (.got.plt, end of .got)
 80483f8:   lea    eax,[eax+0x1c]        ; EAX = 0x804a01c (.bss + 4)

 80483fe:   mov    DWORD PTR [eax],0x4   ; set `global` to 4
 8048404:   mov    eax,0x0
 8048409:   pop    ebp
 804840a:   ret    

0804840b <__x86.get_pc_thunk.ax>:
 804840b:   mov    eax,DWORD PTR [esp]
 804840e:   ret    
 804840f:   nop

Explanation

In this case my GCC decided to use eax as the PIC register instead of ebx.

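Also note that the compiler (GCC 5.3.1) does something interesting here. Instead of accessing the variable through the GOT, it uses the GOT as an "anchor" and offsets directly to the variable in the .bss section.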

Back to your code:

pushl %ebp
movl  %esp, %ebp
pushl %ebx             # because ebx is a callee-saved register
subl  $0x14, %esp      # end of typical prologue

calll 0x08048766       # __x86.get_pc_thunk.bx (__i686.get_pc_thunk.bx in older GCC)
                       # Loads EBX with the return address, i.e. the address
                       # of the addl below. 32-bit x86 has no instruction that
                       # reads EIP directly, so a call is the usual way to get it.

addl  $0x1a5f, %ebx    # Add the displacement to the end of the GOT.
                       # This displacement of course changes depending on
                       # where the function is.
                       # EBX now points to the end of the GOT.

leal  -0x17b7(%ebx), %eax   # EAX = EBX - 0x17b7
movl  %eax, 0(%esp)         # Store EAX at the top of the stack (first argument to printf)
                            # EAX presumably points to a string (the format string)
calll printf

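In your code it doesn't actually "use" the GOT either (otherwise we would see a second memory dereference); it uses it as an anchor for the string, which is probably in the read-only data section (.rodata), a section that also comes before the GOT.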

If you look at the function at 0x08048766, you will find it looks like this:

mov    (%esp),%ebx  # Put the return address (pushed onto the stack by the call insn)
                    # in ebx
ret                 # Return

x86

reverse-engineering

cpu-registers