Porting from 32 to 64-bit by just changing all the register names from eXX to rXX makes factorial return 0?

Question

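How fortunate it is for everyone learning the art of computer programming to have access to a community like Stack Overflow! I have decided to learn how to write computer programs, and I am doing so with an e-book called 'Programming From the Ground Up', which teaches the reader how to program in assembly language in the GNU/Linux environment.

My progress in the book has reached the point of creating a program that computes the factorial of the integer 4 using a function. I have written it, and it assembles and runs without any errors from the GCC assembler or at run time. However, the function in my program does not return the right answer! The factorial of 4 is 24, but the program returns a value of 0! Truth be told, I do not know why.

Here is the code for your consideration: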

.section .data

.section .text

.globl _start

.globl factorial

_start:

push $4                    #this is the function argument
call factorial             #the function is called
add $4, %rsp               #the stack is restored to its original 
                           #state before the function was called
mov %rax, %rbx             #this instruction will move the result 
                           #computed by the function into the rbx 
                           #register and will serve as the return 
                           #value 
mov $1, %rax               #1 must be placed inside this register for 
                           #the exit system call
int $0x80                  #exit interrupt

.type factorial, @function #defines the code below as being a function

factorial:                 #function label
push %rbp                  #saves the base-pointer
mov %rsp, %rbp             #moves the stack-pointer into the base-
                           #pointer register so that data in the stack 
                           #can be referenced as indexes of the base-
                           #pointer
mov $1, %rax               #the rax register will contain the product 
                           #of the factorial
mov 8(%rbp), %rcx          #moves the function argument into %rcx
start_loop:                #the process loop begins
cmp $1, %rcx               #this is the exit condition for the loop
je loop_exit               #if the value in %rcx reaches 1, exit loop
imul %rcx, %rax            #multiply the current integer of the 
                           #factorial by the value stored in %rax
dec %rcx                   #reduce the factorial integer by 1
jmp start_loop             #unconditional jump to the start of loop
loop_exit:                 #the loop exit begins
mov %rbp, %rsp             #restore the stack-pointer
pop %rbp                   #remove the saved base-pointer from stack
ret                        #return

Answer 1

TL:DR: the factorial of the return address overflows %rax, leaving 0, because you ported wrong.

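Porting 32-bit code to 64-bit is not as simple as changing all the register names. That might get it to assemble, but as you found, even this simple program then behaves differently. In x86-64, push %reg and call both push 64-bit values and modify rsp by 8. You would see this if you single-stepped your code with a debugger. (See the bottom of the x86 tag wiki for info on using gdb for asm.)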

You're following a book that uses 32-bit examples, so you should probably just build them as 32-bit executables instead of trying to port them to 64-bit before you know how.

Your sys_exit() using the 32-bit int 0x80 ABI still works, but you will run into trouble with system calls if you try to pass 64-bit pointers. Use the 64-bit ABI.

You will also run into trouble if you want to call any library functions, because the standard function-calling convention is different, too. See the 64-bit ABI link and other calling-convention docs in the x86 tag wiki.

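But you're not doing any of that, so the problem with your program simply comes down to not accounting for the doubled "stack width" in x86-64. Your factorial function reads the return address as its argument.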
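
To make the offsets concrete, here is a rough Python model of the stack after the function prologue. It is only a sketch: the starting stack pointer and the return address are made-up values, and `build_frame` is a hypothetical helper, not anything from the book or from a real tool. The only thing that matters is the slot width.

```python
def build_frame(word, arg=4, ret_addr=0x4000c7, sp=0x10000):
    """Model the stack after: push arg; call factorial; push %bp; mov %sp, %bp.

    word is the push width in bytes (4 in 32-bit mode, 8 in 64-bit mode).
    ret_addr and sp are arbitrary illustrative values (assumptions).
    """
    mem = {}

    def push(value):
        nonlocal sp
        sp -= word          # every push (and call) moves the stack pointer by one slot
        mem[sp] = value

    push(arg)               # push $4
    push(ret_addr)          # call pushes the return address
    push(0)                 # push of the saved base pointer (value irrelevant here)
    bp = sp                 # mov %sp, %bp
    return mem, bp

mem32, bp32 = build_frame(word=4)
mem64, bp64 = build_frame(word=8)

print(mem32[bp32 + 8])        # 32-bit: offset 8 from the base pointer is the argument
print(hex(mem64[bp64 + 8]))   # 64-bit: offset 8 is now the return address
print(mem64[bp64 + 16])       # 64-bit: the argument has moved to offset 16
```

In this model the same `8(%bp)` offset that finds the argument with 4-byte slots lands on the return address with 8-byte slots, which is the effect described above.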

Here's your code, commented to explain what it actually does:

push $4                    # rsp-=8.  (rsp) = qword 4
                           # non-standard calling convention with args on the stack.
call factorial             # rsp-=8.  (rsp) = return address.  RIP=factorial
add $4, %rsp               # misalign the stack, so it's pointing to the top half of the 4 you pushed earlier.
# if this was in a function that wanted to return, you'd be screwed.

mov %rax, %rbx             # copy return value to first arg of system call
mov $1, %rax               #eax = __NR_EXIT from asm/unistd_32.h, wasting 2 bytes vs. mov $1, %eax
int $0x80                  # 32-bit ABI system call, eax=call number, ebx=first arg.  sys_exit(factorial(4))

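So the caller is fine as far as it goes (for the non-standard 64-bit calling convention you've invented, which passes all args on the stack). You might as well omit the add to %rsp entirely, since you're about to exit without touching the stack any further.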

.type factorial, @function #defines the code below as being a function

factorial:                 #function label
push %rbp                  #rsp-=8, (rsp) = rbp
mov %rsp, %rbp             # make a traditional stack frame

mov $1, %rax               #retval = 1.  (Wasting 2 bytes vs. the exactly equivalent mov $1, %eax)

mov 8(%rbp), %rcx          #load the return address into %rcx

... and calculate the factorial

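For static executables (and dynamically linked executables that aren't ASLR enabled with PIE), _start is normally at 0x4000c0. Your program will still run nearly instantaneously on a modern CPU, because 0x4000c0 iterations at 3 cycles of imul latency each is still only about 12.5 million core clock cycles. On a 4GHz CPU, that's about 3 milliseconds of CPU time.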

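If you had made a position-independent executable by linking with gcc foo.o on a recent distro, _start would have an address like 0x5555555545a0, and your function would take ~70368 seconds to run on a 4GHz CPU with 3-cycle imul latency.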
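
The arithmetic behind both estimates can be checked with a few lines of Python. This is a back-of-the-envelope sketch: `loop_seconds` is a hypothetical helper, and it assumes the loop is bottlenecked purely on one dependent imul per iteration, with the 3-cycle latency and 4GHz clock used in this answer.

```python
def loop_seconds(start_value, imul_latency_cycles=3, clock_hz=4e9):
    """Rough run time of the loop, assuming one dependent imul per iteration."""
    cycles = start_value * imul_latency_cycles
    return cycles / clock_hz

print(loop_seconds(0x4000c0))         # roughly 0.003 seconds
print(loop_seconds(0x5555555545a0))   # roughly 70368 seconds
```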

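4194496! includes many even numbers, so its binary representation has many trailing zeros. The whole %rax will be zero by the time you're done multiplying by every number from 0x4000c0 down to 1.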
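
You don't need anywhere near 4 million factors for that to happen. Here is a small Python sketch that mimics the loop with 64-bit wraparound (the masking stands in for what imul keeps in %rax; the function names are made up for illustration):

```python
MASK64 = (1 << 64) - 1   # imul keeps only the low 64 bits of the product in %rax

def factorial_loop_rax(n):
    """Mimic the asm loop for n >= 1: rax = 1; until rcx == 1: rax *= rcx; rcx -= 1."""
    rax, rcx = 1, n
    while rcx != 1:
        rax = (rax * rcx) & MASK64
        rcx -= 1
    return rax

def first_zero_factorial():
    """Smallest n for which n! is 0 modulo 2**64."""
    rax, n = 1, 1
    while rax != 0:
        n += 1
        rax = (rax * n) & MASK64
    return n

print(factorial_loop_rax(4))      # 24, what the program was meant to compute
print(first_zero_factorial())     # 66
print(factorial_loop_rax(66))     # 0
```

66! already contains 64 factors of 2, so its low 64 bits are all zero, and every further multiplication keeps them zero.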

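The exit status of a Linux process is only the low 8 bits of the integer you pass to sys_exit() (because the wstatus is only a 32-bit int and includes other stuff, like what signal ended the process. See wait4(2)). So even with small args, it doesn't take much.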
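
As a quick illustration of how little it takes, this Python sketch applies the same 8-bit truncation to a few small factorials. `exit_status` is a hypothetical helper that only models the masking; it does not actually run a process.

```python
import math

def exit_status(value):
    """Low 8 bits of the value passed to sys_exit(), as the parent would see it."""
    return value & 0xFF

for n in (4, 6, 9, 10):
    print(n, math.factorial(n), exit_status(math.factorial(n)))
```

With a correct factorial function, 4! still comes through as 24, but 6! = 720 shows up as 208, and from 10! onward the exit status is 0.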

assembly

x86-64

function

32bit-64bit