汇编代码为 return 数组中的最小整数，而不是随机 returns 最后一个或倒数第二个数字

Question

我正在尝试在 nasm 中创建一个函数，给定一个整数数组和数组的长度，returns 是最小的整数。这是基于 CodeWars 问题 "Find the smallest integer in the array"。我在 64 位 BlackArch Linux 上执行此操作。我的函数如下所示：

SECTION .text
global find_smallest_int

find_smallest_int:
  ; [rdi] is the first value in the array.
  ; We'll store the smallest value so far found
  ; in rax. The first value in the array is the
  ; smallest so far found, therefore we store it
  ; in rax.
  mov rax, [rdi]

  ; rsi is the second argument to int find_smallest_int(int *, int)
  ; which represents the length of the array.
  ; Store it in rbx to be explicit.
  mov rbx, rsi

  loop:
    ; Check to see if we've reached the end of the array.
    ; If we have, we jump to the end of the function and 
    ; return the smallest value (which should be whatever
    ; is in rax at the moment.
    cmp rbx, 0
    je end

    ; Subtract one from our counter. This started as 
    ; the number of elements in the array - when it
    ; gets to 0, we'll have looped through the entire thing.
    sub rbx, 1

    ; If rax is smaller than [rdi], we'll jump down to the
    ; rest of the loop. Only if rax is bigger than [rdi] will
    ; we reassign rax to be the new smallest-yet vaue.
    cmp rax, [rdi]
    jl postassign

    assign:
      ; If we execute this code, it means rax was not less
      ; than [rdi]. Therefore, we can safely reassign
      ; rax to [rdi].
      mov rax, [rdi]


    postassign:
    ; Set rdi to point to the next value in the array
    add rdi, 4

    ; if we get here, then we aren't finishing looping yet
    ; because rbx (the counter) hasn't eached 0 yet.
    jmp loop

  end:
    ret

然后我通过以下 C 代码调用此函数：

extern int find_smallest_int(int *array, int size);

int main(void)
{
    int nums[4] = {800, 300, 100, 11};
    int ret = find_smallest_int(nums, 4);

    return ret;
}

最后，我使用以下命令编译并运行整个过程：

#!/bin/bash

# Make an object file from my assembly code with nasm
nasm -f elf64 -o sum.o call_sum.s

# make an object file from my C code
gcc -O0 -m64 -c -o call_sum.o call_sum.c -g

# compile my two object files into an executable
gcc -O0 -m64 -o run sum.o call_sum.o -g

# Run the executable and get the output in the
# form of the exit code.
./run
echo $?

我没有得到最小的整数，而是得到了 100 或 11（分别是我传递给汇编函数的整数数组的倒数第二个和最后一个成员）。我得到的结果似乎是完全随机的。我可以运行程序几次并获得 11，然后运行再执行几次然后开始获得 100。

如果有人能帮助我理解这种奇怪的行为，我将不胜感激。谢谢！

更新： 我实现了 Jester 评论中的更改（使用 32 位寄存器来保存整数）并且它有效，但我不太明白为什么。

Answer 1

本回答的开头是基于杰斯特的评论。它只是对此进行了扩展，并更详细地解释了这些变化。我也做了一些额外的更改，其中两个还解决了您的来源中的错误。

首先这部分：

An int is 4 bytes but you use 8 all over your code. Use eax instead of rax.

您示例中的这些指令分别从数组中访问 8 个字节：

    mov rax, [rdi]

    cmp rax, [rdi]

    mov rax, [rdi]

这是因为 rax 是一个 64 位寄存器，所以执行完整的 rax 加载或与内存操作数比较会访问 8 字节的内存。在 NASM 语法中，您可以明确指定内存操作数的大小，例如通过编写以下内容：

    mov rax, qword [rdi]

如果您这样做了，您可能会更早地发现您正在以 8 字节为单位（四字）访问内存。在使用 rax 作为目标寄存器时尝试显式访问双字将失败。以下行在汇编时导致错误 "mismatch in operand sizes"：

    mov rax, dword [rdi]

下面两行没问题，都从双字内存操作数加载到 rax。第一个使用零扩展（在写入 32 位寄存器部分时隐含在 AMD64 指令集中），第二个使用（显式）符号扩展：

    mov eax, dword [rdi]
    movsx rax, dword [rdi]

（从双字内存操作数到 rax 的 movzx 指令不存在，因为它与 mov 到 eax 是多余的。）

稍后在您的示例中，您使用 rdi 作为 4 字节宽类型的地址，通过向其添加 4 来推进数组条目指针：

    add rdi, 4

这对于 int 类型是正确的，但与您使用四字作为内存操作数的大小冲突。

杰斯特的评论又给出了两个问题：

Also do not use rbx because that is a callee-saved register and it's pointless to copy from rsi anyway. As before, you'd better use esi because that's another int.

rsi 问题是 64 位 rsi 的高 32 位可能包含非零值，具体取决于 ABI。如果您不确定是否允许非零值，您应该假设它是并且您应该仅在 esi 中使用 32 位值。

rbx（或ebx）问题是 rbx 需要在 Linux 使用的 AMD64 psABI 的函数调用中保留，参见 Where is the x86-64 System V ABI documented? 获取该 ABI 的文档。在您的简单测试程序中，更改 rbx 可能不会导致任何失败，但在非平凡的上下文中很容易导致失败。

我发现的下一个问题是 eax 的初始化。你是这样写的：

  ; [rdi] is the first value in the array.
  ; We'll store the smallest value so far found
  ; in rax. The first value in the array is the
  ; smallest so far found, therefore we store it
  ; in rax.
  mov rax, [rdi]

但是，正如您的循环流控制逻辑所证明的那样，您允许调用者为大小参数传递一个零。在那种情况下，您根本不应该访问该数组，因为 "the first value in the array" 甚至可能不存在或根本没有被初始化为任何东西。从逻辑上讲，您应该使用 INT_MAX 而不是第一个数组条目来初始化最小的值。

还有一个问题：您正在使用 rsi 或 esi 作为无符号数，倒数到零。但是，在您的函数声明中，您将 size 参数的类型指定为 int，它是有符号的。我通过将声明更改为 unsigned int.

来解决这个问题

我对您的程序做了一些可选的更改。我为函数的 "sub" 标签使用了 NASM 局部标签，这很有用，因为您可以在相同的其他函数中重复使用例如 .loop 或 .end源文件，如果有任何要添加的。

我还更正了其中一条评论，指出我们在 eax 小于数组条目时跳转，而不在 eax 大于 或等于 时跳转到数组入口。您可以将此条件跳转更改为 jle 而不是它也会跳转进行相等比较。可以说，为了清晰度或性能，可能更喜欢其中一个，但我没有太多答案。

我还使用了 dec esi 而不是 sub esi, 1，这并没有好多少，但更适合我。在 32 位模式下，dec esi 是一条单字节指令。但在 64 位模式下并非如此； dec esi 是 2 个字节，而 sub esi, 1 是 3 个字节。

另外，我把初始检查esi为零的方法从cmp改为test，稍微好一点，参考

最后，我将实际的循环条件更改为在循环体的末尾，这意味着循环使用了一个跳转指令。无条件跳转到循环体的开始被替换为检查 while 条件的条件跳转。函数开头的 test 仍然需要处理零长度数组的可能性。此外，我没有再次使用 cmp 或 test 来检查 esi 中的零，而是使用已经由 dec 指令设置的零标志来检查是否 esi 被递减为零。

您可以使用 ecx 或 rcx 作为循环计数器，但这在现代 CPU 上可能不会有多大好处。如果您使用 jrcxz、jecxz 或 loop 指令，将允许稍微更紧凑的代码。但不推荐使用它们，因为性能较慢。

不是与 dword [rdi] 进行比较，如果小于或等于 eax，则从同一内存双字加载，您可以先将数组条目的值加载到寄存器中，然后然后将其用作 cmp 和 mov 的来源。这可能会更快，但它确实会导致更多的操作码字节。

我用来将目标索引（rdi 在 64 位模式下）推进 4 的一个技巧是使用单个 scasd 指令，它只修改标志和索引寄存器.这是一条单字节指令，而不是 4 字节 add rdi, 4，但很可能比运行.

慢

我上传了一个 repo，其中包含您的原始来源和我对 https://hg.ulukai.org/ecm/testsmal/file/2b8637ca416a/ 的改进（在 CC BY-SA 使用条件下 Whosebug 内容。）我也修改了 C 部分和测试脚本，但是这些微不足道，而且与您的问题几乎无关。这是汇编源代码：


INT_MAX equ 7FFF_FFFFh

SECTION .text
global find_smallest_int

find_smallest_int:

                ; If the array is empty (size = 0) then we want to return
                ; without reading from the array at all. The value to return
                ; then logically should be the highest possible number for a
                ; 32-bit signed integer. This is called INT_MAX in the C
                ; header limits.h and for 32-bit int is equal to 7FFF_FFFFh.
                ;
                ; If the array is not empty, the first iteration will then
                ; always leave our result register equal to the value in
                ; the first array entry. This is either equal to INT_MAX
                ; again, or less than that.
        mov eax, INT_MAX

                ; esi is the second argument to our function, which is
                ; declared as int find_smallest_int(int *, unsigned int).
                ; It represents the length of the array. We use this
                ; as a counter. rsi (and its part esi) need not be preserved
                ; across function calls for the AMD64 psABI that is used by
                ; Linux, see 

                ; Check for an initial zero value in esi. If this is found,
                ; skip the loop without any iteration (while x do y) and
                ; return eax as initialised to INT_MAX at the start.
        test esi, esi
        jz .end

.loop:
                ; If eax is smaller than dword [rdi], we'll jump down to the
                ; rest of the loop. Only if eax is bigger than or equal to
                ; the dword [rdi] will we reassign eax to that, to hold the
                ; new smallest-yet value.
        cmp eax, dword [rdi]
        jl .postassign

.assign:
                ; If we execute this code, it means eax was not less
                ; than dword [rdi]. Therefore, we can safely reassign
                ; eax to dword [rdi].
        mov eax, dword [rdi]


.postassign:
                ; Set rdi to point to the next value in the array.
        add rdi, 4


                ; Subtract one from our counter. This started as 
                ; the number of elements in the array - when it
                ; gets to 0, we'll have looped through the entire thing.
        dec esi

                ; Check to see if we've reached the end of the array.
                ; To do this, we use the Zero Flag as set by the prior
                ; dec instruction. If esi has reached zero yet (ZR) then
                ; we do not continue looping. In that case, we return the
                ; smallest value found yet (which is in eax at the moment).
                ;
                ; Else, we jump to the start of the loop to begin the next
                ; iteration.
        jnz .loop

.end:
        retn

这是循环体内条件跳转的替代方法。 cmov 指令似乎被所有 AMD64 CPU 支持。这是一个有条件的移动：如果满足条件，它运行就像 mov ——否则它没有任何效果，有一个例外：它可能读取源操作数（因此可能会从内存访问）。您可以翻转 mov 周围用于分支的条件以获得 cmov 指令的条件。（我遇到了 this thread involving Linus Torvalds，这表明条件跳转解决方案实际上可能比 cmov 更好或不差。随心所欲。）

.loop:
                ; If eax is greater than or equal to dword [rdi], we'll
                ; reassign eax to that dword, the new smallest-yet value.
        cmp eax, dword [rdi]
        cmovge eax, dword [rdi]

                ; Set rdi to point to the next value in the array.
        add rdi, 4

汇编代码为 return 数组中的最小整数，而不是随机 returns 最后一个或倒数第二个数字

Assembly code to return smallest integer in array instead randomly returns either last or second to last number

x86

assembly

x86-64

nasm