没有 32 位寄存器的 32 位/16 位有符号整数除法？

Question

我正在尝试将 32 位带符号整数除以 16 位带符号整数以获得带符号的 32 位商和 16 位余数。

我的目标是没有 fpu 的 286。

我过去已经写过代码来进行 32 位无符号除法：

DIV32 PROC

;DIVIDES A 32-BIT VALUE BY A 16-BIT VALUE.

;ALTERS AX
;ALTERS BX
;ALTERS DX

;EXPECTS THE 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE 16-BIT DIVISOR IN BX

;RETURNS THE 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX

    push di
    push si


    mov di, ax ;di -> copy of LSW of given dividend
    mov ax, dx ;ax -> MSW of given dividend
    xor dx, dx ;dx:ax -> 0:MSW  
    div bx     ;ax:dx -> ax=MSW of final quotient, dx=remainder

    mov si, ax ;si -> MSW of final quotient
    mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
    div bx     ;ax:dx -> ax=LSW of final quotient, dx=final remainder  
    mov bx, dx ;bx -> final remainder
    mov dx, si ;dx:ax -> final quotient


    pop si
    pop di
    ret

DIV32 ENDP

到目前为止，我已经尝试做显而易见的事情，只是修改现有代码，将 XOR DX, DX 替换为 CWD，将 DIV 替换为 IDIV :

IDIV32 PROC

;DIVIDES A SIGNED 32-BIT VALUE BY A SIGNED 16-BIT VALUE.

;ALTERS AX
;ALTERS BX
;ALTERS DX

;EXPECTS THE SIGNED 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE SIGNED 16-BIT DIVISOR IN BX

;RETURNS THE SIGNED 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX

    push di
    push si


    mov di, ax ;di -> copy of LSW of given dividend
    mov ax, dx ;ax -> MSW of given dividend
    cwd        ;dx:ax -> 0:MSW, or ffff:MSW  
    idiv bx    ;ax:dx -> ax=MSW of final quotient, dx=remainder

    mov si, ax ;si -> MSW of final quotient
    mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
    idiv bx    ;ax:dx -> ax=LSW of final quotient, dx=final remainder  
    mov bx, dx ;bx -> final remainder
    mov dx, si ;dx:ax -> final quotient


    pop si
    pop di
    ret

IDIV32 ENDP

这适用于某些计算，例如 -654,328/2=-327164 (0xfff60408/2=fffb0204)。但它不适用于某些输入，-131,076/2 returns 是 -2 余数 0 的错误结果。1 或 -1 的除数似乎会导致除法错误，而与被除数无关。

我测试了许多不同的正负股息和除数，试图找到某种正确和错误结果的模式，我注意到它不能正确 return 结果-65538。

我有一种预感，我应该根据输入有条件地使用 CWD，但似乎 XOR DX, DX return 的错误结果更频繁。当除数和被除数均为正且商小于 0x7fffffff 时，两者都有效。

Answer 1

你的算法不能简单地更改为签名。

我们以计算(+1)/(-1)为例：

(+1)/(-1) = (-1)，余数0

在算法的第一步中，您将高位除以除数：

(+1)的高位为0，所以你计算的是：

0/(-1) = 0，余数 0

但是整个32位除法正确的高位是0FFFFh，不是0。而且你需要的第二除法的提示也是0FFFFh，不是0。

Oh, so the second IDIV should be a DIV. Ok, i'll test it when I wake up tomorrow. And i'll add an answer if I get it working.

第一个除法已经不是你想要的结果了。所以改二师也无济于事...

A divisor of 1, or -1 causes a division error seemingly regardless of dividend.

只有当股息的第 15 位被设置并且：

...除以 1 或
...除以 -1，至少设置了被除数的低 15 位之一

在这些情况下，您要划分：

... 000008000h...00000FFFFh 范围内的数字乘以 1
结果将在 +08000h...+0FFFFh
... 000008001h...00000FFFFh 范围内的数字 -1
结果将在 -0FFFFh...-08001h

...但是，结果是一个带符号的 16 位值，因此必须在 -8000h...+7FFFh 范围内。

我刚刚在虚拟机中尝试了 12345678h/(+1) 和 12345678h/(-1) 运行 DOS:

12345678h 的第 15 位未设置；两次我都没有得到除法错误。（但是除以-1结果是错误的！）

Answer 2

我不知道有什么算法可以将一个大的负数拆分成多个部分并为 IDIV 做准备。我会计算被除数和除数的绝对值，使用函数 DIV32 最后根据存储的符号处理结果：

IDIV32 PROC      ; DX:AX / BX = DX/AX rem BX
    ; 99 / 5   =  19 rem 4
    ; 99 / -5  = -19 rem 4
    ; -99 / 5  = -19 rem -4
    ; -99 / -5 =  19 rem -4

    mov ch, dh          ; Only the sign bit counts!
    shr ch, 7           ; CH=1 means negative dividend
    mov cl, bh          ; Only the sign bit counts!
    shr cl, 7           ; CL=1 means negative divisor

    cmp ch, 1           ; DX:AX negative?
    jne J1              ; No: Skip the next two lines
    not dx              ; Yes: Negate DX:AX
    neg ax              ; CY=0 -> AX was NULL
    cmc
    adc dx, 0           ; Adjust DX, if AX was NULL
    J1:

    cmp cl, 1           ; BX negative?
    jne J2              ; No: Skip the next line
    neg bx              ; Yes: Negate BX
    J2:

    push cx             ; Preserve CX
    call DIV32
    pop cx              ; Restore CX

    cmp ch, cl          ; Had dividend and divisor the same sign?
    je J3               ; Yes: Skip the next two lines
    not dx
    neg ax              ; CY=0 -> AX was NULL
    cmc
    adc dx, 0           ; Adjust DX, if AX was NULL
    J3:

    cmp ch, 1           ; Was divisor negative?
    jnz J4              ; No: Skip the next line
    neg bx              ; Negate remainder
    J4:

    ret
IDIV32 ENDP

Answer 3

使用 2x idiv 有一个根本问题：我们需要第 2 个除法来产生商的低半部分，它是无符号的，可以是从 0 开始的任何值到 0xffff。

只有最高位的多字整数包含符号位，其下的所有位都具有正位值。 idiv的商范围是-2^15 .. 2^15-1，不是0 .. 65535。是的，idiv 可以产生所有需要的值，但不是来自我们可以从第一次除法结果的简单修正中获得的输入。例如，0:ffff / 1 将导致带有 idiv 的 #DE 异常，因为商不适合 signed 16 位整数。

所以第二个除法指令必须是div,使用除数的绝对值,和适当的高一半。（div 要求它的两个输入都是无符号的，因此第一个 idiv 的有符号余数也是一个问题。）

可能仍然可以对第一个除法使用 idiv，但只能对 div 之前的结果进行修正，此外还必须采用除数的绝对值和first-remainder 馈送无符号 div。这是一个有趣的可能性，但在实践中，围绕无符号除法保存和重新应用符号会更便宜。

正如@Martin 指出的那样，+1 / -1 与天真的 idiv 的第一次除法给出了错误的高半商（0 / -1 = 0 而不是 -1），以及错误的输入第二部分（0 % -1 = 0，而不是 -1）。 TODO：弄清楚实际需要什么修正。也许只是一个有条件的 +-1，但我们知道余数的大小不能大于除数，因为 high_half < divisor 是 div 不出错的必要条件和充分条件。

您的 -131,076/2 = -2（可能是巧合）仅在其结果的一半中相差 1：
它应该是 0xffffeffe = -2:-2 而不是 -1:-2.

@rkhb 函数的优化版本，内联 DIV32。

我们记录输入的符号，然后对绝对值进行无符号除法，稍后恢复符号。（如果不需要余数，可以化简；商号只依赖于xor dividend,divisor）

或者如果分红足够小，我们可以用一个idiv。但是，我们必须避免 -2^15 / -1 溢出情况，因此快速检查 DX 是 AX 的符号扩展不仅会遗漏一些安全情况（具有较大的除数），还会尝试不安全情况。如果小数字很常见（就像大多数计算机程序中的情况一样），这种基于 cwd 的快速测试仍然是个好主意，在绝对值计算之后再进行一次测试。

分支在 286 上很便宜，所以我主要保留分支而不是使用无分支 abs()。（例如，对于单个寄存器：对于 cdq（或 sar reg,15）/xor/sub，like compilers make, based on the 2's complement identity that -x = ~x + 1). And mov/neg/cmovl of course isn't available until P6-family. If you need compat with 286 but mostly care about performance on modern CPUs, you might make different choices. But it turns out that a 32-bit branchless ABS is smaller code size than branching. However, it's probably slower than branchy for positive inputs, where some instructions would have been skipped. 有一个有趣的想法，即为被除数和除数创建 0/-1 整数，然后用于稍后重新应用符号，你可以将它们异或在一起，并使用相同的 XOR/SUB 2's complement bithack 来对结果进行符号翻转。

样式：局部标签（在函数内）以 @@ 为前缀。我认为这对于 TASM 来说是正常的，例如 NASM .label 本地标签。

   ; signed 32/16 => 32-bit division, using 32/16 => 16-bit division as the building block
   ; clobbers: CX, SI
IDIV32 PROC      ; DX:AX / BX = DX/AX rem BX
;global IDIV32_16       ; for testing with NASM under Linux
;IDIV32_16:
    ; 99 / 5   =  19 rem 4
    ; 99 / -5  = -19 rem 4
    ; -99 / 5  = -19 rem -4
    ; -99 / -5 =  19 rem -4

    mov   cx, dx          ; save high half before destroying it

;;; Check for simple case
    cwd                   ; sign-extend AX into DX:AX
    cmp   cx, dx          ; was it already correctly sign-extended?
    jne   @@dividend_32bit

    ; BUG: bx=-1  AX=0x8000 overflows with #DE
    ; also, this check rejects larger dividends with larger divisors
    idiv  bx
    mov   bx, dx
    cwd
    ret

;;; Full slow case: divide CX:AX by BX
   @@dividend_32bit:
    mov   si, ax                ; save low half
    mov   ax, cx                ; high half to AX for first div

                                ; CH holds dividend sign
    mov   cl, bh                ; CL holds divisor sign

 ;;; absolute value of inputs
    ; dividend in  AX:SI
    cwd                         ; 0 or -1
    xor   si, dx                ; flip all the bits (or not)
    xor   ax, dx
    sub   si, dx                ; 2's complement identity: -x = ~x - (-1)
    sbb   ax, dx                ; AX:SI = abs(dividend)

    test  bx, bx          ; abs(divisor)
    jnl  @@abs_divisor
    neg   bx
   @@abs_divisor:

 ;;; Unsigned division of absolute values
    xor   dx, dx
    div   bx             ; high half / divisor
    ; dx = remainder = high half for next division
    xchg  ax, si
    div   bx
 ;;; absolute result: rem=DX  quot=SI:AX
    mov   bx, dx
    mov   dx, si


 ;;; Then apply signs to the unsigned results
    test  cx,cx          ; remainder gets sign of dividend
    jns  @@remainder_nonnegative
    neg   bx
  @@remainder_nonnegative:

    xor   cl, ch         ; quotient is negative if opposite signs
    jns  @@quotient_nonnegative
    neg   dx
    neg   ax             ; subtract DX:AX from 0
    sbb   dx, 0          ; with carry
  @@quotient_nonnegative:

    ret
IDIV32 ENDP

优化：

更简单的符号保存和符号测试，使用 x86 的内置符号标志，从结果的 MSB 设置，如果 SF==1 则 js 跳转。避免将符号位下移到 8 位寄存器的底部。可以使用 xor/jns 测试相同符号，因为相同符号将“取消”并保留 SF=0，无论它是 both-0 还是 both-1。（一般来说，XOR 可用于比较是否相等，但通常只在您关心一位而不关心其他位的按位情况下才有用）。
避免单独写入 CH，这有利于现代 Intel CPU 在这种情况下进行部分寄存器重命名。此函数永远不会将 CH 与 ECX 的其余部分分开重命名。（在像 286 这样的旧 CPU 上，mov cx,dx 与 mov ch,dh 没有任何缺点）。我们还避免读取高 8 部分寄存器（例如 test cx,cx 而不是 test ch,ch），因为这在最近的 Intel Sandybridge 系列 CPU 上具有更高的延迟。 ()。在 P6 系列上，写入低 8 位部分寄存器会将它们与完整寄存器分开重命名，因此最好在写入任何内容后只读取 8 位部分寄存器。

当然，在现代 CPU 上，像 cx 这样的 16 位寄存器是 部分寄存器，即使在 16 位模式下也是如此（因为 32 位寄存器在那里可用), 所以即使 mov cx,dx 对 ECX 的旧值也有错误的依赖。

386+

显然在 386+ 上，32 位寄存器/操作数大小可用，即使在 16 位模式下你也可以使用它：

;; i386 version
    ;; inputs: DX:AX / BX
    shl   edx, 16
    mov   dx, ax         ; pack DX:AX into EDX
    mov   eax, edx

    movsx ebx, bx        ; sign-extend the inputs to 32 bit EBX
    cdq                  ; and 64-bit EDX:EAX
    idiv  ebx
    ; results: quotient in EAX, remainder in EDX

    mov   ebx, edx       ; remainder -> bx
    mov   edx, eax
    sar   edx, 16        ; extract high half of quotient to DX
    ;; result: quotient= DX:AX, remainder = BX

这可以从 BX=0 开始#DE，或者从 DX:AX=-2^31 和 BX=-1 (LONG_MIN/-1)

溢出

测试工具：

NASM 从 32 位模式调用的包装器

%if __BITS__ = 32
global IDIV32
IDIV32:
    push   esi
    push   ebx
    push   edi      ; not actually clobbered in this version
    movzx  eax, word [esp+4  + 12]
    movzx  edx, word [esp+6  + 12]
    movzx  ebx, word [esp+8  + 12]
    call   IDIV32_16

    shl    edx, 16
    mov    dx, ax
    mov    eax, edx

    movsx  edx, bx       ; pack outputs into EDX:EAX "int64_t"

    pop    edi
    pop    ebx
    pop    esi
    ret
%endif

C 程序，编译为 32 位和 link 使用 asm:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

// returns quotient in the low half, remainder in the high half (sign extended)
int64_t IDIV32(int32_t dxax, int16_t bx);

static int test(int a, short b) {
//  printf("\ntest %d / %d\n", a, b);
    int64_t result = IDIV32(a,b);

    int testrem = result>>32;
    int testquot = result;

    if (b==0 || (a==INT_MIN && b==-1)) {
        printf("successfully called with inputs which overflow in C\n"
               "%d/%d gave us %d rem %d\n",
               a,b, testquot, testrem);
        return 1;
    }
    int goodquot = a/b, goodrem = a%b;

    if (goodquot != testquot || goodrem != testrem) {
        printf("%d/%d = %d rem %d\t but we got %d rem %d\n",
               a,b, goodquot, goodrem, testquot, testrem);
        printf("%08x/%04hx = %08x rem %04hx\t but we got %08x rem %04hx\n"
               "%s quotient, %s remainder\n",
               a,b, goodquot, goodrem, testquot, testrem,
               goodquot == testquot ? "good" : "bad",
               goodrem == testrem ? "good" : "bad");
        return 0;
    }
    return 1;
}

int main(int argc, char*argv[])
{
    int a=1234, b=1;
    if(argc>=2) a = strtoll(argv[1], NULL, 0);  // 0x80000000 becomes INT_MIN instead of saturating to INT_MAX in 32-bit conversion
    if(argc>=3) b = strtoll(argv[2], NULL, 0);
    test(a, b);
    test(a, -b);
    test(-a, b);
    test(-a, -b);

    if(argc>=4) {
        int step=strtoll(argv[3], NULL, 0);
        while ( (a+=step) >= 0x7ffe) {  // don't loop through the single-idiv fast path
            // printf("testing %d / %d\n", a,b);
            test(a, b);
            test(-a, -b);
            test(a, -b);
            test(-a, b);
        }
        return 0;
    }
}

（这在 int 和 int32_t 之间很草率，因为我只关心它运行在 x86 Linux 上是相同的类型。）

编译

 nasm -felf32 div32-16.asm &&
 gcc -g -m32 -Wall -O3 -march=native -fno-pie -no-pie div32-test.c div32-16.o

运行和 ./a.out 131076 -2 -1 测试从那个到 0x7ffe（step=-1）的所有红利，除数=-2。（对于 -a / -b、a / -b 等的所有组合）

我没有为商和除数做嵌套循环；你可以用 shell 做到这一点。我也没有做任何聪明的事情来测试一些接近最大值和一些接近范围底部的红利。

Answer 4

我重写了我的 idiv32 程序，以便它将负被除数或除数取反为 positive/unsigned 形式，执行无符号除法，然后如果被除数 XOR 除数为真，则对商取反。

编辑：使用 js 和 jns 而不是针对 80h 的位掩码进行测试。不要打扰返回余数。余数应该与股息的符号相同，但由于我真的不需要余数，所以我不会为了正确处理它而使程序变得更复杂。

idiv32 proc

;Divides a signed 32-bit value by a signed 16-bit value.

;alters ax
;alters bx
;alters dx

;expects the signed 32-bit dividend in dx:ax
;expects the signed 16-bit divisor in bx

;returns the signed 32-bit quotient in dx:ax

push cx
push di
push si

    mov ch, dh      ;ch <- sign of dividend
    xor ch, bh      ;ch <- resulting sign of dividend/divisor

    test dh, dh     ;Is sign bit of dividend set?  
    jns chk_divisor ;If not, check the divisors sign.
    xor di, di      ;If so, negate dividend.  
    xor si, si      ;clear di and si   
    sub di, ax      ;subtract low word from 0, cf set if underflow occurs
    sbb si, dx      ;subtract hi word + cf from 0, di:si <- negated dividend
    mov ax, di      
    mov dx, si      ;copy the now negated dividend into dx:ax

chk_divisor:
    xor di, di
    sbb di, bx      ;di <- negated divisor by default
    test bh, bh     ;Is sign bit of divisor set?
    jns uint_div    ;If not, bx is already unsigned. Begin unsigned division.
    mov bx, di      ;If so, copy negated divisor into bx.

uint_div:
    mov di, ax      ;di <- copy of LSW of given dividend
    mov ax, dx      ;ax <- MSW of given dividend
    xor dx, dx      ;dx:ax <- 0:MSW  
    div bx          ;ax:dx <- ax=MSW of final quotient, dx=remainder

    mov si, ax      ;si <- MSW of final quotient
    mov ax, di      ;dx:ax <- dx=previous remainder, ax=LSW of given dividend
    div bx          ;ax:dx <- ax=LSW of final quotient, dx=final remainder
    mov dx, si      ;dx:ax <- final quotient

    test ch, ch     ;Should the quotient be negative?
    js neg_quot     ;If so, negate the quotient.
pop si              ;If not, return. 
pop di
pop cx
    ret

neg_quot:
    xor di, di      
    xor si, si
    sub di, ax
    sbb si, dx
    mov ax, di
    mov dx, si      ;quotient negated
pop si
pop di
pop cx
    ret

idiv32 endp

没有 32 位寄存器的 32 位/16 位有符号整数除法？

32-bit / 16-bit signed integer division without 32-bit registers?

assembly

integer-division

x86-16

386+

测试工具：