Testing if a value is within one of two ranges

Question

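I'm writing a MIPS program that should only work with uppercase or lowercase characters as input. My program works using the ASCII values of the characters.

I need to check whether each character in the input is within the ASCII range 65-90 (A-Z) or 97-122 (a-z). If it is in neither range, skip that character and repeat with the next one. How can this be done?

Edit

Here's a solution I just came up with, but I'm sure there's a simpler way?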

function:    #increment $t0 to next char of input
             blt $t0, 65, function
             bgt $t0, 122, function
             blt $t0, 91, continue
             bgt $t0, 96, continue
             j   function
continue:    ...
             j   function

Answer 1

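Whatever you do, you'll need four branches.

The first thing I'd do is add sidebar comments to each instruction.

Good comments are part of any language, but they are crucial for asm. Nearly every line should have one. Comments address your algorithm's logic (i.e. the "what/why"). The instructions themselves are the "how".

Notice that you're using decimal numbers for the ASCII characters. Without the comments, it was difficult to follow the logic and determine whether the instructions were correct.

I'd tweak your grouping a bit to keep the A-Z tests together and the a-z tests together, rather than intermixing them. This might be slightly slower, but the code is more intuitive.

I also did a third version that is very simple to understand. It uses character constants instead of hardwired decimal values.

Here's your original with annotations: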

function:
    # increment $t0 to next char of input

    blt     $t0,65,function         # less than 'A'? if yes, loop
    bgt     $t0,122,function        # greater than 'z'? if yes, loop

    blt     $t0,91,continue         # less than or equal to 'Z'? if yes, doit
    bgt     $t0,96,continue         # greater than or equal to 'a'? if yes, doit

    j       function

continue:
    # ...
    j       function

Here's the reordered version:

function:
    # increment $t0 to next char of input

    blt     $t0,65,function         # less than 'A'? if yes, loop
    blt     $t0,91,continue         # less than or equal to 'Z'? if yes, doit

    bgt     $t0,122,function        # greater than 'z'? if yes, loop
    bgt     $t0,96,continue         # greater than or equal to 'a'? if yes, doit

    j       function

continue:
    # ...
    j       function

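Here's the most straightforward version. It is the easiest to understand and, personally, it's the one I'd use. It also eliminates the extra/extraneous j instruction.

The previous versions had to "know" that A-Z is lower in value than a-z. They could "get away" with this because the ASCII values were "hardwired" in decimal.

In C, that wouldn't [necessarily] be a good idea (i.e. you'd use the character constants). MIPS assemblers allow character constants, so the following is actually valid: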

function:
    # increment $t0 to next char of input

    blt     $t0,'A',trylower        # less than 'A'? if yes, try lowercase
    ble     $t0,'Z',continue        # less than or equal to 'Z'? if yes, doit

trylower:
    blt     $t0,'a',function        # less than 'a'? if yes, loop
    bgt     $t0,'z',function        # greater than 'z'? if yes, loop

continue:
    # ...
    j       function
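
For comparison, the same two-range test with character constants can be sketched in C. This is only an illustration and not part of the original program: the names is_ascii_letter and count_letters are hypothetical, and the code assumes an ASCII character set in which each range is contiguous.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if c is in 'A'..'Z' or 'a'..'z'.
   Assumes ASCII, where each of the two ranges is contiguous. */
int is_ascii_letter(int c)
{
    if (c >= 'A' && c <= 'Z')   /* uppercase range */
        return 1;
    if (c >= 'a' && c <= 'z')   /* lowercase range */
        return 1;
    return 0;
}

/* Hypothetical driver: walk a NUL-terminated string, skip every
   character that is not a letter, and count the ones that would
   reach the "continue:" label in the asm version. */
size_t count_letters(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (!is_ascii_letter((unsigned char)*s))
            continue;           /* not a letter: go to the next char */
        n++;                    /* stands in for the real processing */
    }
    return n;
}
```

As in the asm, the order of the two range tests does not matter here, because each test names both of its bounds explicitly.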

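There's an old maxim: make it right before you make it faster (from "The Elements of Programming Style" by Brian Kernighan and P.J. Plauger).

Here's an extra version that builds a lookup table. It takes more work to prebuild the table, but the actual loop is faster.

In the various versions, blt and bgt are pseudo-ops that generate slti, bne and addi, slti, bne respectively. So we're really talking about 10 instructions, not just 4.

So the table build may be worth it to get the simpler/faster loop.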

    .data
isalpha:    .space  256

    .text
main:
    la      $s0,isalpha             # get address of lookup table

    li      $t0,-1                  # byte value
    li      $t2,1                   # true value

    # build lookup table
build_loop:
    # increment $t0 to next char of input
    addi    $t0,$t0,1               # advance to next char
    beq     $t0,256,build_done      # over edge? if so, table done

    blt     $t0,'A',build_lower     # less than 'A'? if yes, try lowercase
    ble     $t0,'Z',build_set       # less than or equal to 'Z'? if yes, doit

build_lower:
    blt     $t0,'a',build_loop      # less than 'a'? if yes, loop
    bgt     $t0,'z',build_loop      # greater than 'z'? if yes, loop

build_set:
    addu    $t1,$s0,$t0             # point to correct array address
    sb      $t2,0($t1)              # mark as a-z, A-Z
    j       build_loop              # try next char

build_done:

function:
    # increment $t0 to next char of input

    addu    $t1,$s0,$t0             # index into lookup table
    lb      $t1,0($t1)              # get lookup table value
    beqz    $t1,function            # is char one we want? if no, loop

continue:
    # ...
    j       function
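
The table-building idea can also be sketched in C, which may make the two phases (build once, then index per character) easier to see. The names isalpha_table, build_isalpha_table and is_letter_lookup are hypothetical, and ASCII is assumed.

```c
/* 256-entry table indexed by byte value; 1 marks A-Z and a-z.
   File-scope storage starts out zeroed, like .space in the asm. */
unsigned char isalpha_table[256];

/* Build phase: run once, using the same four comparisons as before. */
void build_isalpha_table(void)
{
    for (int c = 0; c < 256; c++) {
        int upper = (c >= 'A' && c <= 'Z');
        int lower = (c >= 'a' && c <= 'z');
        isalpha_table[c] = (unsigned char)(upper || lower);
    }
}

/* Lookup phase: one indexed load per character, no range branches.
   Taking an unsigned char keeps the index within 0..255. */
int is_letter_lookup(unsigned char c)
{
    return isalpha_table[c];
}
```

The unsigned char parameter matters for the same reason the asm index should be a zero-extended byte: a negative index would read outside the table.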

Here is a version with a predefined lookup table:

    .data
isalpha:
    .byte   0:65
    .byte   1:26
    .byte   0:6
    .byte   1:26
    .byte   0:133

    .text
    la      $s0,isalpha             # get lookup table address

function:
    # increment $t0 to next char of input

    addu    $t1,$s0,$t0             # index into lookup table
    lb      $t1,0($t1)              # get lookup table value
    beqz    $t1,function            # is char one we want? if no, loop

continue:
    # ...
    j       function

Answer 2

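Since the ASCII ranges happen to be lined up nicely, you can ori to force upper-case characters to lower-case, and then use sub / sltu as a range check. What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa? explains how/why this works. (Or use andi to clear the lower-case bit and force any alphabetic character to upper-case.) Or do just the addiu / sltiu part to check only isupper or islower instead of isalpha.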

    ori     $t1, $t0, 0x20       # set lower-case bit (optional)

    addiu   $t1, $t1, -97        # -'a'  index within alphabet
    sltiu   $t1, $t1, 26         # idx <= 'z'-'a'  create a boolean

    bnez    $t1, alphabetic      # branch on it, original char still in $t0

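Sadly, MARS's assembler doesn't allow -'a' as a numeric literal, so you have to write the value out manually. Better assemblers, like clang or the GNU assembler, really would let you write addiu $t1, $t1, -'a'.

(If your character starts out in memory, lbu $t0, 0($a0) or similar would be a good start. An lb sign-extending load would also work: the check correctly rejects bytes with their high bit set whether they are zero-extended or sign-extended. The range we want to "accept" only includes signed-positive byte values.)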
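
The claim about lb versus lbu can be checked with a small C model of the three-instruction sequence. This is a sketch under stated assumptions: the function names are hypothetical, uint32_t stands in for a 32-bit MIPS register, and the sign extension is written out by hand to mirror what lb does.

```c
#include <stdint.h>

/* Same steps as ori / addiu / sltiu, applied to a 32-bit value. */
int is_alpha_word(uint32_t w)
{
    uint32_t idx = (w | 0x20u) - (uint32_t)'a'; /* wraps around below 'a' */
    return idx < 26u;                           /* unsigned compare */
}

/* Models lbu: the byte is zero-extended to 32 bits. */
int is_alpha_zero_ext(uint8_t byte)
{
    return is_alpha_word((uint32_t)byte);
}

/* Models lb: the byte is sign-extended to 32 bits. */
int is_alpha_sign_ext(uint8_t byte)
{
    uint32_t w = byte;
    if (byte & 0x80u)
        w |= 0xFFFFFF00u;       /* copy the sign bit into bits 8..31 */
    return is_alpha_word(w);
}
```

For a byte with the high bit set, the sign-extended word is a very large unsigned number, so subtracting 'a' still leaves it far above 25 and the byte is rejected either way.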

Compilers know this trick. For example:

int alpha(unsigned char *p) {
    unsigned char c = *p;
    c |= 0x20;
    return (c>='a' && c <= 'z');
}

compiles with MIPS GCC (Godbolt compiler explorer) to the same asm as

#include <stdbool.h>   // for bool

int alpha(unsigned char *p) {
  unsigned char c = *p;
  unsigned lcase = c|0x20;
  unsigned alphabet_idx = lcase - 'a';   // 0-index position in the alphabet
  bool alpha = alphabet_idx <= (unsigned)('z'-'a');
  return alpha;
}

In fact, clang can even optimize ((c>='a' && c <= 'z') || (c>='A' && c <= 'Z')); into equivalent asm for MIPS or RISC-V. (Also included in the Godbolt link.)
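
That the two-range expression and the single-range form accept exactly the same byte values can be confirmed by brute force over all 256 inputs. The sketch below says nothing about what any compiler emits; it only compares the two C expressions, and the function names are hypothetical.

```c
/* Plain two-range test, written the obvious way. */
int alpha_two_ranges(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Single-range form: force the 0x20 bit, subtract 'a', compare unsigned. */
int alpha_one_range(unsigned char c)
{
    unsigned idx = (c | 0x20u) - 'a';
    return idx <= (unsigned)('z' - 'a');
}

/* Count the byte values on which the two forms disagree. */
int count_disagreements(void)
{
    int bad = 0;
    for (int v = 0; v < 256; v++) {
        unsigned char c = (unsigned char)v;
        if (alpha_two_ranges(c) != alpha_one_range(c))
            bad++;
    }
    return bad;
}
```

The forms agree because every value in 'a'..'z' already has the 0x20 bit set, so the only other inputs that OR into that range are the same values with the bit cleared, which is exactly 'A'..'Z'.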

See also the linked answer, which also shows the range-check trick for '0' .. '9'.
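
The '0' .. '9' check follows the same pattern without the case-forcing step: subtract the low bound, then do one unsigned compare against the range length. A minimal sketch, assuming ASCII, with hypothetical function names:

```c
/* Nonzero if c is in '0'..'9'. Values below '0' wrap around to large
   unsigned numbers, so a single compare covers both bounds. */
int is_digit_ascii(unsigned char c)
{
    return (unsigned)c - '0' < 10u;
}

/* Same pattern for 'A'..'Z' only (an isupper-style check). */
int is_upper_ascii(unsigned char c)
{
    return (unsigned)c - 'A' < 26u;
}
```

In MIPS terms each of these corresponds to just the addiu / sltiu pair, with the constant changed to match the low bound and the range length.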

Tags: ascii, mips, range