NASM Linux x64 | Encode binary to base64

Question

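I'm trying to encode a binary file as base64. I've been stuck on a few steps for a while, and I'm not sure this is even the right way to think about it; see the comments in the code below: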

SECTION .bss            ; Section containing uninitialized data

    BUFFLEN equ 6       ; We read the file 6 bytes at a time
    Buff:   resb BUFFLEN    ; Text buffer itself

SECTION .data           ; Section containing initialised data

    B64Str: db "000000"
    B64LEN equ $-B64Str

    Base64: db "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

SECTION .text           ; Section containing code

global  _start          ; Linker needs this to find the entry point!

_start: 
    nop         ; This no-op keeps gdb happy...

; Read a buffer full of text from stdin:
Read:
    mov eax,3       ; Specify sys_read call
    mov ebx,0       ; Specify File Descriptor 0: Standard Input
    mov ecx,Buff        ; Pass offset of the buffer to read to
    mov edx,BUFFLEN     ; Pass number of bytes to read at one pass
    int 80h         ; Call sys_read to fill the buffer
    mov ebp,eax     ; Save # of bytes read from file for later
    cmp eax,0       ; If eax=0, sys_read reached EOF on stdin
    je Done         ; Jump If Equal (to 0, from compare)

; Set up the registers for the process buffer step:
    mov esi,Buff        ; Place address of file buffer into esi
    mov edi,B64Str      ; Place address of line string into edi
    xor ecx,ecx     ; Clear line string pointer to 0


;;;;;;
; GET 6 bits from input
;;;;;;


;;;;;;
; Convert to B64 char
;;;;;;

;;;;;;
; Print the char
;;;;;;

;;;;;;
; process to the next 6 bits
;;;;;;


; All done! Let's end this party:
Done:
    mov eax,1       ; Code for Exit Syscall
    mov ebx,0       ; Return a code of zero 
    int 80H         ; Make kernel call

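So, described in words, this is what it should do:

1) Hex values:

7C AA 78

2) Binary values:

0111 1100 1010 1010 0111 1000

3) 6-bit groups:

011111 001010 101001 111000

4) Converted to numbers:

31 10 41 56

5) Each number maps to a letter, digit, or symbol:

31 = f
10 = K
41 = p
56 = 4

So the final output is: fKp4

So my question is: how do I get the 6 bits, and how do I convert those bits to a char?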

Answer 1

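Edit, a few years later:

Somebody recently ran this example, and while discussing how it works and how to convert it to 64-bit for x64 Linux, I turned it into a complete example. The source is here: https://gist.github.com/ped7g/c96a7eec86f9b090d0f33ba36af056c1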

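There are two main ways to implement it: either a generic loop that can pick out any 6 bits, or fixed code that processes 24 bits (3 bytes) of input at a time (which produces exactly 4 base64 characters and ends on a byte boundary, so you can read the next 24 bits from offset +3).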
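For comparison, here is a minimal, untested sketch of the first approach, the generic loop. The label `EncodeGeneric` and the register roles (`esi` input, `edi` output, `ebp` input length in bytes) are assumptions of this sketch; `Base64` is the table from the question. It reads a 16-bit window around each group, so the input must be followed by at least one zero byte. Like the fixed code in this answer, it uses 32-bit registers and addresses.

```nasm
; In:  esi -> input bytes (followed by at least one zero byte)
;      edi -> output buffer
;      ebp =  input length in bytes
; Clobbers eax, ebx, ecx, edx, ebp; edi ends up past the last character.
EncodeGeneric:
    shl   ebp, 3                ; ebp = input length in bits
    xor   ebx, ebx              ; ebx = bit offset of the current 6-bit group
.next_group:
    cmp   ebx, ebp
    jae   .done                 ; no input bits left
    mov   edx, ebx
    shr   edx, 3                ; edx = index of the byte holding the first bit
    movzx eax, byte [esi+edx]
    shl   eax, 8                ; first byte  -> bits 15..8 of the window
    mov   al, [esi+edx+1]       ; second byte -> bits 7..0 of the window
    mov   ecx, ebx
    and   ecx, 7                ; ecx = start position inside the first byte
    neg   ecx
    add   ecx, 10               ; ecx = 10 - position: shift that aligns the group
    shr   eax, cl
    and   eax, 0x3F             ; eax = 6-bit value, 0..63
    mov   al, [Base64+eax]      ; look up the base64 character
    mov   [edi], al             ; store it
    inc   edi
    add   ebx, 6                ; move on to the next group
    jmp   .next_group
.done:
    ret
```

Tracing it by hand with the question's bytes 7C AA 78 and `ebp` = 3, the four groups come out as 31, 10, 41 and 56, so it stores fKp4. It does not add any '=' padding.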

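Let's say you have esi pointing to the source binary data, which is padded with zeros far enough that reading slightly past the end of the input buffer is safe (+3 bytes in the worst case).

And edi pointing to some output buffer (at least ((input_length+2)/3*4) bytes long, possibly including some padding, because B64 requires an ending sequence).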
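The size expression relies on integer division, so it rounds the input length up to a whole number of 3-byte groups. A small sketch of the same computation, assuming the input length arrives in `eax` (my choice of register, not something the code below depends on):

```nasm
; In:  eax = input_length in bytes (assumed register)
; Out: eax = (input_length + 2) / 3 * 4, the minimum output size in bytes
; Clobbers ecx, edx.
    add   eax, 2
    xor   edx, edx              ; div divides edx:eax, so clear the high half
    mov   ecx, 3
    div   ecx                   ; eax = number of 3-byte groups, rounded up
    shl   eax, 2                ; four output characters per group
```

For example, 1, 2 or 3 input bytes need 4 output bytes, 4 input bytes need 8, and 900 input bytes need 1200.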

; convert 3 bytes of input into four B64 characters of output
mov   eax,[esi]  ; read 3 bytes of input
      ; (actually reads 4 bytes, the 4th will be ignored)
add   esi,3      ; advance pointer to next input chunk
bswap eax        ; first input byte as MSB of eax
shr   eax,8      ; throw away the 1 junk byte (LSB after bswap)
; produce 4 base64 characters backward (last group of 6b is converted first)
; (to keep the 6b group extraction simple: "shr eax,6" + "and edx,0x3F")
mov   edx,eax    ; get copy of last 6 bits
shr   eax,6      ; throw away the 6 bits already copied
and   edx,0x3F   ; keep only last 6 bits
mov   bh,[Base64+edx]  ; convert 0-63 value into B64 character (4th)
mov   edx,eax    ; get copy of next 6 bits
shr   eax,6      ; throw away the 6 bits already copied
and   edx,0x3F   ; keep only last 6 bits
mov   bl,[Base64+edx]  ; convert 0-63 value into B64 character (3rd)
shl   ebx,16     ; make room in ebx for the next characters (4th+3rd now in upper 16b)
mov   edx,eax    ; get copy of next 6 bits
shr   eax,6      ; throw away the 6 bits already copied
and   edx,0x3F   ; keep only last 6 bits
mov   bh,[Base64+edx]  ; convert 0-63 value into B64 character (2nd)
; here eax contains exactly 6 bits (zero extended to 32b)
mov   bl,[Base64+eax]  ; convert 0-63 value into B64 character (1st)
mov   [edi],ebx  ; store four B64 characters as output
add   edi,4      ; advance output pointer

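After the last group of 3B input, you have to overwrite the end of the output with the right number of '=' characters, to fix the characters that were produced from the fake zero padding. I.e. 1B of input (needs 8 bits, 2 B64 chars) => output ends with '=='; 2B of input (needs 16b, 3 B64 chars) => output ends with '='; 3B of input => the full 24 bits are used => 4 valid B64 chars.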
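That fix-up can be sketched as follows. It assumes `ecx` holds the number of real input bytes in the final group (1, 2 or 3) and that `edi` already points just past the last four characters stored. Both register roles and the label `PadTail` are assumptions of this sketch, and it is untested.

```nasm
; In:  ecx = real input bytes in the final 3-byte group (1, 2 or 3)
;      edi = output pointer just past the last four characters stored
PadTail:
    cmp   ecx, 3
    jae   .done                 ; full group: all four characters are valid
    mov   byte [edi-1], '='     ; 1 or 2 bytes: the 4th character is padding
    cmp   ecx, 2
    je    .done
    mov   byte [edi-2], '='     ; 1 byte: the 3rd character is padding too
.done:
    ret
```

For a single input byte 7C (zero padded to 7C 00 00), the fixed code stores fAAA and this turns it into fA==.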

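If you don't want to read the whole file into memory and build the whole output buffer in memory, you can give the in/out buffers a limited length, e.g. only 900B input -> 1200B output, and process the input in 900B blocks. Or you can use a 3B -> 4B in/out buffer and drop the pointer advancing completely (or even the esi/edi usage, and use fixed memory addresses), since you will have to load/store the in/out data separately on every iteration anyway.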
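One detail when reading in blocks: `sys_read` may return fewer bytes than requested (for example from a pipe or a terminal), and encoding a block whose length is not a multiple of 3 would put '=' padding in the middle of the output. Below is a hedged, untested sketch of a fill loop that keeps reading until the block is full or the input ends. It keeps the question's `int 80h` calls; `FillChunk`, `InBuf` and `INLEN` are hypothetical names.

```nasm
INLEN   equ 900                 ; block size, must be a multiple of 3

SECTION .bss
InBuf:  resb INLEN+4            ; +4 so the zero padding below stays in bounds

SECTION .text
; Out: ebp = number of bytes now in InBuf (0 means end of input)
; Clobbers eax, ebx, ecx, edx.
FillChunk:
    xor   ebp, ebp              ; ebp = bytes collected so far
.more:
    mov   eax, 3                ; sys_read, as in the question's code
    xor   ebx, ebx              ; file descriptor 0 = stdin
    lea   ecx, [InBuf+ebp]      ; append after what is already buffered
    mov   edx, INLEN
    sub   edx, ebp              ; room left in the block
    int   80h
    cmp   eax, 0
    jle   .stop                 ; 0 = EOF, negative = error: stop reading
    add   ebp, eax
    cmp   ebp, INLEN
    jb    .more                 ; short read: keep filling the block
.stop:
    mov   dword [InBuf+ebp], 0  ; zero padding for the 4-byte over-read
    ret
```

With this, only the very last block can have a length that is not a multiple of 3, so the '=' fix-up is needed only once, at the end of the output.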

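Disclaimer: this code is written to be straightforward, not fast. You asked how to extract 6 bits and how to convert the value into a character, so I guess sticking to basic x86 asm instructions is best.

I'm not even sure how to make it perform better without profiling the code for bottlenecks and experimenting with other variants. The partial register usage (bh, bl vs ebx) will surely be costly, so there is most likely a better solution (or maybe even some SIMD-optimized version for larger input blocks).

And I did not debug this code, I just wrote it here in the answer, so proceed with caution and check in a debugger whether and how it works.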

linux

64-bit

assembly

x86-64

nasm