ASM algorithm decoding

Question

I am trying to understand this problem in ASM. Here is the code:

45 33 C9                    xor r9d, r9d 
C7 44 24 18 50 72 69 6D     mov [rsp+arg_10], 6D697250h 
66 C7 44 24 1C 65 53        mov [rsp+arg_14], 5365h 
C6 44 24 1E 6F              mov [rsp+arg_16], 6Fh 
4C 63 C1                    movsxd r8, ecx 
85 C9                       test ecx, ecx 
7E 1C                       jle short locret_140001342 
41 8B C9                    mov ecx, r9d 
            loc_140001329: 
48 83 F9 07                 cmp rcx, 7 
49 0F 4D C9                 cmovge rcx, r9 
48 FF C1                    inc rcx 
8A 44 0C 17                 mov al, [rsp+rcx+arg_F] 
30 02                       xor [rdx], al 
48 FF C2                    inc rdx 
49 FF C8                    dec r8 
75 E7                       jnz short loc_140001329 
            locret_140001342: 
C3                          retn

Here is the encoded text:

07 1D 1E 41 45 2A 00 25 52 0D 04 01 73 06 
24 53 49 39 0D 36 4F 35 1F 08 04 09 73 0E 
34 16 1B 08 16 20 4F 39 01 49 4A 54 3D 1B 
35 00 07 5C 53 0C 08 1E 38 11 2A 30 13 1F 
22 1B 04 08 16 3C 41 33 1D 04 4A

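I have been studying ASM for a while and I know what most of the instructions do, but I still have some questions I have not found answers to.

How do I feed the encoded text into the algorithm?
What are arg_10, arg_14, and so on? I assume they come from the encoded part, but I am not sure.

Can anyone explain line by line what this algorithm does? I understand some of it, but I need some clarification.

I have been using Visual Studio and C++ to test the ASM. I know that to run an ASM procedure you can declare a function like this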

extern "C" int function(int a, int b, int c,int d, int f, int g);

and use it like this

printf("ASM Returned %d", function(92,2,3,4,5,6));

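I also know that the first four integer arguments go into RCX, RDX, R8, and R9, and the rest go on the stack. I do not know much about the stack, so I do not yet know how to access them. I also know that the return value is whatever RAX contains. So something like this would add two numbers: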

xor eax, eax
mov eax, ecx
add eax, edx
ret

So, as Jester suggested, I will explain line by line what I think the code does.

xor r9d, r9d                  //xor on r9d (clears the register)
mov [rsp+arg_10], 6D697250h   //moves 6D697250 to the address pointed at by rsp + arg_10
mov [rsp+arg_14], 5365h       //moves 5365 to the address pointed at by rsp+arg_14
mov [rsp+arg_16], 6Fh         //moves 6F to the address pointed at by rsp+arg_16
movsxd r8, ecx                //moves ecx to r8 and sign extends it since ecx is 32 bit and r8 is 64 bit
test ecx, ecx                 //tests exc and sets the labels
jle short locret_140001342    //jumps to ret if ecx is zero or less
mov ecx, r9d                  //moves the lower 32 bits of r9 into ecx

loc_140001329:                //label used by jump commands
cmp rcx, 7                    //moves 7(decimal) into rcx
cmovge rcx, r9                //don't know
inc rcx                       //increases rcx by 1
mov al, [rsp + rcx + arg_F]   //moves the value at address [rsp + rcx + arg_F] into al,
                              //this is probably the key step as al is 1 byte and each character is also one byte, it is also the rax register so it holds the value to be returned
xor [rdx], al                 //xor on the value at address [rdx] and al, stores the result at the address of [rdx]
inc rdx                      //increase rdx by 1
dec r8                       //decrease r8 by 1
jnz short loc_140001329      //if r8 is not zero jump back to loc_140...
                             //this essentially is a while loop until r8 reaches 0 (assuming it starts as positive)
locret_140001342:
ret

I still do not know what the arg_xx are, or how exactly the encoded text is fed into this algorithm.

Answer 1

One thing I noticed is that the values stored at those stack offsets are ASCII:

>>> bytes.fromhex('5072696d65536f').decode('ascii')
'PrimeSo'

As for the input data, you can use xxd -r -p and read it from standard input in your program: xxd -r -p data.hex | ./myprog
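If piping through xxd is not convenient, the same conversion can be done inside the program. The following is only a minimal sketch: parse_hex_bytes is a hypothetical helper name, and it assumes the hex pairs are separated by whitespace, as in the dump in the question (xxd -r -p is more lenient than that).

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn whitespace-separated hex pairs ("07 1D 1E ...")
// into raw bytes. Assumes every token is a valid hex byte.
std::vector<std::uint8_t> parse_hex_bytes(const std::string& text) {
    std::vector<std::uint8_t> bytes;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        // std::stoul with base 16 accepts tokens such as "07" or "1D".
        bytes.push_back(static_cast<std::uint8_t>(std::stoul(token, nullptr, 16)));
    }
    return bytes;
}
```

The resulting vector's data() and size() can then be handed to whatever routine does the decoding.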

Those arg_14 and similar offsets must be declared somewhere in the source. My guess is that they are the hex offsets 0xf, 0x10, 0x14, and 0x16.

Answer 2

I think your understanding is basically correct. A few small corrections:

Correction 1

test ecx, ecx                 //tests exc and sets the labels

This sets the flags (not labels).

Correction 2

cmp rcx, 7                    //moves 7(decimal) into rcx

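This compares rcx with the immediate value 7 and sets the flags accordingly. (That is, after this instruction a conditional instruction using the g (greater) condition, such as jg or cmovg, takes effect only if rcx is greater than 7.)

Correction 3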

cmovge rcx, r9                //don't know

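This conditionally (based on the flags you just set) moves r9 into rcx. The condition is ge, so the move happens only when rcx is greater than or equal to 7. r9 holds 0, so the effect is to reset rcx to 0 once it reaches 7.

Arguments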

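You are not given any information about the function's arguments, but it is safe to assume that rcx is the length of the data to decrypt and rdx is a pointer to that data.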
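Under that assumption (length in rcx, data pointer in rdx), the routine can be sketched in C++ roughly as follows. The name xor_with_key is made up for illustration, and this is a reading of the disassembly rather than the original source.

```cpp
#include <cstddef>

// Sketch only: assumes rcx carries the length and rdx the data pointer.
// xor_with_key is a hypothetical name; the listing does not name the routine.
void xor_with_key(int length, char* message) {
    // The seven bytes the three mov instructions write to the stack.
    static const char key[] = {'P', 'r', 'i', 'm', 'e', 'S', 'o'};
    const std::size_t key_len = sizeof(key);

    if (length <= 0) return;               // test ecx, ecx / jle
    std::size_t k = 0;                     // mov ecx, r9d
    for (int i = 0; i < length; ++i) {     // dec r8 / jnz
        if (k >= key_len) k = 0;           // cmp rcx, 7 / cmovge rcx, r9
        message[i] ^= key[k];              // mov al, [key byte] / xor [rdx], al
        ++k;                               // inc rcx
    }
}
```

Because XOR is its own inverse, calling the function a second time on the same buffer restores the original bytes, so the same routine serves for both encoding and decoding.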

Answer 3

Here is my take on the code.

    ; rdx holds the message location
    ; ecx holds the message length

    xor r9d, r9d                ; r9d = 0
    mov [rsp+arg_10], 6D697250h ; fix up the key
    mov [rsp+arg_14], 5365h 
    mov [rsp+arg_16], 6Fh       ; which is "PrimeSo"
    movsxd r8, ecx              ; length counter
    test ecx, ecx               ; test the  message length
    jle short locret_140001342  ; skip if invalid length
    mov ecx, r9d                ; reset key index to 0
loc_140001329: 
    cmp rcx, 7                  ; check indexing of key
    cmovge rcx, r9              ; reset if o/range
    inc rcx                     ; obfuscate by incrementing first
    mov al, [rsp+rcx+arg_F]     ; ... and indexing wrong offset
    xor [rdx], al               ; encrypt the message byte
    inc rdx                     ; advance message pointer
    dec r8                      ; loop count
    jnz short loc_140001329     ; next message byte
locret_140001342: 
    retn

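I decoded the message with a C program that implements the algorithm, but that was too easy, so I will not post it.

Reverse engineering

The code does not contain enough information to work through it top-down, because some registers are used without being loaded and the labels are not defined. I solved it bottom-up instead, by identifying the instruction that performs the encryption and working backward from there.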

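Although the stack labels are not defined, their naming is enough to show that the parts of the key are actually contiguous, and assuming little-endian byte order reveals the key. The list of hex bytes confirms this: it shows the three values being stored at offsets whose least significant bytes are 18, 1C, and 1E.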
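The little-endian reasoning can be checked with a few lines of C++. This is a minimal sketch; append_little_endian and rebuild_key are hypothetical names, and the bytes are extracted with shifts so the result does not depend on the byte order of the machine running it.

```cpp
#include <cstdint>
#include <string>

// Append the low `count` bytes of `value`, least significant byte first,
// which is the order a little-endian store writes them to memory.
void append_little_endian(std::string& out, std::uint32_t value, int count) {
    for (int i = 0; i < count; ++i) {
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Rebuild the key from the three immediates in the listing.
std::string rebuild_key() {
    std::string key;
    append_little_endian(key, 0x6D697250u, 4);  // dword stored at offset 18h
    append_little_endian(key, 0x5365u, 2);      // word stored at offset 1Ch
    append_little_endian(key, 0x6Fu, 1);        // byte stored at offset 1Eh
    return key;
}
```

The three stores cover offsets 18h through 1Eh with no gaps, which is why the seven bytes read back as one string.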

Answer 4

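OK, I have figured out the algorithm and got it working in ASM as well. You were right, the arg_xx are offsets: arg_10 == 0x10 and arg_F == 0x0F. The displacements actually encoded in the instruction bytes are 8 larger (18h and 17h), which is consistent with the names being measured from just above the return address, and those are the values the ASM code further down uses. The data is passed in as an array, together with its length. So in this case rcx is the data length (67 for the message above) and rdx points to the start of the array. This is the function I used in C++ to call the ASM procedure.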

extern "C" void function(int length, char* message);

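The algorithm is simple. The key is "PrimeSo". All it does is XOR each incoming byte with one of the characters of "PrimeSo", taken in increasing order; once it reaches the 'o' in "PrimeSo" it wraps back around to 'P'. Hence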

cmp rcx, 7       
cmovge rcx, r9   //as Peter de Rivaz stated this will put 0 into rcx if it is greater or equal to seven
inc rcx

and

mov al, [rsp + rcx + 0Fh]

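will effectively become [rsp + 1 + 0Fh], [rsp + 2 + 0Fh], ..., [rsp + 7 + 0Fh]. Note that, using the arg_xx values as offsets, "PrimeSo" is stored starting at [rsp + 10h], which means [rsp + 1 + 0Fh] points to 'P'. On each iteration of the loop, al becomes one of the characters of "PrimeSo", cycling through them.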
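To see the wrap-around concretely, the three index instructions can be simulated on their own. This is just an illustrative sketch; key_index_sequence is a hypothetical name, and the variables rcx and r9 merely mirror the registers.

```cpp
#include <vector>

// Simulate cmp rcx, 7 / cmovge rcx, r9 / inc rcx and record the value of
// rcx that is used to index the key on each iteration.
std::vector<int> key_index_sequence(int iterations) {
    std::vector<int> seen;
    long long rcx = 0;        // mov ecx, r9d
    const long long r9 = 0;   // xor r9d, r9d
    for (int i = 0; i < iterations; ++i) {
        if (rcx >= 7) rcx = r9;   // cmp rcx, 7 / cmovge rcx, r9
        ++rcx;                    // inc rcx
        seen.push_back(static_cast<int>(rcx));
    }
    return seen;
}
```

For nine iterations this yields 1, 2, 3, 4, 5, 6, 7, 1, 2: the index never takes the value 0 at the point of the load, which is why the base of the load is one byte below the first key byte.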

xor [rdx], al //XORs the byte at [rdx] (the beginning of our message) with al, which is 'P' on the first iteration,
              //and stores the result in its place.

inc rdx       //move to next character
dec r8        //decrease counter
jnz short loc_140001329 //and start the loop again

With that said, let's look at the first few.

xor P, 07 == xor 50, 07 --> 57 = W  
xor r, 1D == xor 72, 1D --> 6F = o  
xor i, 1E == xor 69, 1E --> 77 = w  
xor m, 41 == xor 6D, 41 --> 2C = ,

For those who want to know, here is the C++ code:

#include <cstdio>   // printf
#include <cstdlib>  // system

extern "C" void function(int length, char* message);

int main()
{
    char message[] = { 0x07, 0x1D, 0x1E, 0x41, 0x45, 0x2A, 0x00, 0x25, 0x52, 0x0D, 0x04, 0x01, 0x73, 0x06, 0x24, 0x53, 0x49, 0x39, 0x0D, 0x36, 0x4F, 0x35, 0x1F, 0x08, 0x04, 0x09, 0x73, 0x0E, 0x34, 0x16, 0x1B, 0x08, 0x16, 0x20, 0x4F, 0x39, 0x01, 0x49, 0x4A, 0x54, 0x3D, 0x1B, 0x35, 0x00, 0x07, 0x5C, 0x53, 0x0C, 0x08, 0x1E, 0x38, 0x11, 0x2A, 0x30, 0x13, 0x1F, 0x22, 0x1B, 0x04, 0x08, 0x16, 0x3C, 0x41, 0x33, 0x1D, 0x04, 0x4A, '\0'};
    function(sizeof(message) - 1, message);
    printf("Decoded Message is:\n%s\n", message);


    printf("\n");
    system("pause");
    return 0;
}

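No, I did not insert the data into message by hand. Also note that I added a string terminator at the end and used sizeof(message) - 1 so that the terminator itself is not decoded.
Here is the ASM code. It is just a new file named assembly.asm with this in it.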

.code

function proc
    xor r9d, r9d
    mov dword ptr [rsp + 18h], 6D697250h 
    mov word ptr [rsp + 1Ch], 5365h 
    mov byte ptr [rsp + 1Eh], 6Fh 
    movsxd r8, ecx
    test ecx, ecx
    jle short locret_140001342 
    mov ecx, r9d

loc_140001329:
    cmp rcx, 7
    cmovge rcx, r9
    inc rcx 
    mov al, [rsp + rcx + 17h]
    xor [rdx], al
    inc rdx
    dec r8
    jnz short loc_140001329

locret_140001342:
    ret

function endp
end

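In Visual Studio, you can set a breakpoint here and then go to Debug -> Windows -> Registers and Debug -> Windows -> Memory -> Memory 1 to view the registers and the program's memory. Note that rcx will contain the count and rdx will point to the start of the encoded message.

Thank you all for the help and suggestions. I could not have done it without you.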
