# Register Operations in Assembly

Now that you're familiar with the basics from [[CPU Registers Building Blocks]], let's see how these registers actually work in assembly code. This is where theory meets practice!

## Reading Assembly Instructions

One thing that often trips up newcomers is how to read assembly instructions. In x86/x64 assembly, instructions generally follow this format:

```nasm
instruction destination, source
```

Think of it like an arrow pointing left: the value moves from the source (right) to the destination (left). For example:

```nasm
mov eax, edx    ; EDX → EAX (value in EDX moves to EAX)
mov ecx, 42     ; 42 → ECX (immediate value 42 moves to ECX)
mov [rsp], rax  ; RAX → [RSP] (value in RAX moves to memory at RSP)
```

### Number Representations

When reading assembly, you'll encounter numbers in different formats:

```nasm
; Different ways to represent numbers in assembly
mov eax, 42         ; Decimal (base 10)
mov eax, 0x2A       ; Hexadecimal (C-style notation)
mov eax, 2Ah        ; Hexadecimal (assembly-style notation)
mov eax, 00101010b  ; Binary
```

>[!tip]
>The 'h' suffix in assembly means "hexadecimal":
>- `60h` = `0x60` = 96 in decimal
>- `18h` = `0x18` = 24 in decimal
>
>If a hex number starts with a letter (A-F), you need to prefix it with 0:
>- `mov eax, 0ABCDh` ✓ Correct
>- `mov eax, ABCDh` ✗ Wrong (assembler will think ABCD is a label)

## Calling Conventions

Understanding calling conventions is crucial for malware analysis, as they dictate how functions receive their arguments.
Let's compare the two main Windows calling conventions:

### x86 (32-bit) - stdcall

In x86, the `stdcall` convention (the one used by most of the Windows API) works like this:

- Arguments are pushed onto [[The Stack]] from right to left
- The called function is responsible for cleaning up the stack
- Return value is stored in EAX

```nasm
; Calling MessageBoxA(NULL, "Text", "Caption", MB_OK)
push 0               ; MB_OK
push offset Caption
push offset Text
push 0               ; NULL
call MessageBoxA     ; Stack is cleaned up by MessageBoxA
```

### x64 (64-bit) - Microsoft x64

The x64 calling convention is notably different:

- First four arguments use registers:
    - RCX: First argument
    - RDX: Second argument
    - R8: Third argument
    - R9: Fourth argument
- Additional arguments are passed on the stack, above the shadow space
- Caller must allocate 32 bytes of "shadow space" on the stack
- Return value is stored in RAX

```nasm
; Calling MessageBoxW(NULL, L"Text", L"Caption", MB_OK)
sub rsp, 28h         ; Shadow space (32 bytes) + 8 bytes to align stack
xor rcx, rcx         ; NULL (first arg)
lea rdx, [Text]      ; "Text" (second arg)
lea r8, [Caption]    ; "Caption" (third arg)
xor r9d, r9d         ; MB_OK (fourth arg)
call MessageBoxW
add rsp, 28h         ; Clean up shadow space
```

>[!tip]
>The shadow space in x64 is required even if the function takes fewer than four parameters. It provides space for the function to save the register parameters if needed.

**Return Values**: RAX (EAX on x86) usually holds the result of a function call. This is super important when tracking API calls: checking it right after a `call` tells you what the API returned.

## String Operations

Assembly provides special instructions for efficient string manipulation.
The most common are the `MOVSx` family of instructions, which move data from one location to another:

```nasm
; String operation prefixes and instructions
REP     ; Repeat the following instruction ECX/RCX times
MOVSB   ; Move a byte (8 bits)
MOVSW   ; Move a word (16 bits)
MOVSD   ; Move a double word (32 bits)
MOVSQ   ; Move a quad word (64 bits - x64 only)
```

Here's how these work together to copy 100 bytes from a source buffer to a destination buffer:

```nasm
; Example: Copying 100 bytes from source to destination
lea rsi, [source_buffer]  ; RSI ← address of source
lea rdi, [dest_buffer]    ; RDI ← address of destination
mov rcx, 100              ; RCX ← number of bytes to copy
rep movsb                 ; Repeat MOVSB RCX times
```

This is what it would look like in C:

```C
// REP MOVSB is equivalent to this C-style loop (with DF = 0):
while (rcx > 0) {
    *rdi = *rsi;
    rsi++;
    rdi++;
    rcx--;
}
```

>[!tip] Watch for these string operations in malware - they're often used for:
>- Decoding encrypted strings
>- Copying shellcode
>- Buffer manipulation in exploits
>
> The direction flag (DF) controls whether RSI and RDI increment (DF=0) or decrement (DF=1) after each operation:
>
> - `CLD` clears DF (moves forward)
> - `STD` sets DF (moves backward)

Here's a common pattern you might see in malware:

```nasm
; Simple string decoding routine
section .data
    encoded_str db "XYZZY"    ; Encoded string
    key         db 0x5        ; XOR key
    str_len     equ 5         ; Length of string

section .text
decode_string:
    cld                       ; Clear direction flag (move forward)
    lea rsi, [encoded_str]    ; RSI ← address of encoded string
    lea rdi, [encoded_str]    ; RDI ← same buffer (in-place decode)
    mov rcx, str_len          ; RCX ← length of string

decode_loop:
    lodsb                     ; AL ← byte at [RSI], then RSI++
    xor al, [key]             ; Decode byte with XOR
    stosb                     ; [RDI] ← AL, then RDI++
    loop decode_loop          ; Decrement RCX and repeat if not zero
    ret                       ; Return to the caller once every byte is decoded
```

The `LODS` and `STOS` instructions are also part of the string operation family:

- `LODSB/LODSW/LODSD/LODSQ`: Load from `[RSI]` into AL/AX/EAX/RAX
- `STOSB/STOSW/STOSD/STOSQ`: Store AL/AX/EAX/RAX into `[RDI]`

## Real-World Example

Let's look at a complete example that brings everything together. This one is a 64-bit Linux program, so it uses the Linux system call convention (call number in RAX, arguments in RDI, RSI, and RDX) rather than the Windows conventions covered earlier:

```nasm
section .data
    msg db "Hello, World!", 10  ; Example message (10 = newline)
    len equ $ - msg             ; Length of the message in bytes

section .text
    global _start

_start:
    mov rax, 1      ; System call number for write
    mov rdi, 1      ; File descriptor for stdout
    mov rsi, msg    ; Pointer to our message
    mov rdx, len    ; Length of our message
    syscall         ; Make the system call

    mov rax, 60     ; System call number for exit
    xor rdi, rdi    ; Exit code 0
    syscall         ; Make the system call
```

## Going Further

Ready to dive deeper? Check out:

- [[Advanced Register Techniques]] for complex operations and malware analysis
- [[The Stack]] for understanding memory operations
- [[PE File Format Foundations]] to see how this fits into executable files
- [[Windows API Basics]] for using these concepts with Windows APIs

Remember, practice is key to understanding assembly. Try writing and analysing different code snippets to build your skills!
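
One way to practise is to model an instruction in a higher-level language and check that the result matches your mental picture. The sketch below is a rough Python model (not a real emulator) of `rep movsb` and of the `lodsb` / `xor` / `stosb` decoding loop from the String Operations section. The function names `rep_movsb` and `xor_decode`, and the flat `bytearray` standing in for memory, are assumptions made purely for illustration.

```python
def rep_movsb(memory: bytearray, rsi: int, rdi: int, rcx: int, df: int = 0):
    """Model REP MOVSB on a bytearray that stands in for memory.

    rsi, rdi, and rcx are plain integers playing the role of the registers.
    Returns the final (rsi, rdi, rcx) values.
    """
    step = -1 if df else 1          # DF=0 moves forward, DF=1 moves backward
    while rcx > 0:
        memory[rdi] = memory[rsi]   # copy one byte from [RSI] to [RDI]
        rsi += step
        rdi += step
        rcx -= 1
    return rsi, rdi, rcx


def xor_decode(data: bytes, key: int) -> bytes:
    """Model the LODSB / XOR / STOSB loop: XOR each byte with a one-byte key."""
    return bytes(b ^ key for b in data)
```

For instance, `xor_decode(b"XYZZY", 5)` applies the same transformation as the decoding routine, and applying it a second time with the same key gives back the original bytes, because XOR with a fixed key is its own inverse.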