# Register Operations in Assembly
Now that you're familiar with the basics from [[CPU Registers Building Blocks]], let's see how these registers actually work in assembly code. This is where theory meets practice!
## Reading Assembly Instructions
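One thing that often trips up newcomers is how to read assembly instructions. In x86/x64 assembly written in Intel syntax (the style used by NASM, MASM, and most Windows disassemblers), instructions generally follow this format (AT&T syntax, common in GNU tools, reverses the operand order):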
```nasm
instruction destination, source
```
Think of it like an arrow pointing left: the value moves from the source (right) to the destination (left). For example:
```nasm
mov eax, edx ; EDX → EAX (value in EDX moves to EAX)
mov ecx, 42 ; 42 → ECX (immediate value 42 moves to ECX)
mov [rsp], rax ; RAX → [RSP] (value in RAX moves to memory at RSP)
```
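To make the direction of data flow concrete, here is a minimal C sketch of the three `mov` forms above. The `Machine` struct and the `mov_reg`, `mov_store64`, and `mov_load64` helpers are hypothetical names invented for this illustration, and the model deliberately ignores details such as sub-register aliasing:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: a few registers and a small byte-addressed memory. */
typedef struct {
    uint64_t rax, rcx, rdx, rsp;
    uint8_t  mem[64];
} Machine;

/* mov reg, reg / mov reg, imm: the destination takes a copy of the source value. */
void mov_reg(uint64_t *dst, uint64_t src) {
    *dst = src;
}

/* mov [addr], reg: brackets mean "the memory at this address", so 8 bytes are stored there. */
void mov_store64(Machine *m, uint64_t addr, uint64_t src) {
    assert(addr + sizeof src <= sizeof m->mem);   /* stay inside the toy memory */
    memcpy(&m->mem[addr], &src, sizeof src);
}

/* mov reg, [addr]: the matching load, reading 8 bytes back from memory. */
uint64_t mov_load64(const Machine *m, uint64_t addr) {
    uint64_t value;
    assert(addr + sizeof value <= sizeof m->mem);
    memcpy(&value, &m->mem[addr], sizeof value);
    return value;
}
```

Calling `mov_reg(&m.rax, m.rdx)` mirrors `mov rax, rdx`: the first argument is the destination, just like the left-hand operand in Intel syntax.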
### Number Representations
When reading assembly, you'll encounter numbers in different formats:
```nasm
; Different ways to represent numbers in assembly
mov eax, 42 ; Decimal (base 10)
mov eax, 0x2A ; Hexadecimal (C-style notation)
mov eax, 2Ah ; Hexadecimal (assembly-style notation)
mov eax, 00101010b ; Binary
```
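If you ever need to convert these literals yourself (for example, in a small analysis script), the rules are simple enough to sketch in C. `parse_asm_number` is a hypothetical helper, not part of any assembler, and it only covers the four forms shown above:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse an assembler-style integer literal such as 42, 0x2A, 2Ah or 00101010b.
   Hypothetical helper for illustration; real assemblers accept more forms. */
unsigned long parse_asm_number(const char *text) {
    size_t len = strlen(text);
    assert(len > 0);

    /* C-style hexadecimal: skip the "0x" prefix and parse the rest as base 16. */
    if (len > 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X'))
        return strtoul(text + 2, NULL, 16);

    /* Otherwise the last character selects the base: h = hex, b = binary. */
    char last = text[len - 1];
    int base = 10;
    if (last == 'h' || last == 'H')
        base = 16;
    else if (last == 'b' || last == 'B')
        base = 2;

    /* strtoul stops at the first character that is not a digit in the chosen
       base, so the trailing suffix is ignored automatically. */
    return strtoul(text, NULL, base);
}
```

For instance, `parse_asm_number("2Ah")`, `parse_asm_number("0x2A")`, and `parse_asm_number("00101010b")` all return 42.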
>[!tip]
>The 'h' suffix in assembly means "hexadecimal":
>- `60h` = `0x60` = 96 in decimal
>- `18h` = `0x18` = 24 in decimal
>
>If a hex number starts with a letter (A-F), you need to prefix it with 0:
>- `mov eax, 0ABCDh` ✓ Correct
>- `mov eax, ABCDh` ✗ Wrong (assembler will think ABCD is a label)
## Calling Conventions
Understanding calling conventions is crucial for malware analysis, as they dictate how functions receive their arguments. Let's compare the two main Windows calling conventions:
### x86 (32-bit) - stdcall
In x86, the `stdcall` convention (most common in Windows API) works like this:
- Arguments are pushed onto [[The Stack]] from right to left
- The called function is responsible for cleaning up the stack
- Return value is stored in EAX
```nasm
; Calling MessageBoxA(NULL, "Text", "Caption", MB_OK)
push 0 ; MB_OK
push offset Caption
push offset Text
push 0 ; NULL
call MessageBoxA
; Stack is cleaned up by MessageBoxA
```
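To see why the push order matters, here is a rough C model of what the stack looks like from the callee's side. `Stack32` and its helper functions are hypothetical names made up for this sketch, and each slot stands for one 4-byte stack entry:

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16

/* Hypothetical toy 32-bit stack: grows toward lower indices, one slot = 4 bytes. */
typedef struct {
    uint32_t slots[STACK_SLOTS];
    int      esp;                 /* index of the current top-of-stack slot */
} Stack32;

void stack_init(Stack32 *s) {
    s->esp = STACK_SLOTS;         /* empty stack */
}

/* push: move ESP down one slot, then store the value there. */
void push32(Stack32 *s, uint32_t value) {
    assert(s->esp > 0);
    s->slots[--s->esp] = value;
}

/* Callee view: CALL left the return address at [esp], so argument n
   (0-based, left to right) sits at [esp + 4 + 4*n]. */
uint32_t stdcall_arg(const Stack32 *s, int n) {
    assert(n >= 0 && s->esp + 1 + n < STACK_SLOTS);
    return s->slots[s->esp + 1 + n];
}

/* Model of "ret N": pop the return address, then discard arg_count arguments.
   This is the callee cleanup that defines stdcall. */
void stdcall_ret(Stack32 *s, int arg_count) {
    s->esp += 1 + arg_count;
    assert(s->esp <= STACK_SLOTS);
}
```

Because the last argument is pushed first, the first argument ends up closest to the return address, so the callee always finds argument 0 at `[esp + 4]` no matter how many arguments follow.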
### x64 (64-bit) - Microsoft x64
The x64 calling convention is notably different:
- First four integer or pointer arguments use registers:
    - RCX: First argument
    - RDX: Second argument
    - R8: Third argument
    - R9: Fourth argument
- Additional arguments are passed on the stack, above the shadow space
- Caller must allocate 32 bytes of "shadow space" on the stack and keep RSP 16-byte aligned at the `call`
- Return value is stored in RAX
```nasm
; Calling MessageBoxW(NULL, L"Text", L"Caption", MB_OK)
sub rsp, 28h ; Shadow space + align stack
xor rcx, rcx ; NULL (first arg)
lea rdx, [Text] ; "Text" (second arg)
lea r8, [Caption] ; "Caption" (third arg)
xor r9d, r9d ; MB_OK (fourth arg)
call MessageBoxW
add rsp, 28h ; Clean up shadow space
```
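Where does `28h` come from? Here is a rough C sketch of the arithmetic. It assumes a function that was entered with the usual alignment (RSP is 8 bytes off a 16-byte boundary right after the `call` that entered it) and that has pushed nothing else. The function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Register that carries integer/pointer argument `index` (0-based) in the
   Microsoft x64 convention, or NULL when the argument goes on the stack. */
const char *x64_arg_register(int index) {
    static const char *const regs[] = { "RCX", "RDX", "R8", "R9" };
    assert(index >= 0);
    return index < 4 ? regs[index] : NULL;
}

/* Offset from RSP, just before the CALL, of a stack-passed argument.
   The first 32 bytes are the shadow space, so the fifth argument is at [rsp+20h]. */
int x64_stack_arg_offset(int index) {
    assert(index >= 4);
    return 32 + 8 * (index - 4);
}

/* Bytes to subtract from RSP before a call taking arg_count arguments,
   under the assumptions stated above. */
int x64_call_reservation(int arg_count) {
    assert(arg_count >= 0);
    int stack_args = arg_count > 4 ? arg_count - 4 : 0;
    int bytes = 32 + 8 * stack_args;   /* shadow space + stack arguments */
    if (bytes % 16 != 8)               /* RSP is 8 off on entry, so the total */
        bytes += 8;                    /* must also be 8 off to realign it    */
    return bytes;
}
```

For a four-argument call like `MessageBoxW`, `x64_call_reservation(4)` gives 40 bytes, which is the `28h` used above: 32 bytes of shadow space plus 8 bytes of alignment padding.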
>[!tip]
>The shadow space in x64 is required even if the function takes fewer than four parameters. It provides space for the function to save the register parameters if needed.
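**Return Values**: Integer and pointer results come back in RAX (EAX on x86), so the instructions right after a `call` usually show how the result is checked or used. This is super important when tracking API calls!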
## String Operations
Assembly provides special instructions for efficient string manipulation. The most common are the `MOVSx` family of instructions, which move data from one location to another:
```nasm
; String operation prefixes and instructions
REP ; Repeat the following instruction ECX/RCX times
MOVSB ; Move a byte (8 bits)
MOVSW ; Move a word (16 bits)
MOVSD ; Move a double word (32 bits)
MOVSQ ; Move a quad word (64 bits - x64 only)
```
Here's how these work together to copy 100 bytes from a source buffer to a destination buffer:
```nasm
; Example: Copying 100 bytes from source to destination
lea rsi, [source_buffer] ; RSI ← address of source
lea rdi, [dest_buffer] ; RDI ← address of destination
mov rcx, 100 ; RCX ← number of bytes to copy
rep movsb ; Repeat MOVSB RCX times
```
This is equivalent to the following C-style loop, with the register names standing in for variables:
```C
while (rcx > 0) {
    *rdi = *rsi;  // copy one byte
    rsi++;
    rdi++;
    rcx--;
}
```
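The loop above assumes the copy runs forward. The CPU's direction flag (DF) can flip that, so a slightly fuller sketch might look like the following. `StringRegs` and `rep_movsb` are hypothetical names for this model, not a real API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers that REP MOVSB reads and updates. */
typedef struct {
    uint8_t       *rdi;   /* destination pointer */
    const uint8_t *rsi;   /* source pointer */
    size_t         rcx;   /* remaining byte count */
    int            df;    /* direction flag: 0 = forward, 1 = backward */
} StringRegs;

/* Model of REP MOVSB: copy RCX bytes, stepping both pointers up or down
   depending on DF, and leave the registers in their final state. */
void rep_movsb(StringRegs *r) {
    assert(r->df == 0 || r->df == 1);
    ptrdiff_t step = r->df ? -1 : 1;
    while (r->rcx > 0) {
        *r->rdi = *r->rsi;   /* movsb: copy one byte */
        r->rsi += step;
        r->rdi += step;
        r->rcx--;
    }
}
```

Note that with `df = 1` the pointers have to start at the last byte of each region, since the copy walks toward lower addresses.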
>[!tip] Watch for these string operations in malware - they're often used for:
>- Decoding encrypted strings
>- Copying shellcode
>- Buffer manipulation in exploits
>
> The direction flag (DF) controls whether RSI and RDI increment (DF=0) or decrement (DF=1) after each string operation:
>
> - `CLD` clears DF (moves forward)
> - `STD` sets DF (moves backward)
Here's a common pattern you might see in malware:
```nasm
; Simple string decoding routine
section .data
    encoded_str db "XYZZY"      ; Encoded string
    key         db 0x5          ; XOR key
    str_len     equ 5           ; Length of string

section .text
decode_string:
    cld                         ; Clear direction flag (move forward)
    lea rsi, [encoded_str]      ; RSI ← address of encoded string
    lea rdi, [encoded_str]      ; RDI ← same buffer (in-place decode)
    mov rcx, str_len            ; RCX ← length of string
decode_loop:
    lodsb                       ; AL ← byte at [RSI], then RSI++
    xor al, [key]               ; Decode byte with XOR
    stosb                       ; [RDI] ← AL, then RDI++
    loop decode_loop            ; Decrement RCX and repeat if not zero
    ret                         ; Return to the caller
```
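When you recognise this pattern in a disassembly, it often helps to rewrite it in a higher-level form. A minimal C sketch of the same loop follows; `xor_decode` is a hypothetical name, and the comments map each statement back to the instruction it models. One difference worth knowing: `loop` decrements RCX before testing it, so the assembly version misbehaves for a zero-length string, while the C loop simply does nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-place single-byte XOR decode, mirroring the LODSB / XOR / STOSB / LOOP pattern. */
void xor_decode(uint8_t *buf, size_t len, uint8_t key) {
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {   /* loop decode_loop */
        uint8_t al = buf[i];             /* lodsb: AL <- byte at [RSI] */
        al ^= key;                       /* xor al, [key] */
        buf[i] = al;                     /* stosb: [RDI] <- AL */
    }
}
```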
The `LODS` and `STOS` instructions are also part of the string operation family:
- `LODSB/LODSW/LODSD/LODSQ`: Load from `[RSI]` into AL/AX/EAX/RAX
- `STOSB/STOSW/STOSD/STOSQ`: Store AL/AX/EAX/RAX into `[RDI]`
## Real-World Example
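Let's look at a complete example that brings everything together. Note that this one targets 64-bit Linux rather than Windows: it uses the `syscall` instruction directly, with the system call number in RAX and the arguments in RDI, RSI, and RDX:
```nasm
section .data
    msg db "Hello, world!", 10  ; Example message (any text works), ending in a newline
    len equ $ - msg             ; Length of the message in bytes

section .text
    global _start
_start:
    mov rax, 1                  ; System call number for write
    mov rdi, 1                  ; File descriptor for stdout
    mov rsi, msg                ; Pointer to our message
    mov rdx, len                ; Length of our message
    syscall                     ; Make the system call
    mov rax, 60                 ; System call number for exit
    xor rdi, rdi                ; Exit code 0
    syscall                     ; Make the system call
```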
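For comparison, here is roughly what the `write` half of the program above looks like from C on a POSIX system. This is only a sketch: `write_stdout` is a hypothetical wrapper name, and the register mapping in the comment describes the raw 64-bit Linux system call rather than anything the C library promises.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Write a buffer to stdout and return the number of bytes written (or -1 on error).
   In the raw Linux x86-64 system call, the three arguments below travel in
   RDI (file descriptor), RSI (buffer), and RDX (length), with RAX = 1 for write. */
long write_stdout(const char *msg, size_t len) {
    assert(msg != NULL);
    return (long)write(STDOUT_FILENO, msg, len);
}
```

The value that comes back is whatever the kernel left in RAX, which ties in with the return-value convention from earlier.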
## Going Further
Ready to dive deeper? Check out:
- [[Advanced Register Techniques]] for complex operations and malware analysis
- [[The Stack]] for understanding memory operations
- [[PE File Format Foundations]] to see how this fits into executable files
- [[Windows API Basics]] for using these concepts with Windows APIs
Remember, practice is key to understanding assembly. Try writing and analysing different code snippets to build your skills!