Intel Assembly – Secgroup Ca' Foscari

The term program exploitation refers to techniques that allow us to “make a program do something unexpected and not planned”. To make use of these techniques, it is essential to understand how programs are compiled and executed, and to use tools for controlled execution (debuggers).

NOTE: All the proposed exercises can be tried on a fairly standard 32-bit Linux system. We have also created an account on a ‘testbed’ host where you can perform all the proposed exercises. Detailed information can be found here.

The Assembly language

Executable programs consist of machine code, which varies with the processor architecture. Assembly language is a human-readable representation of this low-level code. In this Lab we focus on x86 assembly, whose reference manuals can be found here.

Assembly directly refers to processor registers, which we summarise below:

General purpose registers: eax, ecx, edx, ebx, corresponding to Accumulator, Counter, Data, Base. These registers are used to temporarily store values and addresses used in the computation;
Stack registers: esp, ebp, corresponding to stack pointer and base pointer. esp points to the top of the stack, while ebp points to the base of the current function's stack frame, so together they delimit the current frame;
Indexes: esi, edi, corresponding to source index and destination index. These registers are used in operations such as array and string copying;
The instruction pointer: eip, which points to the next instruction to be executed;
Registers for memory segmentation: cs, ds, ss for code, data and stack segments, respectively.

To be consistent with the textbook “Hacking: The Art of Exploitation”, in the following we use the Intel assembly syntax instead of the default AT&T one. In Intel syntax, an assembly instruction for data manipulation has the following general form:

command <destination>, <source>

For example:

mov    eax,0x8048520

moves the number 0x8048520 (in fact, an address) into register eax.

The directive DWORD PTR indicates that the address enclosed in the square brackets that follow it points to a 32-bit value. DWORD means ‘double word’: on x86 a word is 16 bits, a term inherited from the original 16-bit architecture, so a double word is 32 bits. For example:

mov    DWORD PTR [ebp-0xc],0x0

moves the 32-bit representation of 0x0 into memory at address ebp-0xc, i.e., the location 0xc bytes below the base pointer ebp. Since the x86 stack grows toward lower addresses, this location belongs to the current function's stack frame. This is the usual way to refer to a function's local variables allocated on the stack.
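To make the effect of DWORD PTR concrete, the following C sketch models memory as a plain byte array and mimics what mov DWORD PTR [addr],value does: it writes four bytes starting at addr. The helper names store_dword and load_dword are hypothetical, chosen only for illustration. Note that x86 is little-endian, so the least significant byte ends up at the lowest address; the same byte order is visible in the objdump listing of the example below, where push 0x8048530 is encoded as 68 30 85 04 08.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models `mov DWORD PTR [addr],value` on a byte array.
 * x86 is little-endian: the least significant byte goes to the lowest address. */
void store_dword(uint8_t *mem, size_t addr, uint32_t value)
{
    mem[addr]     = (uint8_t)(value & 0xffu);
    mem[addr + 1] = (uint8_t)((value >> 8) & 0xffu);
    mem[addr + 2] = (uint8_t)((value >> 16) & 0xffu);
    mem[addr + 3] = (uint8_t)((value >> 24) & 0xffu);
}

/* Hypothetical helper: the reverse direction, `mov eax,DWORD PTR [addr]`. */
uint32_t load_dword(const uint8_t *mem, size_t addr)
{
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

For instance, if we pretend that ebp corresponds to index 24 of a 32-byte array, store_dword(frame, 24 - 0xc, 0x8048520) writes the bytes 20 85 04 08 into frame[12] to frame[15].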

Below, we summarise the most commonly used assembly commands:

mov <dst>, <src>: moves the <src> value to <dst>. It is used to set initial values;
add <dst>, <src>: adds the value in <src> to <dst>;
sub <dst>, <src>: subtracts the value in <src> from <dst>;
and <dst>, <src>: performs a bitwise AND between <src> and <dst>, placing the result in <dst>;
push <target>: pushes the value in <target> to the stack;
pop <target>: pops a value from the stack into <target>;
jmp <target>: jumps to the address in <target>. This is achieved by copying the target address into the Instruction Pointer (EIP) register;
call <address>: calls the function at <address>. Before jumping to the function, the address of the next instruction is pushed to the stack in order to be able to return;
cmp <dst>, <src>: compares <dst> with <src>. This is done by subtracting <src> from <dst>, discarding the result and updating flags that can be checked by subsequent conditional jumps (see below);
jle <target>: jumps to the address in <target> if, in the preceding cmp, <dst> was less than or equal to <src> (as signed integers). The actual test is done on the flags set by cmp;
jge <target>: jumps to the address in <target> if, in the preceding cmp, <dst> was greater than or equal to <src> (as signed integers). The actual test is done on the flags set by cmp;
lea <dst>, <src>: stands for “load effective address”; computes the address of <src> and loads it into <dst>, without accessing memory;
int <value>: generates software interrupt <value>. This is commonly used to invoke system calls.
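The relationship between cmp and the conditional jumps is easiest to see by computing the flags by hand. The sketch below assumes 32-bit signed operands and uses hypothetical helper names (x86_cmp, jl_taken, jle_taken, jge_taken); the conditions themselves (jl: SF != OF, jle: ZF set or SF != OF, jge: SF == OF) are those given for these instructions in the Intel manuals.

```c
#include <stdbool.h>
#include <stdint.h>

/* The subset of flags that `cmp dst, src` sets and that jl/jle/jge read. */
struct cmp_flags {
    bool zf; /* zero flag: the result was 0, i.e. dst == src */
    bool sf; /* sign flag: top bit of the 32-bit result */
    bool of; /* overflow flag: the signed subtraction overflowed */
};

/* Sketch of `cmp dst, src`: compute dst - src in 32 bits, throw the result
 * away and keep only the flags. Unsigned arithmetic gives the same
 * wrap-around as the CPU without signed-overflow undefined behaviour in C. */
struct cmp_flags x86_cmp(int32_t dst, int32_t src)
{
    uint32_t a = (uint32_t)dst, b = (uint32_t)src;
    uint32_t r = a - b;
    struct cmp_flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1u;
    /* Overflow: operands have different signs and the result's sign
     * differs from the sign of dst. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}

/* Jump conditions evaluated on the flags only, as the CPU does. */
bool jl_taken(struct cmp_flags f)  { return f.sf != f.of; }
bool jle_taken(struct cmp_flags f) { return f.zf || f.sf != f.of; }
bool jge_taken(struct cmp_flags f) { return f.sf == f.of; }
```

Checking OF and not just SF is what makes the comparison correct even when dst - src overflows, e.g. x86_cmp(INT32_MIN, 1) gives SF = 0 but OF = 1, so jl is still taken.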

Example

Consider this simple C program:

#include <stdio.h>

int main()
{
	int i;
	for (i=0; i<10; i++)
		printf("%d ",i);
	printf("\n");
}

You can copy and paste it into a text editor and save it as hello.c. Then compile and execute it from a terminal as follows:

r1x@testbed ~ $ gcc hello.c -o hello
r1x@testbed ~ $ ./hello 
0 1 2 3 4 5 6 7 8 9

As expected, the program prints integers from 0 to 9 separated by a space.

We can now dump its assembly code using the following command:

r1x@testbed ~ $ objdump -M intel -D hello | grep -A20 main
...
0804844a <main>:
 804844a:       8d 4c 24 04             lea    ecx,[esp+0x4]
 804844e:       83 e4 f0                and    esp,0xfffffff0
 8048451:       ff 71 fc                push   DWORD PTR [ecx-0x4]
 8048454:       55                      push   ebp
 8048455:       89 e5                   mov    ebp,esp
 8048457:       51                      push   ecx
 8048458:       83 ec 14                sub    esp,0x14
 804845b:       c7 45 f4 00 00 00 00    mov    DWORD PTR [ebp-0xc],0x0
 8048462:       eb 17                   jmp    804847b <main+0x31>
 8048464:       83 ec 08                sub    esp,0x8
 8048467:       ff 75 f4                push   DWORD PTR [ebp-0xc]
 804846a:       68 30 85 04 08          push   0x8048530
 804846f:       e8 9c fe ff ff          call   8048310 <printf@plt>
 8048474:       83 c4 10                add    esp,0x10
 8048477:       83 45 f4 01             add    DWORD PTR [ebp-0xc],0x1
 804847b:       83 7d f4 09             cmp    DWORD PTR [ebp-0xc],0x9
 804847f:       7e e3                   jle    8048464 <main+0x1a>
 8048481:       83 ec 0c                sub    esp,0xc
 8048484:       6a 0a                   push   0xa
 8048486:       e8 b5 fe ff ff          call   8048340 <putchar@plt>
 804848b:       83 c4 10                add    esp,0x10
 804848e:       8b 4d fc                mov    ecx,DWORD PTR [ebp-0x4]
 8048491:       c9                      leave
 8048492:       8d 61 fc                lea    esp,[ecx-0x4]
 8048495:       c3                      ret
...

Option -M intel shows the assembly in Intel syntax instead of AT&T syntax, and option -D disassembles all sections of the executable. Since we want to inspect the code of the main function, we grep for the name main and print the 20 lines following each match (option -A20).

Assembly directly refers to the processor registers. In this code we can see the Counter register ecx and the two registers delimiting the stack frame, esp and ebp (the stack and base pointers). Recall that variables declared inside functions are stored on the stack, at positions relative to one of these two registers; in this case, ebp.

Exercise: Try to understand the above assembly code pointing out the interesting instructions (such as variable initializations, tests, function calls, ...).
Notice that the first three instructions are extra code for aligning the stack to 16 bytes. This is done for performance reasons and can be ignored. In particular, the third instruction pushes the return address onto the newly aligned stack, so that the stack looks the way the function expects it to (and so that ret works). The value of the non-aligned esp plus 4, computed into ecx by the first instruction, is also pushed onto the stack, so that the original esp can be restored when main returns (mov ecx,DWORD PTR [ebp-0x4] followed by lea esp,[ecx-0x4]). This alignment is kept consistent at every function call (spot this in the code!).
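The alignment arithmetic can be checked with plain integers. The sketch below is a hypothetical model, not real register access: prologue_align mirrors lea ecx,[esp+0x4] and and esp,0xfffffff0, while epilogue_restore mirrors lea esp,[ecx-0x4]. In the real code ecx survives the body of main because push ecx stores it at [ebp-0x4], from where mov ecx,DWORD PTR [ebp-0x4] reloads it. The esp values used below are arbitrary.

```c
#include <stdint.h>

/* Hypothetical model of the prologue of main shown above. */
struct aligned_frame {
    uint32_t ecx; /* lea ecx,[esp+0x4]: original esp + 4 */
    uint32_t esp; /* and esp,0xfffffff0: esp rounded down to a multiple of 16 */
};

struct aligned_frame prologue_align(uint32_t esp)
{
    struct aligned_frame f;
    f.ecx = esp + 0x4u;         /* remember where the caller's stack was */
    f.esp = esp & 0xfffffff0u;  /* clear the low 4 bits: 16-byte alignment */
    return f;
}

/* lea esp,[ecx-0x4] in the epilogue undoes the alignment. */
uint32_t epilogue_restore(uint32_t saved_ecx)
{
    return saved_ecx - 0x4u;
}
```

For example, an incoming esp of 0xbffff5ac is aligned down to 0xbffff5a0, ecx holds 0xbffff5b0, and epilogue_restore gives back 0xbffff5ac. Because clearing the low bits can only lower esp, the aligned stack never overlaps the caller's data.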

Modifying the executable program

To the left of the assembly instructions we can see the hex bytes encoding each instruction. These bytes are the binary code actually executed by the processor, and hex editors let us modify them directly. In the following video we show how to make the above program print only even numbers without recompiling it: we look for the bytes corresponding to the increment (83 45 f4 01, at address 0x8048477) and change the increment from 0x1 to 0x2. After watching the demo, you can try even more sophisticated changes yourself, like the one proposed in the exercise.
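The same patch can also be automated instead of done by hand in a hex editor. The C sketch below assumes the file contents have already been read into a buffer (for example with fread) and searches for the bytes of add DWORD PTR [ebp-0xc],0x1 exactly as they appear in the listing; the function name patch_increment is hypothetical. It refuses to patch when the pattern appears more than once, since then we could not tell which occurrence is the loop increment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes of `add DWORD PTR [ebp-0xc],0x1` from the objdump listing:
 * 83 = ADD r/m32, imm8 (ModRM reg field /0)
 * 45 = ModRM selecting [ebp+disp8]
 * f4 = disp8, i.e. -0xc
 * 01 = imm8, the increment we want to change */
static const uint8_t INC_PATTERN[] = { 0x83, 0x45, 0xf4, 0x01 };

/* Patch the increment in buf if the pattern occurs exactly once.
 * Returns the offset of the patched instruction, or -1 (buf untouched). */
long patch_increment(uint8_t *buf, size_t len, uint8_t new_inc)
{
    const size_t n = sizeof INC_PATTERN;
    size_t pos = 0;
    int matches = 0;

    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, INC_PATTERN, n) == 0) {
            pos = i;
            matches++;
        }
    }
    if (matches != 1)
        return -1;               /* absent or ambiguous: do not touch buf */
    buf[pos + n - 1] = new_inc;  /* the imm8 is the last byte of the instruction */
    return (long)pos;
}
```

After calling patch_increment(buf, len, 0x02), writing the buffer to a copy of hello (so that the original stays intact) should give a program that prints 0 2 4 6 8.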

Exercise: Try to change the original executable code so that it loops while i is strictly less than 9, instead of less than or equal to 9. To achieve this, you have to change the actual opcode of the assembly instruction (from jle to jl). See, for example, here for a quick reference to x86 opcodes.
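When changing a jump opcode it helps to know how the target address is encoded. Short jumps such as eb 17 (jmp) and 7e e3 (jle) store a signed 8-bit displacement that is added to the address of the next instruction, and call (e8) stores a little-endian signed 32-bit displacement in the same way. The hypothetical helpers below recompute the targets shown in the listing. Since the displacement byte is separate from the opcode byte, replacing one short conditional jump with another leaves the jump target unchanged.

```c
#include <stdint.h>

/* Short jumps (eb for jmp, 70-7f for conditional jumps such as 7e/jle) are
 * 2 bytes long: opcode + signed 8-bit displacement relative to the NEXT
 * instruction. */
uint32_t rel8_target(uint32_t insn_addr, uint8_t disp8)
{
    /* Sign-extend the byte, then add with 32-bit wrap-around like the CPU. */
    return insn_addr + 2u + (uint32_t)(int32_t)(int8_t)disp8;
}

/* call (e8) is 5 bytes long: opcode + little-endian signed 32-bit displacement. */
uint32_t rel32_target(uint32_t insn_addr, const uint8_t disp[4])
{
    uint32_t d = (uint32_t)disp[0]
               | ((uint32_t)disp[1] << 8)
               | ((uint32_t)disp[2] << 16)
               | ((uint32_t)disp[3] << 24);
    return insn_addr + 5u + d; /* unsigned wrap-around == signed addition */
}
```

For instance, rel8_target(0x804847f, 0xe3) yields 0x8048464, the start of the loop body, and rel32_target applied to the call at 0x804846f with bytes 9c fe ff ff yields 0x8048310, matching the objdump output.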

