Intel Assembly

The term program exploitation refers to techniques that allow us to “make a program do something unexpected and not planned”. To make use of these techniques, it is essential to understand how programs are compiled and executed, and to be able to use tools for controlled execution (debuggers).

NOTE: All the proposed exercises can be tried on a fairly standard 32-bit Linux system. We have also created an account on a ‘testbed’ host where you can perform all the proposed exercises. Detailed information can be found here.

The Assembly language

Executable programs are written in machine code, which varies with the computer architecture. Assembly language makes such low-level code more readable by giving each machine instruction a human-readable mnemonic. In this lab we will focus on x86 assembly, whose reference manuals can be found here.

Assembly code refers directly to processor registers, which we summarise below:

  • General purpose registers: eax, ecx, edx, ebx, corresponding to Accumulator, Counter, Data, Base. These registers are used to temporarily store values and addresses used in the computation;
  • Stack registers: esp, ebp, corresponding to stack pointer and base pointer, i.e., the two addresses delimiting the current stack frame: esp points to the top of the stack and ebp to the base of the current function's frame;
  • Indexes: esi, edi, corresponding to source index and destination index. These registers are used in operations such as array and string copying;
  • The instruction pointer: eip, which points to the next instruction to be executed;
  • Registers for memory segmentation: cs, ds, ss for code, data and stack segments, respectively.

To be coherent with the textbook “Hacking: The Art of Exploitation”, in the following we will use the Intel assembly syntax instead of the AT&T syntax that GNU tools such as objdump and gdb use by default. In the Intel syntax, an assembly instruction for data manipulation has the following general form:

    operation <dst>, <src>

For example:

    mov eax, 0x8048520

moves the number 0x8048520 (in fact, an address) into register eax.

The directive DWORD PTR specifies the size of the memory operand: the address enclosed in square brackets points to a 32-bit value. DWORD means ‘double word’: in x86 terminology a word is 16 bits (a legacy of the original 16-bit architecture), so a double word is 32 bits, which is the natural word size of a 32-bit processor. For example:

    mov DWORD PTR [ebp-0xc], 0x0

moves the 32-bit representation of 0x0 into memory at address ebp-0xc, i.e., 0xc (12) bytes below the address held in the base pointer ebp. Since the stack grows towards lower addresses, this location lies inside the current stack frame. This is the usual way to refer to local variables of a function, which are allocated on the stack.

Below, we summarise the most commonly used assembly instructions:

  • mov <dst>, <src>: moves the <src> value to <dst>. It is used to set initial values;
  • add <dst>, <src>: adds the value in <src> to <dst>;
  • sub <dst>, <src>: subtracts the value in <src> from <dst>;
  • and <dst>, <src>: performs a bitwise AND between <src> and <dst>, placing the result in <dst>;
  • push <target>: pushes the value in <target> to the stack;
  • pop <target>: pops a value from the stack into <target>;
  • jmp <target>: jumps to the address in <target>. This is achieved by copying the target address into the Instruction Pointer (EIP) register;
  • call <address>: calls the function at <address>. Before jumping to the function, the address of the next instruction is pushed to the stack in order to be able to return;
  • cmp <dst>, <src>: compares <src> with <dst>. This is done by subtracting <src> from <dst> and updating flags that can be checked by subsequent conditional operations (see below);
  • jle <target>: jumps to the address in <target> if, in the previous cmp, <dst> was less than or equal to <src> (signed comparison). The actual test is done on the flags set by cmp;
  • jge <target>: jumps to the address in <target> if, in the previous cmp, <dst> was greater than or equal to <src> (signed comparison). The actual test is done on the flags set by cmp;
  • lea <dst>, <src> (“load effective address”): computes the address of the memory operand <src> and stores it in <dst>, without reading memory;
  • int <value>: generates software interrupt <value>. This is commonly used to invoke system calls.

Example

Consider this simple C program:

You can copy and paste it into a text editor and save it as hello.c. Then, compile and run it from a terminal as follows:

As expected, the program prints integers from 0 to 9 separated by a space.

We can now dump its assembly code using the following command:

Option -M intel shows the disassembly in Intel syntax instead of AT&T syntax. Option -D stands for disassemble-all: it disassembles the contents of all sections of the file. Since we want to inspect the code of the main function, we grep for main and print the 20 lines following each match (option -A20).

In this code we can see the accumulator eax and the two registers delimiting the current stack frame: esp (stack pointer) and ebp (base pointer). Recall that variables declared inside functions are stored on the stack, and their positions are expressed as offsets from one of these two registers; in this case, ebp.

Exercise: Try to understand the above assembly code pointing out the interesting instructions (such as variable initializations, tests, function calls, …).

Modifying the executable program

On the left of the assembly instructions we can see the hex bytes of each instruction: this is the actual binary code executed by the processor. We can use a hex editor to modify this binary code directly. In the following video we show how to make the above program print only even numbers without recompiling it: we look for the bytes corresponding to the increment and change the increment from 0x1 to 0x2. After watching the demo, you can try even more sophisticated changes yourself, such as the one proposed in the exercise.

Exercise: Try to change the original executable code so that the loop continues while i is strictly less than 9, rather than less than or equal to 9. To achieve this, you have to change the actual opcode of the assembly instruction (from jle to jl). See, for example, here for a quick reference to x86 opcodes.

Get inspiration from the following example:

References
