Intel Assembly

The term program exploitation refers to techniques that allow us to “make a program do something unexpected and not planned”. To use these techniques, it is essential to understand how programs are compiled and executed, and to use tools for controlled execution (debuggers).

NOTE: All the proposed exercises can be tried on a fairly standard 32-bit Linux system. We have also created an account on a ‘testbed’ host where you can perform all the proposed exercises. More information can be found on the course home page.

The Assembly language

Executable programs are written in machine code, which varies with the computer architecture. Assembly language makes such low-level code more readable. In this Lab we will focus on x86 Assembly, whose reference manuals can be found here.

Consider this simple C program:

You can copy and paste it into a text editor and save it as hello.c. Then, compile and execute it from a terminal as follows:

As expected, the program prints integers from 0 to 9 separated by a space.

We can now dump its Assembly code using the following command:

  • Option -M intel shows the Assembly in Intel syntax instead of AT&T syntax (we use Intel syntax to be consistent with the textbook “Hacking: The Art of Exploitation”).
    In Intel syntax, an assembly instruction for data manipulation has the following general form:

        operation destination, source

    For example:

        mov eax, 0x8048520

    moves the number 0x8048520 (an address, in fact) into register eax.
  • Option -D stands for disassemble-all: it disassembles the contents of every section of the file.
  • Since we want to inspect the code of the main function, we grep for the name main and print the 20 lines after it. The grep command filters its input and prints only the lines that contain the specified string. Option -A20 also prints the 20 lines that follow each line containing the string main.

Assembly refers directly to the processor registers. In this code we can see the accumulator eax and the two registers that delimit the current stack frame: the stack pointer esp and the base pointer ebp. Recall that variables declared inside functions are stored on the stack. Their position is expressed as an offset from one of these two registers; the example below uses ebp.

The directive DWORD PTR indicates that the address enclosed in square brackets that follows it points to a 32-bit number. For example:

    mov DWORD PTR [ebp-0xc], 0x0

writes the 32-bit value 0x0 to the address ebp-0xc, i.e., the location 0xc (12) bytes below the address held in ebp. Since the stack grows towards lower addresses, local variables sit at negative offsets from the base pointer. This is in fact the initialisation of variable i to value 0 in the C source code.
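To make the address arithmetic concrete, here is a small sketch that models stack memory as a byte array and performs the same 32-bit store. The names store_dword and load_dword and the use of an array index in place of a real address are illustrative assumptions, not part of the lab material.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: `mem` stands in for stack memory and `addr` is an
 * index into it (a real ebp holds an address, not an array index). */

/* Store a 32-bit value the way `mov DWORD PTR [addr], value` does. */
static void store_dword(uint8_t *mem, size_t addr, uint32_t value)
{
    /* x86 is little-endian: the least significant byte goes to the
     * lowest address. */
    mem[addr + 0] = (uint8_t)(value & 0xff);
    mem[addr + 1] = (uint8_t)((value >> 8) & 0xff);
    mem[addr + 2] = (uint8_t)((value >> 16) & 0xff);
    mem[addr + 3] = (uint8_t)((value >> 24) & 0xff);
}

/* Read the four bytes back as one 32-bit number. */
static uint32_t load_dword(const uint8_t *mem, size_t addr)
{
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

With ebp modelled as index 32, the call store_dword(mem, 32 - 0xc, 0x0) touches exactly the four bytes at indices 20 to 23, which is why DWORD PTR matters: it tells the assembler how many bytes the store covers.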

Exercise: Try to understand the above Assembly code pointing out the interesting instructions (such as variable initializations, tests, function calls, …).

Modifying the executable program

To the left of the assembly instructions we see the hex bytes corresponding to each instruction. Those bytes are the actual binary code executed by the processor. We can use a hex editor to modify the binary code directly. In the following video we show how to make the above program print only even numbers without recompiling it: we look for the specific bytes that encode the increment and change the increment from 0x1 to 0x2. After you have seen the demo, you can try more sophisticated changes yourself, like the one proposed in the exercise.
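The same edit can be sketched in code rather than in a hex editor. The sketch below assumes the increment is compiled as add DWORD PTR [ebp-0xc],0x1, which encodes as the bytes 83 45 f4 01; your compiler may emit a different instruction (for example an esp-relative one), so check the bytes against your own objdump output first. The helper names find_unique and patch_increment are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the only occurrence of `pat` in `buf`, or -1 if the
 * pattern is missing or occurs more than once (an ambiguous match is not
 * safe to patch). */
static long find_unique(const unsigned char *buf, size_t len,
                        const unsigned char *pat, size_t patlen)
{
    long found = -1;

    if (patlen == 0 || patlen > len)
        return -1;
    for (size_t i = 0; i + patlen <= len; i++) {
        if (memcmp(buf + i, pat, patlen) == 0) {
            if (found != -1)
                return -1;          /* second match: give up */
            found = (long)i;
        }
    }
    return found;
}

/* Assumption: the loop increment is `add DWORD PTR [ebp-0xc],0x1`,
 * encoded as 83 45 f4 01. The last byte is the immediate operand. */
static int patch_increment(unsigned char *buf, size_t len,
                           unsigned char new_step)
{
    static const unsigned char add_one[] = { 0x83, 0x45, 0xf4, 0x01 };
    long off = find_unique(buf, len, add_one, sizeof add_one);

    if (off < 0)
        return -1;
    buf[off + 3] = new_step;        /* overwrite only the immediate */
    return 0;
}
```

To apply it to a real file, one would read a copy of the executable into a buffer (for instance with fread), call patch_increment(buf, len, 0x2), and write the buffer back. Refusing to patch when the pattern is not unique mirrors what you would check by eye in the hex editor.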

Exercise: Try to change the original executable code so that it loops while i is strictly less than 9 rather than less than or equal to 9. To achieve this you have to change the actual opcode of the assembly instruction (from jle to jl). See for example here for a quick reference to x86 opcodes.
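Before patching, it may help to see what the two jumps test. The following sketch models how cmp sets the zero, sign and overflow flags and how jl and jle read them; it deliberately leaves the opcode bytes for you to look up. The struct and function names are illustrative assumptions, and the loop model assumes the compiled loop compares i with 9 before each iteration.

```c
#include <stdint.h>

/* The three flags that signed conditional jumps look at. */
struct flags { int zf, sf, of; };

/* Model of `cmp a, b`: compute a - b, keep only the flags. */
static struct flags cmp32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;               /* wraps modulo 2^32 like the CPU */
    struct flags f;

    f.zf = (r == 0);                    /* result is zero: a == b */
    f.sf = (int)(r >> 31);              /* sign bit of the result */
    /* Signed overflow: operands differ in sign and the result's sign
     * differs from the sign of a. */
    f.of = (int)(((ua ^ ub) & (ua ^ r)) >> 31);
    return f;
}

/* jl jumps when a < b (signed): SF != OF. */
static int jl_taken(struct flags f)  { return f.sf != f.of; }

/* jle jumps when a <= b (signed): ZF set, or SF != OF. */
static int jle_taken(struct flags f) { return f.zf || f.sf != f.of; }

/* Count loop iterations when the loop continues while the given jump,
 * evaluated after `cmp i, 9`, is taken. */
static int count_iterations(int (*taken)(struct flags))
{
    int32_t i = 0;
    int n = 0;

    while (taken(cmp32(i, 9))) {
        n++;
        i++;
    }
    return n;
}
```

In this model count_iterations(jle_taken) is 10 and count_iterations(jl_taken) is 9: the only difference between the two jumps is whether equality (ZF set) also takes the branch.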
