The AES cipher

The Advanced Encryption Standard (AES) has been selected by the National Institute of Standards and Technology (NIST) after a five-year long competition. The original name of the cipher is Rijndael from the names of the two inventors, the cryptographers Joan Daemen and Vincent Rijmen. As any modern cipher, AES is the composition of rather simple operations and contains a non-linear component to avoid known-plaintexts attacks (as the one we have seen on the Hill cipher). The composed operations give a non-idempotent cipher that is iterated for a fixed number of rounds.

Rijndael has been selected because it resulted to be the best one providing:

high security guarantees
high performance
flexibility (different key length)

All of these features are, in fact, crucial for any modern cipher. Its predecessor, the Data Encryption Standard (DES) is still in use after almost 40 years, in a variant called Triple DES (3DES), which aims at improving the key length. In fact, DES key of only 56 bits is too short to resist brute-forcing on modern, parallel computers.

Mathematical background

AES works on the Galois Field with $2^8$ elements noted $\mathbf{GF}(2^8)$ . Intuitively, it it the set of all 8-bit digits with sum and multiplications performed by interpreting the bits as (binary) coefficients of polinomials. For example, element 11010011 can be seens as $x^7 + x^6 + x^4 + x + 1$ while 00111010 is $x^5+x^4+x^3 + x$ . The sum will thus be $x^7+x^6+x^4+x+1+x^5+x^4+x^3+x=x^7+x^6+x^5+x^3+1$ , since two 1’s coefficient becomes 0, modulo 2, and the term disappears (for example $x + x = 2x = 0x = 0$ ). We see that sum and subtraction are just the bit-wise xor of the binary numbers, i.e., $11010011 \oplus 00111010 = 11101001$ which is $x^7+x^6+x^5+x^3+1$ .

Product is done modulo the irreducible polinomial x⁸ + x⁴ + x³ + x + 1. Irreducible means that it cannot be written as the product of two other polinomials (it is, intuitively, the equivalent of primality). For example, $(x^7 + x^6 + x^4 + x + 1) \times (x^5+x^4+x^3 + x)$ gives

$x^{12} + x^{11} + x^9 + x^6 + x^5+x^{11}+x^{10}+x^8+x^5+x^4+x^{10}+x^9+x^7+x^4+x^3+x^8+x^7+x^5+x^2+x = x^{12}+x^6+x^5+x^3+x^2+x$

Now to divide $x^{12}+x^6+x^5+x^3+x^2+x$ by $x^8 + x^4 + x^3 + x + 1$ and find the reminder we proceed as follows:

$\begin{array}{ccccccccccccc}1&0&0&0&0&0&1&1&0&1&1&1&0 \\ 1&0&0&0&1&1&0&1&1 \\\hline & & & &1&1&1&0&1&1&1&1&0\\ & & & &1&0&0&0&1&1&0&1&1 \\\hline & & & & &1&1&0&0&0&1&0&1 \end{array}$

which gives 11000101, i.e., $x^7 + x^6 + x^2 + 1$ .

This operation is quadratic in general, with respect to the number of bits (8). It can be optimized with the following linear algorithm (which is, in fact, a working python code):

def AESmult(a, b):
    p = 0                       # p is 0 at the beginning
    for i in range(0,8):        # for the 8 bits of a and b do:
        if b & 1 != 0:          # the least significant bit of b is set
            p = p ^ a           # sum a to p (xor)
        b >>= 1                 # shifts b to the right
        hbit = (a & 0x80) != 0  # true if the most significant bit of a is set
        a <<= 1                 # shifts a to the left
        if hbit:                # if the most significant bit of a was set
            a = a ^ 0x11b       # sum 100011011 to a (xor), this always returns
                                # a 8-bit number
    return p

The correctness derives from the invariant, after each loop: ab+p is the product of the initial a and b (all operation does in the Galois Field). Since b is 0 at the end, we have that p will contain the product. This fact derives from the following observations:

b is shifted to the right and a is shifted to the left meaning that we respectively divide and multiply by $x$ the two polynomials;
if b is odd the polynomial is no dividable by $x$ so we throw away the least significant bit (this is what the right shift does, actually) and we accumulate one a in p to compensate and keep the invariant;
when a becomes more than 2⁸ we need to sum to it the modulus 100011011, i.e. 0x11b, to keep it 8-bits long.

NOTE: to test the code just open a python shell (by running python from the terminal) and cut and paste the function into the shell, giving enter at the end. You can now try

>>> AESmult(0b11010011,0b00111010)
197
>>> bin(AESmult(0b11010011,0b00111010))
'0b11000101'

Now that we have introduced the basic operation used to implement AES we can describe the cipher. The official description of AES is available on-line.

AES operates on a 4×4 matrix of bytes. We have that 16 bytes are 128 bits which is, in fact, the block size. Plaintext bytes $b_1,\ldots,b_{16}$ are copied in the matrix by columns following this scheme:

$\begin{bmatrix}b_1&b_5&b_9&b_{13}\\b_2&b_6&b_{10}&b_{14}\\b_{3}&b_7&b_{11}&b_{15}\\b_4&b_8&b_{12}&b_{16}\end{bmatrix}$

Cipher keys have lengths of 128, 192, and 256 bits. AES has 10 rounds for 128-bit keys, 12 rounds for 192-bit keys, and 14 rounds for 256-bit keys. Rijndael was designed to handle additional block sizes and key lengths, however they are not adopted in the AES standard. A round is composed of different operations, all of which are invertible:

AddRoundKey. The round key (see Key Expansion, below) is bitwise xor-ed with the block. A round key is thus 128 bits, independently of the chosen key size.

SubBytes. A fixed non-linear substitution, called S-box, is applied to each byte of the block. The substitution is reported below. Given a byte in hexadecimal notation, the first digit is used to select a row and the second one to select a column. For example, 0x25 would be the third row (2) and the sixth column (5) giving 0x3f.

    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
  ------------------------------------------------
0 |63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76 
1 |ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0 
2 |b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15 
3 |04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75 
4 |09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84 
5 |53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf 
6 |d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8 
7 |51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2 
8 |cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73 
9 |60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db 
a |e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79 
b |e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08 
c |ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a 
d |70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e 
e |e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df 
f |8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

NOTE: This S-box has been obtained by taking, for each byte, its multiplicative inverse in the finite field (that can be computed efficiently via an algorithm that we will see later on), noted $b_7, \ldots, b_0$ , and applying the affine transformation $b_i = b_i \oplus b_{i+4 \mod 8} \oplus b_{i+5 \mod 8} \oplus b_{i+6 \mod 8}\oplus b_{i+7 \mod 8} \oplus c_i$ , with $c_i$ representing the i-th bit of 01100011. The above transformation can be written as:

$\begin{bmatrix}b_0\\b_1\\b_2\\b_3\\b_4\\b_5\\b_6\\b_7\end{bmatrix}=\begin{bmatrix}1&0&0&0&1&1&1&1\\1&1&0&0&0&1&1&1\\1&1&1&0&0&0&1&1\\1&1&1&1&0&0&0&1\\1&1&1&1&1&0&0&0\\0&1&1&1&1&1&0&0\\0&0&1&1&1&1&1&0\\0&0&0&1&1&1&1&1\end{bmatrix}\begin{bmatrix}b_0\\b_1\\b_2\\b_3\\b_4\\b_5\\b_6\\b_7\end{bmatrix} \oplus \begin{bmatrix}1\\1\\0\\0\\0\\1\\1\\0\end{bmatrix}$

Using multiplicative inverses is known to give non-linear properties, while the affine transformation complicates the attempt of algebraic reductions.

ShiftRows. Rows of the block matrix are shifted to the left by 0,1,2,3, respectively. The shift is circular:
$\begin{bmatrix}b_1&b_5&b_9&b_{13}\\b_2&b_6&b_{10}&b_{14}\\b_{3}&b_7&b_{11}&b_{15}\\b_4&b_8&b_{12}&b_{16}\end{bmatrix} \Rightarrow\begin{bmatrix}b_1&b_5&b_9&b_{13}\\b_6&b_{10}&b_{14}&b_2\\b_{11}&b_{15}&b_{3}&b_7\\b_{16}&b_4&b_8&b_{12}\end{bmatrix}$
MixColumns. Columns of the block matrix are multiplied by the following matrix
$\begin{bmatrix}c_0\\c_1\\c_2\\c_3\end{bmatrix} = \begin{bmatrix}2&3&1&1\\1&2&3&1\\1&1&2&3\\3&1&1&2\end{bmatrix}\begin{bmatrix}c_0\\c_1\\c_2\\c_3\end{bmatrix}$

For example the first byte of each column is computed as $2c_0 \oplus 3c_1 \oplus c_2 \oplus c_3$ .

NOTE: This fixed matrix is obtained by considering each column as a four-term polynomial with coefficients in $\mathbf{GF}(2^8)$ . The columns are then multiplied modulo $x^4 + 1$ with a fixed polynomial a(x), given by $a(x) = 3x^3 + x^2 + x + 2$ . This specific modulus is such that, e.g., $x^4$ becomes $x^0$ , $x^5$ becomes $x^1$ and so on.

Key Expansion. We have mentioned that AES uses round keys in the AddRoundKey step. These keys are in fact derived from the initial AES key as follows.

Keys are represented as arrays of words of 4 bytes. So, for example, a 128 bit key will be 4 words of 4 bytes, i.e., 16 bytes. This is expanded into an array of size 4 * (Nr + 1), where Nr is the number of rounds. In this way we obtain 4 different words of key for each round.

Let Nk note the number of words of the initial key (e.g. 4 for 128 bits). The first Nk words of the key array are the same as the initial key. Next i-th word is obtained from the previous i-1 word, possibly transformed as described below, xor-ed with word i-Nk. The transformation happens only for words in position multiple of Nk and consists of a cyclic left shift of word bytes by one position (RotWord) followed by a byte-wise application of the S-box (SubWord) and a xor with a round constant (Rcon). This constant at step j is the word [ $x^{j-1}$ ,0x00,0x00,0x00] with $x^{j-1}$ computed in the galois field, meaning $02^{j-1}$ since polynomial $x$ is the binary number 00000010, i.e., 0x02.

Pseudocode follows:

for i in range(0,Nk):
    w[i] = k[i]    # copy the key words in the first Nk words
for i in range(Nk, 4 * (Nr+1)):
    temp = w[i-1]
    if (i % Nk == 0):
        temp = SubWord(RotWord(temp)) ^ Rcon[i/Nk]
    w[i] = w[i-Nk] ^ temp

IMPORTANT NOTE: for 256 bit key, when i-4 is a multiple of Nk SubWord is applied to w[i-1] before the xor. This has been omitted in the code for the sake of readability.

Of course the first byte of Rcon can be precomputed. Here is the table:

Rcon = [
0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, 0x6c, 0xd8, 0xab, 0x4d, 0x9a, 
0x2f, 0x5e, 0xbc, 0x63, 0xc6, 0x97, 0x35, 0x6a, 0xd4, 0xb3, 0x7d, 0xfa, 0xef, 0xc5, 0x91, 0x39, 
0x72, 0xe4, 0xd3, 0xbd, 0x61, 0xc2, 0x9f, 0x25, 0x4a, 0x94, 0x33, 0x66, 0xcc, 0x83, 0x1d, 0x3a, 
0x74, 0xe8, 0xcb, 0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, 0x6c, 0xd8, 
0xab, 0x4d, 0x9a, 0x2f, 0x5e, 0xbc, 0x63, 0xc6, 0x97, 0x35, 0x6a, 0xd4, 0xb3, 0x7d, 0xfa, 0xef, 
0xc5, 0x91, 0x39, 0x72, 0xe4, 0xd3, 0xbd, 0x61, 0xc2, 0x9f, 0x25, 0x4a, 0x94, 0x33, 0x66, 0xcc, 
0x83, 0x1d, 0x3a, 0x74, 0xe8, 0xcb, 0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 
0x36, 0x6c, 0xd8, 0xab, 0x4d, 0x9a, 0x2f, 0x5e, 0xbc, 0x63, 0xc6, 0x97, 0x35, 0x6a, 0xd4, 0xb3, 
0x7d, 0xfa, 0xef, 0xc5, 0x91, 0x39, 0x72, 0xe4, 0xd3, 0xbd, 0x61, 0xc2, 0x9f, 0x25, 0x4a, 0x94, 
0x33, 0x66, 0xcc, 0x83, 0x1d, 0x3a, 0x74, 0xe8, 0xcb, 0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 
0x40, 0x80, 0x1b, 0x36, 0x6c, 0xd8, 0xab, 0x4d, 0x9a, 0x2f, 0x5e, 0xbc, 0x63, 0xc6, 0x97, 0x35, 
0x6a, 0xd4, 0xb3, 0x7d, 0xfa, 0xef, 0xc5, 0x91, 0x39, 0x72, 0xe4, 0xd3, 0xbd, 0x61, 0xc2, 0x9f, 
0x25, 0x4a, 0x94, 0x33, 0x66, 0xcc, 0x83, 0x1d, 0x3a, 0x74, 0xe8, 0xcb, 0x8d, 0x01, 0x02, 0x04, 
0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, 0x6c, 0xd8, 0xab, 0x4d, 0x9a, 0x2f, 0x5e, 0xbc, 0x63, 
0xc6, 0x97, 0x35, 0x6a, 0xd4, 0xb3, 0x7d, 0xfa, 0xef, 0xc5, 0x91, 0x39, 0x72, 0xe4, 0xd3, 0xbd, 
0x61, 0xc2, 0x9f, 0x25, 0x4a, 0x94, 0x33, 0x66, 0xcc, 0x83, 0x1d, 0x3a, 0x74, 0xe8, 0xcb, 0x8d]

Here is the overall scheme for AES assuming that variable state is initialized with the 4x4 matrix of the plaintext (see above) and w[] has been initialized by key expansion:

AddRoundKey(state, w[0, 3])

for round in range(1,Nr):
    SubBytes(state)
    ShiftRows(state)
    MixColumns(state)
    AddRoundKey(state, w[round*4, round*4+3])

SubBytes(state)
ShiftRows(state)
AddRoundKey(state, w[Nr*4, Nr*4+3])

Decryption is computed by applying inverse operations:

AddRoundKey(state, w[Nr*4, Nr*4+3])

for round in range(Nr-1,0,-1):
    InvShiftRows(state)
    InvSubBytes(state)
    AddRoundKey(state, w[round*4, round*4+3])
    InvMixColumns(state)

InvShiftRows(state)
InvSubBytes(state)
AddRoundKey(state, w[0, 3])

Notice that AddRoundKey is unchanged since xor is the inverse of itself. InvShiftRows trivially amounts to revert the shifts on the row (to the right instead of left). InvSubBytes is computed by using the following inverse substitution of the S-Box:

   0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
  ------------------------------------------------
0 |52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb 
1 |7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb 
2 |54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e 
3 |08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25 
4 |72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92 
5 |6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84 
6 |90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06 
7 |d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b 
8 |3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73 
9 |96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e 
a |47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b 
b |fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4 
c |1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f 
d |60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef 
e |a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61 
f |17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d

Finally, InvMixColumns is given by the following operation

$\begin{bmatrix}c_0\\c_1\\c_2\\c_3\end{bmatrix} = \begin{bmatrix}0e&0b&0d&09\\09&0e&0b&0d\\0d&09&0e&0b\\0b&0d&09&0e\end{bmatrix}\begin{bmatrix}c_0\\c_1\\c_2\\c_3\end{bmatrix}$

For example the first byte of each column is computed as $0e c_0 \oplus 0b c_1 \oplus 0d c_2 \oplus 09 c_3$ .

NOTE: This fixed matrix is obtained by considering each column as a four-term polynomial with coefficients in $\mathbf{GF}(2^8)$ . The columns are then multiplied modulo $x^4 + 1$ with the inverse of the fixed polynomial a(x), given by $a^{-1}(x) = 0b x^3 + 0d x^2 + 09 x + 0e$ .

The algorithm for decryption is written in a form similar to the one for encryption but operations are not in the same order. It can, in fact, become the very same algorithm by noticing that

SubBytes and ShiftRows commute. It does not matter if we first apply the byte-wise substitution or if we first shift the rows. The final result will be the same. Of course, the same holds for the inverse transformations
InvMixColumns(state $\oplus$ roundKey) = InvMixColumns(state) $\oplus$ InvMixColumns(roundKey). This allows for inverting the two functions, provided that InvMixColumns is applied to the all the round keys.

Call dw the array containing the round keys transformed via InvMixColumns. The final decryption algorithm is:

AddRoundKey(state, dw[Nr*4, Nr*4+3])

for round in range(Nr-1,0,-1):
    InvSubBytes(state)
    InvShiftRows(state)
    InvMixColumns(state)
    AddRoundKey(state, dw[round*4, round*4+3])

InvSubBytes(state)
InvShiftRows(state)
AddRoundKey(state, dw[0, 3])

This is exactly the same as the one for encryption, but with the inverse functions. Having the same algorithm for encryption and decryption simplifies a lot implementations, especially if they are done in hardware.

Challenge: 1-round AES

(This challenge will give a +2 score for the first working solution posted and a +2 score for the most elegant one. Any original contribution will give some extra score).

It has been intercepted the following ciphertext:

d7ef8973c2739ae508df16baad6dfe1a
ce63db6478d678ae157813a29090133d
264c1809d70b689a90f0719ba9759851
184a7c4a6f2b5b31667657bd8cdc7cb6
3de08adeb9b12200dc761ffc6c9b1c6f
9883bb9cc155b60fdeeb9d99017067be
6efa018603e264a55b20e8227ba6236a
4ed29219b0f6737a09a2bebdd20fddf9
a4a3b00e31f39d00d3cba966db831556

Rumors tell that it is a mail to Joan Daemen (one of AES authors) encrypted with ECB mode using a simplified 1-round AES cipher:

SubBytes(state)
ShiftRows(state)
MixColumns(state)
AddRoundKey(state, w[0:4])
SubBytes(state)
ShiftRows(state)

Can you recover the whole message?
NOTE: If you want to test AES implementation look at the official AES description (also linked from the AES lesson): at page 35, Appendix C, it gives example vectors.

One thought on “The AES cipher”

If you want to see the algorithm for multiplication at work save this in a file named mult.py and run it by issuing python mult.py in a terminal. It will show the intermediate values and the invariant at each step.
Notice that the invariant is computed using the algorithm itself as it assumes all operations in the field.

def AESmult(a, b, debug):
    p = 0                       # p is 0 at the beginning
    for i in range(0,8):        # for the 8 bits of a and b do:
        if b & 1 != 0:          # the least significant bit of b is set
            p = p ^ a           # sum a to p (xor)
        b >>= 1                 # shifts b to the right
        hbit = (a & 0x80) != 0  # true if the most significant bit of a is set
        a <<= 1                 # shifts a to the left
        if hbit:                # if the most significant bit of a was set
            a = a ^ 0x11b       # sum 100011011 to a (xor), this always returns
                                # a 8-bit number
        if debug:
            print 'a={0:08b} b={1:08b} p={2:08b} axb+p={3:n}'.format( a,b,p, AESmult(a,b,False) ^ p )
    return p

r = AESmult(0b11010011,0b00111010,True)
print r, bin(r)

Comments are closed.

Mathematical background

The AES cipher

Links

Challenge: 1-round AES

One thought on “The AES cipher”