Cryptography is a powerful tool to protect information, especially when this is exposed to insecure environments such as the Internet. Historically, it mainly aimed at providing confidentiality, i.e., protecting from unauthorized access. Intuitively, cryptography amounts to transforming a plaintext into a ciphertext so that unauthorized users cannot easily reconstruct the plaintext. By illustrating ancient, classic simple ciphers we will point out what are the important issues related to cryptography and we will give a formal, more precise definition of it.
Caesar cipher
This is probably the simplest and most famous cipher, due to Julius Caesar. The idea is very simple: each letter of a message is substituted with the one that is 3 positions next in the alphabet. So, for example, ‘A’ is replaced with ‘D’ and ‘M’ with ‘P’. The substitution can be represented as follows:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
DEFGHIJKLMNOPQRSTUVWXYZABC
meaning that each letter in the top alphabet is substituted with the corresponding one in the bottom (rotated) alphabet. For example, the word HOME would be encrypted as KRPH. To decrypt it is enough to apply the inverse substitution:
DEFGHIJKLMNOPQRSTUVWXYZABC
ABCDEFGHIJKLMNOPQRSTUVWXYZ
This cipher is clearly insecure for many different reasons. First of all, once the cipher has been broken any previous exchanged message is also broken. This is due to the fact that this cipher always works the same way. There is a famous principle in cryptography, due to Auguste Kerckhoffs, that tells that a cipher should remain secure even if the algorithm becomes public. This is achieved by parametrizing ciphers on a key. The key can be changed and is assumed to be the only secret. This is of course fundamental if we want a cipher to scale and be used by millions of users.
We thus give a variant of the cipher, called shift cipher, which is parametrized on a key k, that we assume to range from 0 to 25. Intuitively, k represents the number of positions in the alphabet that we shift each letter of. For example k = 10 gives the following substitution (notice that the bottom alphabet is now shifted to the left by 10 positions):
ABCDEFGHIJKLMNOPQRSTUVWXYZ
KLMNOPQRSTUVWXYZABCDEFGHIJ
Brute force. Even this variation of the cipher is insecure. There an easy attack that consists of trying, by “brute force”, all the possible 26 keys. There is no smart analysis of the encryption algorithm: the problem is the (very) small number of keys.
Substitution cipher
To overcome the previous limitation we extend the key to a generic substitution. For example:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
SWNAMLXCVJBUYKPDOQERIFHGZT
Now, the word HOME is encrypted as CPYM. As for the Caesar cipher, to decrypt we just apply the inverse substitution:
SWNAMLXCVJBUYKPDOQERIFHGZT
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Is brute forcing still possible? How many keys do we have now?
Since a key is a generic substitution which can be represented as a permutation of the alphabet, the number of keys is the number of permutations of 26 elements, i.e., 26! which is approximately 4 x 1026, a number bigger than 288 which makes it very heavy to brute force even using powerful parallel computers.
So what is wrong with this cipher? The problem is that it is monoalphabetic meaning that it maps a letter always to the very same letter. This preserves the statistics of the plaintext and makes it possible to reconstruct the key by observing the statistics in the ciphertext. For example vowels e,a,o,i will be easy to identify as they are much more frequent than the other letters.
Challenge
Can you decrypt the following ciphertext?
GNDO DO L ODEFYK KRLEFYK CA L HDFNKIGKRG. XKYY BCWK!
ZCS WCX MWCX GNLG OCYJDWQ HNLYYKWQKO XDYY QDJK ZCS KRGIL OHCIK LO
OCCW LO ZCS FCOG ZCSI OCYSGDCW KRFYLDWDWQ NCX ZCS BDB DG!