Identification

Identification is one of the most important issues in security and is often required in order to enforce other security properties. We always identify into a system when we login. We identify to our telephone provider through the sim-card in our phone and we identify to ATMs with our cards and PINs. Any time the access to a resource needs to be controlled, some form of identification is required.

Identification can also be though as authenticating a user or, more generally, an entity. The aim is to allows a verifier to check claimant‘s identity. As an example, in a login-password scheme, the user claims her identity by inserting the login. The system verifies the identity by asking for a secret password.

We typically require that an identification scheme does not allow impersonation: an attacker should not be able to impersonate another entity, even observing previous identifications. Moreover, we want to avoid uncontrolled transferability: the verifier should not be able to reuse a previous identification to impersonate a claimant with a different verifier, unless this is explicitly authorized by the user. Notice that the verifier might have more information available than an outside attacker since part of the communication might be encrypted.

Identification schemes can be based on various elements/aspects:

  1. Something known. For example, passwords, passphrases, Personal Identification Numbers (PINs), cryptographic keys. The identity is verified by checking the knowledge of the secret;
  2. Something possessed. For example, ATM cards, credit cards, smartcards, One Time Password (OTP) generators, USB crypto-tokens. The identity is verified by checking the possession of the device;
  3. Something inherent. For example, (paper) signatures, fingerprints, voice and face recognition, retinal patterns. The identity is verified by some inherent/biometric feature of the user. (These techniques are related to image recognition and will not be treated here.)

Passwords

The most widely adopted identification scheme is login/password. The identity claimed through the login information is checked by asking for a corresponding secret password. By exhibiting the secret, a user directly proves its knowledge. Since the system needs to store the passwords of any user it is crucial that this information is somehow protected. The idea is to store a one-way hash of the password.

Definition (one-way hash function). A one-way hash h is a function that: (i) given arbitrary data x, h(x)=z is a fixed-length value z called digest that can be computed efficiently; (ii) given a digest z, it is infeasible to compute x such that h(x)=z.

Verification proceeds as follows:

  1. User is asked for login and password
  2. The system retrieves the stored hash z of the password for the given login
  3. The system computes h(password) and checks it is the same as z

The fact function h is one-way guarantees, in principle, that even if the whole password file is stolen no password can be recovered from its hash. In practice, we have to be careful for possible brute-forcing.

Dictionary attacks

Since users typically adopt mnemonic, easy password, by bruteforcing over a dictionary of all words in a language we have good possibility of finding ‘easy’ passwords. To bruteforce it is enough to compute the hash of all the words in the dictionary (plus variants with capital letters, for example). This can be precomputed once for all and so a dictionary of pre-hashed words can be just queried for stolen hashes. More easily, we can can directly query Google or specific sites for hash inversion. For example:

$ echo -n "Riccardo" | sha256sum 
f8ac3ee6886f49a31a4033791232010da7de42182fa954a2086e6b027a904cc5  -
$

By googling f8ac3ee6886f49a31a4033791232010da7de42182fa954a2086e6b027a904cc5, we find for example:

sha256 ('Riccardo'). f8ac3ee6886f49a31a4033791232010da7de42182fa954a2086e6b027a904cc5.

which does the trick. This explains why systems are so paranoid with strong passwords: inserting symbols and numbers make the password ‘space’ much bigger and dictionary attacks much harder.

Salted passwords

A standard technique to make dictionary attacks hard is to add a salt to passwords. A salt is a random number that is hashed together with the password. The salt is stored in the password file as cleartext. When the system needs to check a password, it first recovers the salt from the file and then computes the hash of the ‘salted’ password. This makes it harder to precompute the hashes of a dictionary since for each password we should try all the possible salts. A final trick is to make hash computation slow so that bruteforcing is slowed down. This is typically achieved by iterating many times the adopted hash.

Real implementations adopt many iterations of hashes of salted passwords. For example

goofy:$6$Lc5mF7Mm$03IT.AXVhC3Vl4/rLAdomffgv5feOlKBzNGtpEei2dBgK9z/4QBqM3ZMRK4qcbbYJhkAE.2KscEZx0Am/y50: .....

is a Linux password for user goofy, hashed using the SHA512-based algorithm (identified by $6$), as it would appear in the /etc/shadow file of a Linux system. By default SHA512 is iterated 5000 time, but this value can be picked in the interval from 1000 to 999999999. String ‘Lc5mF7Mm’ is the salt and the remaining data is the hashed salted password. This can be checked/recomputed using, for example, the crypt function in python, once we know the cleartext password ‘donald’:

>>> import crypt
>>> crypt.crypt("donald","$6$Lc5mF7Mm$")
'$6$Lc5mF7Mm$03IT.AXVhC3Vl4/rLAdomffgv5feOlKBzNGtpEei2dBgK9z8B/4QBqM3ZMRK4qcbbYJhkAE.2KscEZx0Am/y50'
>>> 

or the mkpasswd command line tool:

$ mkpasswd donald -m sha-512 -S Lc5mF7Mm
$6$Lc5mF7Mm$03IT.AXVhC3Vl4/rLAdomffgv5feOlKBzNGtpEei2dBgK9z8B/4QBqM3ZMRK4qcbbYJhkAE.2KscEZx0Am/y50

Increasing the number of iterations makes the process slower and, of course, produces a different result:

$ time mkpasswd donald -m sha-512 -S Lc5mF7Mm -R 5000000
$6$rounds=5000000$Lc5mF7Mm$FWm/GeTLTryHa0Nt/WfrbLqjVOsipSBNP3IUgwbNP7H95eR8lhKj.6Pc7YcznupXjHXA9QBirkmmaxh3oqt4v.
real    0m22.938s
user    0m22.930s
sys     0m0.000s

The salt is generated at random when the user changes his password and this makes pre-hashed tables very hard to compute, as in the following examples with mkpasswd

$ mkpasswd donald -m sha-512 
$6$KKOpwblUx4TOKQh$I89vJNsqdMWxD2YnbzIjKr2pyPeSUr2b3sFRAfErzCS8TeX2SqsTECkcKGyzzSNH3i0r8rGHbK3WHfEsR4enI0

$ mkpasswd donald -m sha-512 
$6$HT8.lP437G$gNsyr/99EGhvnvNcTlVTdhkf0CcZwoZ90t77Tk7SqOBpEDaLZCUy.BD49hXu97CnVZEP8FOg8J9Jr0b09xfn./

$ mkpasswd donald -m sha-512 
$6$5GO5Jux.FZyNPiQ$wfmv.u7cfCCaETYkc9ZD93saa91awPfZulJOAFwYJcr5.AWIxvpqoa.ejns5Gg87MciJm4FNe2Udo.z7rXP4B/

For example, Googling the hashed password give no results even if it is extremely simple and for sure contained in dictionaries.

Rainbow tables

One of the difficulties with brute-forcing is the space requirement: Storing the hash of a big dictionary and searching into it might become hard, especially if it exceeds the size of available RAM. Rainbow tables are a popular technique that allows for a time/space tradeoff. The idea is to build chains of hashes from a starting password p to a final hash z. To this aim, it is used a reduction function that, given a hash returns a password, whose hash is not required to match the starting one. A simple example of reduction function is the one returning a writable version of the first k bytes of the hash. The following code computes a chain of length 10 starting from message “donald”.

#!/usr/bin/python3
import hashlib,binascii
# Reduction: takes the first 8 bytes and makes them writable
def red(z):
    r = ''
    for c in z[:8]:
        r += chr( ( (c-33) % 93 ) + 33)
    return r

p = "donald"
for _ in range(10):
    z = hashlib.sha256(p.encode()).digest()  # computes the digest
    print(p)
    print(binascii.hexlify(z).decode())
    p = red(z)                      # applies reduction

which gives:

donald
4138cfbc5d36f31e8ae09ef4044bb88c0c9c6f289a6a1c27b335a99d1d8dc86f
A8r_]69{
b6993563cc9fb06b68bc8766b2b556a179557bfb306daade3f032dcf208e9865
Y<5coBSk
1af94c530693bd80abb1bd9eca143324eb3185fbf559634167ece0aa494fd2a1
w?LSc6`#
e5138aee690f1ec23e4fbee436138c51b955b3438a96be23188a7277f1554530
+p-4il{e
93bfa6db6c82dcc1bdf6c9de7682f236817f2e4b25907f7934b0d8d8c28b3107
6bI!l%"d
c880c7f068e2b4fe6ec76fea6756d8b1ee92b0d96d0b867be3b952a3ac75cf96
k#j6h(WD
75532eec682a5c65f5a6f8717afc00f67f2518f8bd251865374447cb6bc50725
uS.2h*\e
9d384a0c159b257534258b255023062cbf560491de12ca79ddffca052a5b67b5
@8Jir>%u
6b16a5147f320f182d8d55ed5631203cede6fde5292ba3bd697cb430c2102d22
ksHq"2lu
25f94e180a5abcf4c4c70ab68fc2c6365dee0778e86652fdef8ddeab60d939d2

The only information stored is the starting password donald and the final hash

25f94e180a5abcf4c4c70ab68fc2c6365dee0778e86652fdef8ddeab60d939d2.

The idea is that this chain “covers” all the intermediated passwords and hashes even if we just store the initial and final value. For example, to invert

c880c7f068e2b4fe6ec76fea6756d8b1ee92b0d96d0b867be3b952a3ac75cf96

we first check if it matches the final hash. Since it doesn’t, we try to hash and reduce it for a number of steps at most the length of the chain. After 4 steps we get

25f94e180a5abcf4c4c70ab68fc2c6365dee0778e86652fdef8ddeab60d939d2

thus to get the preimage we run 10-4-1 = 5 steps from the initial value donald as follows:

p = "donald"
for _ in range(5):
    z = hashlib.sha256(p.encode()).digest()  # computes the digest
    p = red(z)                      # applies reduction
print(p)

and we get password

6bI!l%”d

whose hash is our target

c880c7f068e2b4fe6ec76fea6756d8b1ee92b0d96d0b867be3b952a3ac75cf96.

This can be generalized to a set of chains. Longer chains will save space but will require more time (long chains cover much more hashes but need more time to be examined).

The original idea is due to Martin Hellman [1] and it has been refined by Philippe Oechslin in [2] where rainbow chains are defined. The idea is to use different reduction functions at each step (such as the different colors in the rainbow) so to make the probability of ‘merging’ two chains or looping lower than in the above example, where the same reduction function is used. More detail can be found in the paper.

[1] Martin Hellman. “A cryptanalytic time-memory tradeoff”. In IEEE Transaction on Information Theory, vol IT-26, No. 4, 1980.
[2] Philippe Oechslin. “Making a Faster Cryptanalytic Time-Memory Trade-Off”. In CRYPTO 2003. Springer LNCS 2729.

Personal Identification Numbers (PINs)

A PIN is a short numerical secret code typically used to protect a token such as an ATM card or a smartcard. It enables the token to perform the real authentication task and constitutes a second layer of protection: if the token is lost the PIN will protect its usage. Since PINs are short (4-5 digits) on-line attacks where all the codes are tried should be prevented. This is easily achieved by limiting the number of attempts before locking the token (typically to 3 attempts). Notice that similar countermeasures can be taken for passwords by, e.g., slowing down the login process after the insertion of one or more wrong passwords.

Since nowadays cards contain tamper-resistent chips that are able to physically protect PIN secrecy, the PIN is typically stored and checked in the card. There exist systems, however, that still check the PIN remotely. An example is the bank ATM network that we discuss below.

Bank PIN management

The worldwide ATM system is based on a protocol that sends the PIN typed at the ATM all the way to the issuing bank, which checks its correctness. To this purpose, the PIN is directly encrypted inside the keypad and is sent to subsequent bank network nodes where it is decrypted an re-encrypted all the way to the final destination. These cryptographic operations are all performed inside special tamper-resistant Hardware Security Modules (HSMs) so that the PIN never occurs in the clear outside the HSMs. Even the verification is performed by a special API call that takes as input the encrypted PIN and the user data and returns YES or NO depending on the correctness of the typed PIN.

We exemplify illustrating the pseudo-code of the IBM 3624 PIN Verification method.

PIN_Verify(EPB, account, dectab, offset)
   tPIN = decrypt(K,EPB)      # decrypts the encrypted PIN under K obtaining the PIN typed at the ATM
                              # EPB stands for Encrypted PIN Block
   
   z1 = encrypt(pdk,account)  # user account number is encrypted under the PIN derivation key pdk
   z2 = left(5,z1)            # takes the first 5 hex digits
   z3 = decimalize(dectab,z2) # transforms hex digits into decimal digits using the dectab
   uPIN = sum10(z3,offset)    # digit-wise sum (modulo 10) the obtained number with the given offset

   if (tPIN == uPIN):
      return(YES)             # PINs match, accept
   else:
      return(NO)              # PINs mismatch, refuse

A numeric example with dectab = 0123456789012345 (meaning that decimal number are untouched while abcdef are mapped to 012345) and offset = 14519 follows:

tPIN = 16399    # obtained as decryption under K of the encrypted PIN received

z1 = 0c88028bf3aa6a6a14      # the encryption of account number
z2 = 0c880                   # the first 5 digits
z3 = 02880                   # c is decimalized into 2
uPIN = 02880 + 14519 = 16399 # the offset is added

The answer is YES since the two PINs match.

Even if this computation is performed inside a sophisticated hardware, an attack is still possible that allows to recover all the PIN digits.

We assume to have intercepted data that gets a YES answer. So we know that PIN_Verify(EPB, account, dectab, offset) returns YES. The attacker modifies the dectab by, e.g., changing all the 0’s into 1 and queries again the HSM. We have two cases:

  1. YES. Since the change in the dectab has not subverted the answer we know that digit 0 does not appear in z3
  2. NO. The digit 0 appeared in z3 and has been decimalized into 1, making the check fail.

In our particular example, if we pick dectab’ = 1123456789112345 we in fact obtain

z1 = 0c88028bf3aa6a6a14     
z2 = 0c880                   
z3 = 12881                   # here 0's become 1's !
uPIN = 12881 + 14519 = 26390 # the offset is added

The answer is NO. We know that z3 contained at least one 0 digit. The trick to recover the position of the 0’s is to subtract 1, in turn, from any of the offset digits until we get again a YES answer. For our particular example this happens when we subtract 1 by from the first and last digit, i.e., offset’ = 04518:

z1 = 0c88028bf3aa6a6a14     
z2 = 0c880                   
z3 = 12881                   # here 0's become 1's !
uPIN = 12881 + 04518 = 16399 # the offset' is added

The answer is again YES. We discover that exactly two 0’s appeared in z3 in the first and last positions. Now it is easy to compute the final PIN digits as 0+1 = 1 and 0+9 = 9, since 1 and 9 are the first an last offset digits, respectively. The attack can be optimized and iterated until the whole PIN is recovered, which takes less than 20 API calls. See the Mastermind and PIN cracking web page for more information and references.

One time passwords

The main problem with passwords, PINs and biometrics (even if we have not treated them) is that it is enough for the attacker to intercept/guess them once to re-authenticate as many time he wants. This risk is concrete as we need to show the secret in order to prove our identity. For this reason this kind of identification is not suited for remote usage. For example, when telnet was the standard protocol for remote logins, all the passwords were sent in the clear and were easily interceptable.

A way to mitigate the problem is to use one-time passwords, i.e., password used only once. It is very common, nowadays, to have OTP PIN generator devices for bank transactions, such as the recently questioned RSA SecurID.

The Lamport’s hash-based one-time password scheme

We illustrate a simple and smart way to implement one-time passwords using one-way hash functions. Given a secret s and a one way hash function h we compute ht(s), i.e., h(h(… h(s)… )) t times, and we let the claimant and the verifier share this latter value.

We then use the following list of passwords: ht-1(s), ht-2(s), … h(s), s. Since the verifier knows ht(s), he basically have the hash of the first password ht-1(s). The check is done as usual: the verifier computes the hash of the received password, i.e., h(ht-1(s)), and compares it with the stored hash. If they match identification is successful. In order to enforce the one-time mechanism, after accepting identity the verifier substitutes the stored hash with the password just accepted which is, in fact, the hash of the next password, and so on.

This allows for t identifications with t different passwords. Since h is one-way, intercepting one password does not allow to compute the next password as this would amount to compute a preimage of h.