Introduction to Python

Python is the de facto standard programming language in the infosec community: many tools are written in Python or support plugins and bindings in this language. Python is very flexible, since it supports multiple paradigms (imperative, object-oriented, functional, …). Although it is dynamically typed, it is also strongly typed, in that only well-defined operations are permitted. All these features, together with a huge library, make Python an ideal language for scripting and fast application prototyping.

Informal Introduction

This is a nice and clear introduction to Python.

Exercise 1

Caesar decrypt: decrypt a given ciphertext encrypted with Caesar.

Example

ciphertext:

Lq fubswrjudskb, d Fdhvdu flskhu, dovr nqrzq dv Fdhvdu'v flskhu, wkh vkliw
flskhu, Fdhvdu'v frgh ru Fdhvdu vkliw, lv rqh ri wkh vlpsohvw dqg prvw zlghob
nqrzq hqfubswlrq whfkqltxhv. Lw lv d wbsh ri vxevwlwxwlrq flskhu lq zklfk hdfk
ohwwhu lq wkh sodlqwhaw lv uhsodfhg eb d ohwwhu vrph ilahg qxpehu ri srvlwlrqv
grzq wkh doskdehw. Iru hadpsoh, zlwk d ohiw vkliw ri 3, G zrxog eh uhsodfhg eb
D, H zrxog ehfrph E, dqg vr rq. Wkh phwkrg lv qdphg diwhu Mxolxv Fdhvdu, zkr
xvhg lw lq klv sulydwh fruuhvsrqghqfh

plaintext:

In cryptography, a Caesar cipher, also known as Caesar's cipher, the shift
cipher, Caesar's code or Caesar shift, is one of the simplest and most widely
known encryption techniques. It is a type of substitution cipher in which each
letter in the plaintext is replaced by a letter some fixed number of positions
down the alphabet. For example, with a left shift of 3, D would be replaced by
A, E would become B, and so on. The method is named after Julius Caesar, who
used it in his private correspondence

Useful links

Exercise 2

Frequency analysis: print the list of pairs (character, number of occurrences) found in a given string, sorted by the number of occurrences.

Exercise 3

Reverse word order: print the words found in a given string in reverse order

Example:

'How are you' --> 'you are How'

Reading and writing files

We use open and read to read the content of a text file as follows:

>>> f = open('file.txt')
>>> f.read()
'Hi this is a text file\nof 3 lines!\nyeah\n'
>>> f.read()
''
>>> f.close()
>>>

By default a file is opened read-only. If we try to write to a read-only file, an exception is raised:

>>> f = open('file.txt')
>>> f.write('test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not writable
>>>

To write to a file we use mode 'w'. Notice that in this case the existing content of the file is overwritten. For example:

>>> f = open('file.txt','w')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not readable
>>> f.write('test')
4
>>> f.close()
>>> f = open('file.txt')
>>> f.read()
'test'
>>>

We can specify 'r+' to open a file for both reading and writing, and we can append content using mode 'a', as shown in the following:

>>> f = open('file.txt','a')
>>> f.write('test2')
5
>>> f.close()
>>> f = open('file.txt')
>>> f.read()
'testtest2'
>>>
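
The 'r+' mode is mentioned above but not shown. The following is a minimal sketch of how it behaves, assuming a throwaway file created with tempfile (the path and the name demo.txt are illustrative only): the file is not truncated on opening, and writes happen at the current position.

```python
import os
import tempfile

# Hypothetical scratch file, so that no real data is touched.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')       # create the file with some initial content
f.write('testtest2')
f.close()

f = open(path, 'r+')      # read and write, without truncating
before = f.read()         # the existing content is still there
f.seek(0)                 # go back to the beginning of the file
f.write('TEST')           # overwrite the first four characters in place
f.close()

f = open(path)
after = f.read()
f.close()

print(before)   # testtest2
print(after)    # TESTtest2
```

Unlike 'w', opening with 'r+' raises an exception if the file does not exist.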

Python can deal with binary files by specifying the modifier 'b'. In the following example we read a file in binary mode and obtain a bytes object, which is displayed with a 'b' prefix before the string literal. Indexing its first element gives the ASCII code of the first letter, which is 116:

>>> f = open('file.txt','rb')
>>> s = f.read()
>>> s[0]
116
>>> s
b'testtest2'
>>> s.decode()
'testtest2'
>>> s.decode()[0]
't'
>>> f.close()
>>>
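
As a small complement to the session above, the sketch below contrasts indexing and slicing on a bytes object. It builds the bytes directly with encode() instead of reading a file, which is an assumption made only to keep the example self-contained.

```python
data = 'testtest2'.encode()   # str -> bytes (UTF-8 by default)

first = data[0]               # indexing bytes yields an int
first_slice = data[0:1]       # slicing bytes yields bytes
text = data.decode()          # bytes -> str

print(first)          # 116
print(chr(first))     # t
print(first_slice)    # b't'
print(text[0])        # t
```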

It is possible to read a file by simply iterating over the file object. The effect is to read the file one line after the other, as in the following example:

>>> f = open('file.txt')
>>> for l in f:
...     print(l,end='')
...
Hi this is a text file
of 3 lines!
yeah

Alternatively, we can read the whole content and then split it using the split() method, which splits on the given string. If no argument is given, the split is done on runs of whitespace. We can use join to merge the pieces back together, separated by a given string, as follows:

>>> f = open('file.txt')
>>> d = f.read()
>>> d.split('\n')
['Hi this is a text file', 'of 3 lines!', 'yeah', '']
>>> d.split()
['Hi', 'this', 'is', 'a', 'text', 'file', 'of', '3', 'lines!', 'yeah']
>>> ' '.join(d.split())
'Hi this is a text file of 3 lines! yeah'
>>> ' ___ '.join(d.split())
'Hi ___ this ___ is ___ a ___ text ___ file ___ of ___ 3 ___ lines! ___ yeah'
>>>
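
Notice the trailing empty string produced by split('\n') above, which is due to the final newline. A possible way to avoid it, sketched here on the same content held in a string, is the splitlines() method:

```python
d = 'Hi this is a text file\nof 3 lines!\nyeah\n'

by_newline = d.split('\n')    # the final '\n' produces a trailing ''
lines = d.splitlines()        # no trailing empty string
rejoined = '\n'.join(lines)   # join is the inverse, minus the final newline

print(by_newline)   # ['Hi this is a text file', 'of 3 lines!', 'yeah', '']
print(lines)        # ['Hi this is a text file', 'of 3 lines!', 'yeah']
```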

It is good practice to use the Python with statement, since it automatically closes the file, even when an exception is raised:

>>> with open('file.txt') as f:
...     f.read()
...
'Hi this is a text file\nof 3 lines!\nyeah\n'
>>>
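
To illustrate the claim about exceptions, here is a minimal sketch in which an error is raised inside the with block. The scratch file and the ValueError are assumptions made for the sake of the example.

```python
import os
import tempfile

# Hypothetical scratch file with the same content as file.txt.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('Hi this is a text file\nof 3 lines!\nyeah\n')

try:
    with open(path) as f:
        first_line = f.readline()
        # Simulate a failure while the file is still open.
        raise ValueError('something went wrong while processing')
except ValueError:
    pass

print(f.closed)   # True: the file was closed despite the exception
```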

Regular Expressions

We have seen regular expressions with grep and sed. We briefly review how regular expressions can be used in Python:

>>> import re
>>> re.compile('[a-z]+')
re.compile('[a-z]+')
>>> regexp = re.compile('[a-z]+')

match

match matches a regular expression at the start of the string and returns None (which the interpreter does not print) when there is no match:

>>> regexp.match('')
>>> regexp.match('1')
>>> regexp.match('a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>>

In case we have a match we can query the Match object using the following methods:

  • group(): return the string matched by the RE
  • start(): return the starting position of the match
  • end(): return the ending position of the match
  • span(): return a tuple containing the (start, end) positions of the match

>>> regexp.match('abcd123').group()
'abcd'
>>> regexp.match('abcd123').start()
0
>>> regexp.match('abcd123').end()
4
>>> regexp.match('abcd123').span()
(0, 4)
>>>

To search in the middle of a string we use search:

>>> regexp.match('1abcd123')
>>> regexp.search('1abcd123')
<_sre.SRE_Match object; span=(1, 5), match='abcd'>
>>>
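
Since both match and search return None when nothing matches, scripts usually test the result before querying it. The following sketch shows one way to do so; the helper name first_word is hypothetical.

```python
import re

regexp = re.compile('[a-z]+')

def first_word(s):
    """Return the first run of lowercase letters in s, or None."""
    m = regexp.search(s)
    if m is None:          # no match anywhere in the string
        return None
    return m.group()

print(first_word('1abcd123'))             # abcd
print(first_word('123'))                  # None
print(regexp.match('1abcd123') is None)   # True: match only looks at the start
```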

findall and finditer

We can use findall to obtain the list of all matching strings and finditer to obtain the iterator of all match objects:

>>> regexp.search('123hello123world123')
<_sre.SRE_Match object; span=(3, 8), match='hello'>
>>> regexp.findall('123hello123world123')
['hello', 'world']
>>> for f in regexp.finditer('123hello123world123'):
...     print(f)
...
<_sre.SRE_Match object; span=(3, 8), match='hello'>
<_sre.SRE_Match object; span=(11, 16), match='world'>
>>>
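
Because finditer yields match objects, it can be combined with the methods seen above. As a small sketch, the following collects each matched string together with its position:

```python
import re

regexp = re.compile('[a-z]+')
text = '123hello123world123'

words = regexp.findall(text)                                  # strings only
located = [(m.group(), m.span()) for m in regexp.finditer(text)]

print(words)     # ['hello', 'world']
print(located)   # [('hello', (3, 8)), ('world', (11, 16))]
```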

groups

We can specify groups using parentheses and pass an argument to the methods group, start, end and span to retrieve information about a specific group, as shown in the following example:

>>> regexp = re.compile('([a-z]+)([0-9]+)')
>>> regexp.search('hello12345world')
<_sre.SRE_Match object; span=(0, 10), match='hello12345'>
>>> regexp.search('hello12345world').group()
'hello12345'
>>> regexp.search('hello12345world').group(0)
'hello12345'
>>> regexp.search('hello12345world').group(1)
'hello'
>>> regexp.search('hello12345world').group(2)
'12345'
>>> regexp.search('hello12345world').span(1)
(0, 5)
>>> regexp.search('hello12345world').span(2)
(5, 10)
>>>
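
Two related facts are worth a short sketch: the groups() method returns all groups at once as a tuple, and findall returns a list of tuples when the expression contains more than one group. The input string with two word/number pairs is an assumption chosen for illustration.

```python
import re

regexp = re.compile('([a-z]+)([0-9]+)')

m = regexp.search('hello12345world')
word, number = m.groups()        # all groups of the match, as a tuple

# With several groups, findall yields one tuple per match.
pairs = regexp.findall('hello12345world678')

print(word, number)   # hello 12345
print(pairs)          # [('hello', '12345'), ('world', '678')]
```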

sub

Substitution can be done with sub by specifying a replacement string, possibly referring to groups as \1, \2, etc. Notice that \ should be escaped, or the string should be specified as raw by prepending an r:

>>> regexp = re.compile('[^a-z]*([a-z]+)([0-9]+)[^0-9]*')
>>> regexp.sub('word = \\1\nnumber = \\2\n','hello12345world')
'word = hello\nnumber = 12345\n'
>>> regexp.sub(r'word = \1\nnumber = \2\n','hello12345world')
'word = hello\nnumber = 12345\n'
>>> print(regexp.sub(r'word = \1\nnumber = \2\n','hello12345world'))
word = hello
number = 12345
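
The sketch below shows what goes wrong when the backslash is neither escaped nor in a raw string: Python reads '\1' as an octal escape, so the replacement contains a control character instead of a group reference. The simplified pattern is an assumption made to keep the example short.

```python
import re

regexp = re.compile('([a-z]+)([0-9]+)')

print(r'\1' == '\\1')    # True: the raw and the escaped forms are the same string
print('\1' == '\x01')    # True: without protection, '\1' is the character with code 1

swapped = regexp.sub(r'\2\1', 'hello12345')   # group references are expanded
broken = regexp.sub('\2\1', 'hello12345')     # control characters are inserted literally

print(swapped)        # 12345hello
print(repr(broken))   # '\x02\x01'
```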

greedy and non greedy

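The * operator is greedy: it tries to match as much as possible. If we need to match as little as possible we can use the non-greedy variant *?, as shown in the following example. In the first case the leading .* consumes '11111111hell' and leaves only 'o' for the first group: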

>>> regexp = re.compile('.*([a-z]+)([0-9]+).*')
>>> regexp.search('11111111hello123aaaaaaaaa').group(1)
'o'
>>> regexp = re.compile('.*?([a-z]+)([0-9]+).*')
>>> regexp.search('11111111hello123aaaaaaaaa').group(1)
'hello'
>>>
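
To make visible what the leading .* and .*? consume, the following sketch wraps them in an extra group (an addition made here for illustration only, so the group numbers differ from the example above) and prints all groups:

```python
import re

s = '11111111hello123aaaaaaaaa'

greedy = re.search('(.*)([a-z]+)([0-9]+).*', s)
lazy = re.search('(.*?)([a-z]+)([0-9]+).*', s)

print(greedy.groups())   # ('11111111hell', 'o', '123')
print(lazy.groups())     # ('11111111', 'hello', '123')
```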

Interaction with processes

We show some examples of interaction with other processes. Suppose we want to execute a program from Python and process its output. check_output runs the program and, if execution is successful, returns its output as a bytes object:

>>> import subprocess
>>> subprocess.check_output('ls')
b'file.txt\npython-regex-popen.txt\n'
>>> subprocess.check_output('ls').decode()
'file.txt\npython-regex-popen.txt\n'
>>> print(subprocess.check_output('ls').decode())
file.txt
python-regex-popen.txt

In case of error an exception is raised:

>>> subprocess.check_output('lsz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/subprocess.py", line 607, in check_output
    with Popen(*popenargs, stdout=PIPE, **kwargs) as process:
  File "/usr/lib/python3.4/subprocess.py", line 859, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.4/subprocess.py", line 1459, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'lsz'
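
In a script we would typically catch these exceptions. The sketch below handles both the case above (program not found) and the case of a program that runs but exits with a non-zero status, for which check_output raises CalledProcessError. The helper name run is hypothetical, and the failing child is simulated with the Python interpreter itself so that the example does not depend on any particular external tool.

```python
import subprocess
import sys

def run(cmd):
    """Return the decoded output of cmd, or a short error description."""
    try:
        return subprocess.check_output(cmd).decode()
    except FileNotFoundError:
        return 'program not found'
    except subprocess.CalledProcessError as e:
        return 'failed with exit status {}'.format(e.returncode)

# A child process that prints something and succeeds.
print(run([sys.executable, '-c', 'print("ok")']).strip())

# A child process that exits with status 3.
print(run([sys.executable, '-c', 'import sys; sys.exit(3)']))

# A program that (presumably) does not exist.
print(run(['lsz']))
```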

Parameters can be passed by specifying a list instead of a string. For example, to execute ls -l we can do the following:

>>> print(subprocess.check_output(['ls','-l']).decode())
total 44
-rw-rw-r-- 1 focardi focardi    40 feb 16 14:23 file.txt
-rw-rw-r-- 1 focardi focardi  5849 feb 16 15:14 python-regex-popen.txt

We cannot pass arguments in a single string unless we specify shell=True, which forces the string to be interpreted by the shell. However, this is dangerous in general. Read the security considerations about using shell=True carefully!
The following example shows how to execute ls -l as a string by specifying shell=True:

>>> print(subprocess.check_output('ls -l').decode())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/subprocess.py", line 607, in check_output
    with Popen(*popenargs, stdout=PIPE, **kwargs) as process:
  File "/usr/lib/python3.4/subprocess.py", line 859, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.4/subprocess.py", line 1459, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'ls -l'
>>> print(subprocess.check_output('ls -l', shell=True).decode())
total 44
-rw-rw-r-- 1 focardi focardi    40 feb 16 14:23 file.txt
-rw-rw-r-- 1 focardi focardi  7371 feb 16 15:18 python-regex-popen.txt
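
To give an idea of why shell=True is dangerous, here is a minimal sketch on a Unix-like system (it assumes that echo and a POSIX shell are available). The variable user_input stands for hypothetical untrusted data: with the list form it is passed as a single argument, while with shell=True the shell interprets the ; and runs a second command.

```python
import shlex
import subprocess

user_input = 'hello; echo injected'   # hypothetical untrusted input

# List form: the whole string is a single argument of echo.
safe = subprocess.check_output(['echo', user_input]).decode()

# shell=True: the shell splits on ';' and executes two commands.
unsafe = subprocess.check_output('echo ' + user_input, shell=True).decode()

# If a shell is really needed, shlex.quote() escapes the argument.
quoted = subprocess.check_output('echo ' + shlex.quote(user_input),
                                 shell=True).decode()

print(repr(safe))     # 'hello; echo injected\n'
print(repr(unsafe))   # 'hello\ninjected\n'
print(repr(quoted))   # 'hello; echo injected\n'
```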

Popen

Interaction is implemented via the powerful and flexible Popen class which, by default, executes a program taking input from stdin and sending output to stdout. For example, we can execute ls -l as follows:

>>> subprocess.Popen(['ls','-l'])

>>> total 52
-rw-rw-r-- 1 focardi focardi    40 feb 16 14:23 file.txt
-rw-rw-r-- 1 focardi focardi  8546 feb 16 15:24 python-regex-popen.txt

Input and output can be made available to the Python program by specifying them as subprocess.PIPE. In the following example, the output is redirected to a pipe and it is thus possible to communicate with the process by invoking communicate():

>>> p = subprocess.Popen(['ls','-l'], stdout=subprocess.PIPE)
>>> p.communicate()
(b'total 136\n-rw-r--r--  1 focardi  staff    269 Feb 16 19:01 README.txt\n-rw-r--r--  1 focardi  staff     26 Feb 16 18:35 file.txt\n-rw-r--r--  1 focardi  staff    174 Feb 16 20:54 popen.py\n-rw-r--r--  1 focardi  staff  11062 Feb 16 19:03 python-regex-popen-wordpress.html\n-rw-r--r--  1 focardi  staff  32547 Feb 16 19:24 python-regex-popen.html\n-rw-r--r--  1 focardi  staff  11168 Feb 16 21:13 python-regex-popen.txt\n', None)
>>>

We see that communicate returns a pair corresponding to stdout and stderr (the latter is None because we did not redirect it).
We can refer to the stdin and stdout of processes so as to simulate a shell pipeline.
For example, let us simulate ls | grep txt:

>>> from subprocess import *
>>> p1 = Popen(["ls"], stdout=PIPE)
>>> p2 = Popen(["grep", "txt"], stdin=p1.stdout, stdout=PIPE)
>>> p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits (PIPE is dup2-ed on standard output)
>>> output = p2.communicate()[0]
>>> print(output.decode())
file.txt
python-regex-popen.txt

We can also pass data to a process via communicate, in the form of a bytes object. The following example runs cat and sends b'hello'. The effect is to receive b'hello' as output, which is the expected behaviour of cat:

>>> p1 = Popen(["cat"], stdin=PIPE,stdout=PIPE)
>>> p1.communicate(b'hello')
(b'hello', None)
>>>
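
If stderr is redirected as well, the second component of the pair is no longer None. The sketch below uses a small Python one-liner as the child process (an assumption, chosen so that the child writes to both streams); communicate also waits for the process to terminate, so returncode is available afterwards.

```python
import subprocess
import sys

# Hypothetical child: upper-cases its input and writes a note to stderr.
child = [sys.executable, '-c',
         'import sys; sys.stdout.write(sys.stdin.read().upper()); '
         'sys.stderr.write("done")']

p = subprocess.Popen(child, stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate(b'hello')   # sends the data, closes stdin, waits

print(out)            # b'HELLO'
print(err)            # b'done'
print(p.returncode)   # 0
```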

In order to interact with a program it is possible to use read and write but it is important to observe that:

  • read reads until EOF so if we want to interact with a program we need to either limit the number of characters read by issuing read(MAX), or use readline() that reads until a newline
  • write is buffered, and we need to invoke flush() to force emptying the buffer

The following example interacts 10 times with cat:

>>> p = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
>>> for i in range(10):
...     w = p.stdin.write(b'ciao\n')
...     p.stdin.flush()
...     print(p.stdout.readline().decode(), end='')
...
ciao
ciao
...
ciao
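
The loop above leaves cat running. A possible way to end the conversation cleanly, sketched here under the assumption that cat is available, is to close stdin so that the child sees EOF, and then wait for it:

```python
import subprocess

p = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

replies = []
for i in range(3):
    p.stdin.write('line {}\n'.format(i).encode())
    p.stdin.flush()                       # push the data to the child now
    replies.append(p.stdout.readline().decode())

p.stdin.close()          # cat receives EOF and terminates
rest = p.stdout.read()   # nothing is left to read
p.stdout.close()
status = p.wait()        # exit status of cat

print(replies)   # ['line 0\n', 'line 1\n', 'line 2\n']
print(status)    # 0
```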

Exercise

Write a Python program that interacts with the following C program and wins the game:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {
	srand(time(NULL));   

	int i,d;

	for (i=0;i<10;i++) {
		int r = rand() % 10;      
		printf("Write %i: ",r);
		fflush(stdout);
		scanf("%d",&d);
		if (d != r) {
			printf("WRONG!\n");
			exit(1);
		}
	}
	printf("Great! You did it!\n");
	exit(0);
}

References

  1. The official Python Tutorial, a good place to start
  2. Reading and writing files
  3. Regular expressions HOWTO
  4. subprocess
  5. Style guide for Python code