Python is the de facto standard programming language in the infosec community: many tools are written in Python or support plugins and bindings in this language. Python is very flexible since it supports multiple paradigms (imperative, object-oriented, functional, …). Although it is dynamically typed, it is also strongly typed, in that only well-defined operations are permitted. All these features, together with a huge library, make Python an ideal language for scripting and fast application prototyping.
Informal Introduction
This is a nice and clear introduction to Python.
Exercise 1
Caesar decrypt: decrypt a given ciphertext encrypted with Caesar.
Example
ciphertext:
Lq fubswrjudskb, d Fdhvdu flskhu, dovr nqrzq dv Fdhvdu'v flskhu, wkh vkliw flskhu, Fdhvdu'v frgh ru Fdhvdu vkliw, lv rqh ri wkh vlpsohvw dqg prvw zlghob nqrzq hqfubswlrq whfkqltxhv. Lw lv d wbsh ri vxevwlwxwlrq flskhu lq zklfk hdfk ohwwhu lq wkh sodlqwhaw lv uhsodfhg eb d ohwwhu vrph ilahg qxpehu ri srvlwlrqv grzq wkh doskdehw. Iru hadpsoh, zlwk d ohiw vkliw ri 3, G zrxog eh uhsodfhg eb D, H zrxog ehfrph E, dqg vr rq. Wkh phwkrg lv qdphg diwhu Mxolxv Fdhvdu, zkr xvhg lw lq klv sulydwh fruuhvsrqghqfh
plaintext:
In cryptography, a Caesar cipher, also known as Caesar's cipher, the shift cipher, Caesar's code or Caesar shift, is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a left shift of 3, D would be replaced by A, E would become B, and so on. The method is named after Julius Caesar, who used it in his private correspondence
Exercise 2
Frequency analysis: print the list of pairs (character, number of occurrences) found in a given string, sorted by the number of occurrences.
Exercise 3
Reverse word order: print the words found in a given string in reverse order
Example:
'How are you' --> 'you are How'
Reading and writing files
We use open and read to read the content of a text file as follows:
>>> f = open('file.txt')
>>> f.read()
'Hi this is a text file\nof 3 lines!\nyeah\n'
>>> f.read()
''
>>> f.close()
>>>
By default a file is opened read-only. If we try to write to a read-only file, an exception is raised:
>>> f = open('file.txt')
>>> f.write('test')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not writable
>>>
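In a script, the same failure can be handled with try/except. The following is a minimal sketch, assuming a throwaway file created in a temporary directory; the helper name try_write_readonly is hypothetical and only used for illustration:

```python
import io
import os
import tempfile

def try_write_readonly(path):
    """Return True if the write succeeded, False if the file was not writable."""
    with open(path) as f:              # default mode is 'r': read-only, text
        try:
            f.write('test')
        except io.UnsupportedOperation:
            return False
    return True

# Work on a temporary file so that no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'file.txt')
    with open(demo_path, 'w') as f:
        f.write('Hi this is a text file\n')
    result = try_write_readonly(demo_path)

print(result)   # False
```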
To write to a file we use mode 'w'. Notice that in this case the existing content of the file is overwritten. For example:
>>> f = open('file.txt','w')
>>> f.read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not readable
>>> f.write('test')
4
>>> f.close()
>>> f = open('file.txt')
>>> f.read()
'test'
>>>
We can specify 'r+' for read-write files, and we can append content using mode 'a' as shown in the following:
>>> f = open('file.txt','a')
>>> f.write('test2')
5
>>> f.close()
>>> f = open('file.txt')
>>> f.read()
'testtest2'
>>>
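Mode 'r+' is mentioned above but not shown. As a hedged sketch (the helper overwrite_prefix and the temporary file are assumptions made for this example), the following opens a file for both reading and writing: the file is not truncated, and writing starts at the beginning of the file.

```python
import os
import tempfile

def overwrite_prefix(path, text):
    """Open path in 'r+' mode, overwrite its first characters, return the new content."""
    with open(path, 'r+') as f:   # read and write, without truncating the file
        f.write(text)             # writing starts at position 0
        f.seek(0)                 # rewind before reading back
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'file.txt')
    with open(demo_path, 'w') as f:
        f.write('testtest2')
    new_content = overwrite_prefix(demo_path, 'TEST')

print(new_content)   # TESTtest2
```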
Python can deal with binary files by specifying the modifier 'b'. Consider the following example, in which we read a file as binary and obtain a bytes object, denoted by a b prefix before the string literal. If we index its first element we in fact get the ASCII code of the first letter, which is 116:
>>> f = open('file.txt','rb')
>>> s = f.read()
>>> s[0]
116
>>> s
b'testtest2'
>>> s.decode()
'testtest2'
>>> s.decode()[0]
't'
>>> f.close()
>>>
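The difference between indexing and slicing a bytes object, and the round trip between str and bytes, can be sketched without any file as follows (the variable names are chosen only for this example):

```python
data = 'testtest2'.encode('ascii')   # str -> bytes
first_byte = data[0]                 # indexing bytes yields an int
first_slice = data[0:1]              # slicing bytes yields bytes
first_char = chr(first_byte)         # int -> one-character str
text = data.decode('ascii')          # bytes -> str

print(first_byte, first_slice, first_char, text)
```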
It is possible to read a file by simply iterating over the file object. The effect is to read the file one line after the other, as in the following example:
>>> f = open('file.txt')
>>> for l in f:
...     print(l,end='')
...
Hi this is a text file
of 3 lines!
yeah
Alternatively, we can read the whole content and then split it using the split() method, which splits on the given string. If no argument is given, the split is done on runs of whitespace. We can use join to merge the pieces back together, separated by a given string, as follows:
>>> f = open('file.txt')
>>> d = f.read()
>>> d.split('\n')
['Hi this is a text file', 'of 3 lines!', 'yeah', '']
>>> d.split()
['Hi', 'this', 'is', 'a', 'text', 'file', 'of', '3', 'lines!', 'yeah']
>>> ' '.join(d.split())
'Hi this is a text file of 3 lines! yeah'
>>> ' ___ '.join(d.split())
'Hi ___ this ___ is ___ a ___ text ___ file ___ of ___ 3 ___ lines! ___ yeah'
>>>
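Note that split('\n') leaves a trailing empty string when the content ends with a newline. As a small sketch, assuming the same three-line content used above, the string method splitlines() avoids this:

```python
d = 'Hi this is a text file\nof 3 lines!\nyeah\n'

by_newline = d.split('\n')    # keeps an empty last element
lines = d.splitlines()        # one element per line, no empty tail
rebuilt = '\n'.join(lines) + '\n'

print(by_newline)
print(lines)
```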
It is good practice to use the Python with statement, since it automatically closes the file even when an exception is raised:
>>> with open('file.txt') as f:
...     f.read()
...
'Hi this is a text file\nof 3 lines!\nyeah\n'
>>>
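To see that the file really is closed when an exception escapes the with block, consider this minimal sketch. The ValueError is raised on purpose, and the helper closed_after_failure is a hypothetical name used only here:

```python
import os
import tempfile

def closed_after_failure(path):
    """Read path inside a with block, fail on purpose, report whether the file got closed."""
    handle = None
    try:
        with open(path) as f:
            handle = f
            f.read()
            raise ValueError('simulated failure')   # artificial error, for illustration
    except ValueError:
        pass
    return handle.closed

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'file.txt')
    with open(demo_path, 'w') as f:
        f.write('Hi this is a text file\n')
    was_closed = closed_after_failure(demo_path)

print(was_closed)   # True
```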
Regular Expressions
We have seen regular expressions with grep and sed. We briefly review how regular expressions can be used in Python:
>>> import re
>>> re.compile('[a-z]+')
re.compile('[a-z]+')
>>> regexp = re.compile('[a-z]+')
match
match matches a regular expression at the start of the string:
>>> regexp.match('')
>>> regexp.match('1')
>>> regexp.match('a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>>
In case we have a match we can query the Match object using the following methods:
- group(): return the string matched by the RE
- start(): return the starting position of the match
- end(): return the ending position of the match
- span(): return a tuple containing the (start, end) positions of the match
>>> regexp.match('abcd123').group()
'abcd'
>>> regexp.match('abcd123').start()
0
>>> regexp.match('abcd123').end()
4
>>> regexp.match('abcd123').span()
(0, 4)
>>>
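Since match returns None when there is no match, calling group() directly on the result fails on non-matching input. A minimal sketch of the usual check follows; the function leading_word is a hypothetical helper, not part of the re module:

```python
import re

regexp = re.compile('[a-z]+')

def leading_word(s):
    """Return (matched text, span) for a lowercase word at the start of s, or None."""
    m = regexp.match(s)
    if m is None:          # no match at the start of the string
        return None
    return m.group(), m.span()

print(leading_word('abcd123'))   # ('abcd', (0, 4))
print(leading_word('1abcd123'))  # None
```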
search
To search for a match anywhere in the string, not only at the start, we use search:
>>> regexp.match('1abcd123')
>>> regexp.search('1abcd123')
<_sre.SRE_Match object; span=(1, 5), match='abcd'>
>>>
findall and finditer
We can use findall to obtain the list of all matching strings and finditer to obtain an iterator over all match objects:
>>> regexp.search('123hello123world123')
<_sre.SRE_Match object; span=(3, 8), match='hello'>
>>> regexp.findall('123hello123world123')
['hello', 'world']
>>> for f in regexp.finditer('123hello123world123'):
...     print(f)
...
<_sre.SRE_Match object; span=(3, 8), match='hello'>
<_sre.SRE_Match object; span=(11, 16), match='world'>
>>>
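Because finditer yields match objects, positions and text can be collected together. A small sketch on the same input string:

```python
import re

regexp = re.compile('[a-z]+')

# Pair each matched word with its starting position.
found = [(m.start(), m.group()) for m in regexp.finditer('123hello123world123')]

print(found)   # [(3, 'hello'), (11, 'world')]
```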
groups
We can specify groups using parentheses and pass an argument to the methods group, start, end, and span to retrieve information about a specific group, as shown in the following example:
>>> regexp = re.compile('([a-z]+)([0-9]+)')
>>> regexp.search('hello12345world')
<_sre.SRE_Match object; span=(0, 10), match='hello12345'>
>>> regexp.search('hello12345world').group()
'hello12345'
>>> regexp.search('hello12345world').group(0)
'hello12345'
>>> regexp.search('hello12345world').group(1)
'hello'
>>> regexp.search('hello12345world').group(2)
'12345'
>>> regexp.search('hello12345world').span(1)
(0, 5)
>>> regexp.search('hello12345world').span(2)
(5, 10)
>>>
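All groups can also be retrieved at once with the groups() method of the match object. The sketch below uses it to split a string into a word and an integer; the function split_word_number is a hypothetical helper introduced for illustration:

```python
import re

regexp = re.compile('([a-z]+)([0-9]+)')

def split_word_number(s):
    """Return (word, number) for the first word followed by digits in s, or None."""
    m = regexp.search(s)
    if m is None:
        return None
    word, digits = m.groups()    # tuple with one entry per group
    return word, int(digits)

print(split_word_number('hello12345world'))   # ('hello', 12345)
```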
sub
Substitution can be done with sub by specifying a replacement string, possibly referring to groups as \1, \2, etc. Notice that the backslash \ should be escaped, or the string should be specified as raw by prepending an r:
>>> regexp = re.compile('[^a-z]*([a-z]+)([0-9]+)[^0-9]*')
>>> regexp.sub('word = \\1\nnumber = \\2\n','hello12345world')
'word = hello\nnumber = 12345\n'
>>> regexp.sub(r'word = \1\nnumber = \2\n','hello12345world')
'word = hello\nnumber = 12345\n'
>>> print(regexp.sub(r'word = \1\nnumber = \2\n','hello12345world'))
word = hello
number = 12345
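The reason the backslash needs care is that, in an ordinary string literal, '\1' is a single control character rather than a backslash followed by a digit. The following sketch shows the difference and uses raw strings to swap the two groups (the pattern here is a simplified variant chosen for the example):

```python
import re

plain = '\1'     # one character: the control character with code 1
raw = r'\1'      # two characters: a backslash and the digit 1

# Swap word and number using raw-string backreferences.
swapped = re.sub(r'([a-z]+)([0-9]+)', r'\2\1', 'hello12345world')

print(len(plain), len(raw))   # 1 2
print(swapped)                # 12345helloworld
```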
greedy and non greedy
The * operator is greedy: it tries to match as much as possible. If we need to match as little as possible we can use the non-greedy variant *?, as shown in the following example:
>>> regexp = re.compile('.*([a-z]+)([0-9]+).*')
>>> regexp.search('11111111hello123aaaaaaaaa').group(1)
'o'
>>> regexp = re.compile('.*?([a-z]+)([0-9]+).*')
>>> regexp.search('11111111hello123aaaaaaaaa').group(1)
'hello'
>>>
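Another common case where the difference shows up is matching delimited items. A short sketch, assuming a toy input made of two bracketed tokens:

```python
import re

text = '<a><b>'

greedy = re.findall('<.*>', text)    # .* runs up to the last '>'
lazy = re.findall('<.*?>', text)     # .*? stops at the first '>'

print(greedy)   # ['<a><b>']
print(lazy)     # ['<a>', '<b>']
```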
Interaction with processes
We show some examples of interaction with other processes. Suppose we want to execute a process from Python and process its output. check_output runs the program and, if execution is successful, returns the output as a bytes object:
>>> import subprocess
>>> subprocess.check_output('ls')
b'file.txt\npython-regex-popen.txt\n'
>>> subprocess.check_output('ls').decode()
'file.txt\npython-regex-popen.txt\n'
>>> print(subprocess.check_output('ls').decode())
file.txt
python-regex-popen.txt
In case of error an exception is raised:
>>> subprocess.check_output('lsz')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/subprocess.py", line 607, in check_output
with Popen(*popenargs, stdout=PIPE, **kwargs) as process:
File "/usr/lib/python3.4/subprocess.py", line 859, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.4/subprocess.py", line 1459, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'lsz'
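In a script we usually want to catch these errors. Besides FileNotFoundError for a missing program, check_output raises subprocess.CalledProcessError when the program exits with a non-zero status. The sketch below is one possible way to handle both; the helper run and the command name no-such-command-xyz are assumptions made for this example, and the Python interpreter itself (sys.executable) is used as the child program so that the example does not depend on ls:

```python
import subprocess
import sys

def run(cmd):
    """Run cmd (a list of arguments) and return (ok, text)."""
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        return True, out.decode()
    except FileNotFoundError:
        return False, 'command not found'
    except subprocess.CalledProcessError as e:
        return False, 'exit status {}'.format(e.returncode)

ok_result = run([sys.executable, '-c', 'print("hi")'])
failed_result = run([sys.executable, '-c', 'import sys; sys.exit(3)'])
missing_result = run(['no-such-command-xyz'])   # hypothetical, non-existent command

print(ok_result, failed_result, missing_result)
```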
Parameters can be passed by specifying a list instead of a string. For example, to execute ls -l we can do the following:
>>> print(subprocess.check_output(['ls','-l']).decode())
total 44
-rw-rw-r-- 1 focardi focardi 40 feb 16 14:23 file.txt
-rw-rw-r-- 1 focardi focardi 5849 feb 16 15:14 python-regex-popen.txt
We cannot pass arguments in a single string unless we specify shell=True, which forces the string to be interpreted by the shell. However, this is dangerous in general. Read carefully the security considerations about using shell=True!
The following example shows how to execute ls -l as a string by specifying shell=True:
>>> print(subprocess.check_output('ls -l').decode())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/subprocess.py", line 607, in check_output
with Popen(*popenargs, stdout=PIPE, **kwargs) as process:
File "/usr/lib/python3.4/subprocess.py", line 859, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.4/subprocess.py", line 1459, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'ls -l'
>>> print(subprocess.check_output('ls -l', shell=True).decode())
total 44
-rw-rw-r-- 1 focardi focardi 40 feb 16 14:23 file.txt
-rw-rw-r-- 1 focardi focardi 7371 feb 16 15:18 python-regex-popen.txt
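The danger comes from shell metacharacters in data we do not control. As a hedged sketch (the untrusted string below is an invented example), passing a list keeps such data as a single argument, shlex.split turns a trusted command string into a list, and shlex.quote escapes a value if a shell string really is needed:

```python
import shlex
import subprocess
import sys

untrusted = 'file.txt; echo INJECTED'   # invented example of hostile input

# List form: the value reaches the child as one argument, no shell involved.
# The child here is the Python interpreter, printing the arguments it received.
seen = subprocess.check_output(
    [sys.executable, '-c', 'import sys; print(sys.argv[1:])', untrusted]
).decode().strip()

args = shlex.split('ls -l')         # ['ls', '-l']
quoted = shlex.quote(untrusted)     # safe to embed in a shell command string

print(seen)
print(args, quoted)
```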
Popen
Interaction is implemented via the powerful and flexible Popen class which, by default, executes a program taking input from stdin and sending output to stdout. For example, we can execute ls -l as follows:
>>> subprocess.Popen(['ls','-l'])
>>> total 52
-rw-rw-r-- 1 focardi focardi 40 feb 16 14:23 file.txt
-rw-rw-r-- 1 focardi focardi 8546 feb 16 15:24 python-regex-popen.txt
Input and output can be made available to the Python program by specifying them as subprocess.PIPE. In the following example the output is redirected to a pipe, so it is possible to collect it by invoking communicate():
>>> p = subprocess.Popen(['ls','-l'], stdout=subprocess.PIPE)
>>> p.communicate()
(b'total 136\n-rw-r--r-- 1 focardi staff 269 Feb 16 19:01 README.txt\n-rw-r--r-- 1 focardi staff 26 Feb 16 18:35 file.txt\n-rw-r--r-- 1 focardi staff 174 Feb 16 20:54 popen.py\n-rw-r--r-- 1 focardi staff 11062 Feb 16 19:03 python-regex-popen-wordpress.html\n-rw-r--r-- 1 focardi staff 32547 Feb 16 19:24 python-regex-popen.html\n-rw-r--r-- 1 focardi staff 11168 Feb 16 21:13 python-regex-popen.txt\n', None)
>>>
We see that communicate returns a pair corresponding to stdout and stderr (the latter is None because we didn't redirect it).
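When stderr is redirected as well, both elements of the pair are bytes objects. A minimal sketch, using the Python interpreter (sys.executable) as a child program that writes one line to each stream; the child code is an assumption made for this example:

```python
import subprocess
import sys

child_code = 'import sys; print("to stdout"); print("to stderr", file=sys.stderr)'

p = subprocess.Popen([sys.executable, '-c', child_code],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()    # waits for the child and collects both streams

print(out.decode().strip())
print(err.decode().strip())
print(p.returncode)
```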
We can connect the stdin and stdout of different processes to simulate a shell pipeline. For example, let us simulate ls | grep txt:
>>> from subprocess import *
>>> p1 = Popen(["ls"], stdout=PIPE)
>>> p2 = Popen(["grep", "txt"], stdin=p1.stdout, stdout=PIPE)
>>> p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits (PIPE is dup2-ed on standard output)
>>> output = p2.communicate()[0]
>>> print(output.decode())
file.txt
python-regex-popen.txt
We can also pass data to the process via communicate, in the form of a byte string. The following example runs cat and sends b'hello'. The effect is to receive b'hello' as output, which is the expected behaviour of cat:
>>> p1 = Popen(["cat"], stdin=PIPE,stdout=PIPE)
>>> p1.communicate(b'hello')
(b'hello', None)
>>>
In order to interact with a program it is possible to use read and write, but it is important to observe that:
- read reads until EOF, so if we want to interact with a program we need to either limit the number of bytes read by issuing read(MAX), or use readline(), which reads until a newline
- write is buffered, and we need to invoke flush() to force emptying the buffer
The following example interacts 10 times with cat:
>>> p = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
>>> for i in range(10):
...     w = p.stdin.write(b'ciao\n')
...     p.stdin.flush()
...     print(p.stdout.readline().decode(), end='')
...
ciao
ciao
...
ciao
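The same write, flush, readline pattern works with any line-oriented program. As a sketch, assuming a small child written in Python that answers each input line with its upper-case version (the child code is invented for this example and also flushes its own output after every line):

```python
import subprocess
import sys

# Child program: echo each input line in upper case, flushing after every line.
child_code = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    print(line.strip().upper(), flush=True)\n"
)

p = subprocess.Popen([sys.executable, '-c', child_code],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)

replies = []
for word in ['ciao', 'hello']:
    p.stdin.write(word.encode() + b'\n')   # one request per line
    p.stdin.flush()                        # push it to the child now
    replies.append(p.stdout.readline().decode().strip())

p.stdin.close()     # EOF lets the child terminate
p.wait()
p.stdout.close()

print(replies)   # ['CIAO', 'HELLO']
```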
Exercise
Write a Python program that interacts with the following C program and wins the game:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
    srand(time(NULL));
    int i,d;
    for (i=0;i<10;i++) {
        int r = rand() % 10;
        printf("Write %i: ",r);
        fflush(stdout);
        scanf("%d",&d);
        if (d != r) {
            printf("WRONG!\n");
            exit(1);
        }
    }
    printf("Great! You did it!\n");
    exit(0);
}
References
- The official Python Tutorial, a good place to start
- Reading and writing files
- Regular expressions HOWTO
- subprocess
- Style guide for Python code