Python files, regexp, processes

Reading and writing files

We use open and read to read the content of a text file as follows:
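A minimal sketch, assuming a hypothetical file test.txt that we create in a temporary directory so the example runs on its own:

```python
import os
import tempfile

# Hypothetical sample file, created here so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
f = open(path, "w")
f.write("this is a test\nsecond line\n")
f.close()

# open() returns a file object; read() returns the whole content as a string.
f = open(path)
content = f.read()
f.close()
print(content)
```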

By default a file is opened read-only. If we try to write to a read-only file, an exception is raised:
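The following sketch illustrates the failure. The file name is hypothetical, and the exception is caught only so that the example runs to completion:

```python
import io
import os
import tempfile

# Hypothetical sample file.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
f = open(path, "w")
f.write("this is a test\n")
f.close()

f = open(path)  # default mode is 'r' (read-only)
error = None
try:
    f.write("more text")
except io.UnsupportedOperation as e:
    # Writing to a file opened for reading is not allowed.
    error = e
    print("write failed:", e)
finally:
    f.close()
```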

To write to a file we use mode 'w'. Notice that in this case the existing content of the file is overwritten. For example:
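A small sketch of the overwriting behaviour, using a hypothetical temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical file

f = open(path, "w")
f.write("old content\n")
f.close()

# Opening again with 'w' truncates the file before writing.
f = open(path, "w")
f.write("new content\n")
f.close()

f = open(path)
content = f.read()
f.close()
print(content)  # only the new content is left
```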

We can specify 'r+' for read-write files and we can append content using mode 'a' as shown in the following:
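One possible illustration, again on a hypothetical temporary file. With 'r+' the file position starts at the beginning, so writing replaces existing characters in place, while 'a' always writes at the end:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical file
f = open(path, "w")
f.write("hello world\n")
f.close()

# 'r+' allows both reading and writing without truncating the file.
f = open(path, "r+")
f.write("HELLO")       # overwrites the first five characters
f.seek(0)              # go back to the start before reading
after_rplus = f.read()
f.close()

# 'a' appends at the end of the file.
f = open(path, "a")
f.write("appended\n")
f.close()

f = open(path)
final = f.read()
f.close()
print(final)
```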

Python can deal with binary files by specifying the modifier 'b'. Consider the following example, in which we read a file as binary and obtain a bytes object, displayed with a b prefix before the string. If we print the first element we get the ASCII code of the first letter, which is 116 (the letter t):
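A sketch of binary reading. The file content is an assumption, chosen so that the first letter is t:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical file
f = open(path, "w")
f.write("this is a test\n")
f.close()

f = open(path, "rb")   # 'b' selects binary mode
data = f.read()        # a bytes object rather than a str
f.close()

print(data)     # b'this is a test\n'
print(data[0])  # 116, the ASCII code of 't'
```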

It is possible to read a file by simply iterating over the file object. The effect is to read the file one line after the other, as in the following example:
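For instance, with a hypothetical three-line file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical file
f = open(path, "w")
f.write("first line\nsecond line\nthird line\n")
f.close()

lines = []
f = open(path)
for line in f:           # each iteration yields one line, newline included
    lines.append(line)
    print(line, end="")  # the line already ends with '\n'
f.close()
```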

Alternatively, we can read the whole content and then split it using the split() method, which splits on the given string. If no argument is given, the split is done on runs of whitespace (spaces, tabs, newlines). We can use join to merge the pieces back together, separated by a string, as follows:
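A short sketch. The string below stands in for content read from a file and is made up for illustration:

```python
content = "one two\nthree\tfour\n"  # stands in for f.read()

words = content.split()         # no argument: split on any whitespace
lines = content.split("\n")     # explicit separator: split on newlines only
joined = "-".join(words)        # merge the words, separated by '-'

print(words)   # ['one', 'two', 'three', 'four']
print(lines)   # ['one two', 'three\tfour', '']
print(joined)  # one-two-three-four
```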

It is good practice to use the Python with statement, since it automatically closes the file even when an exception is raised:
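The sketch below raises an exception on purpose inside the with block, to show that the file is closed anyway. The file name and the simulated failure are assumptions for illustration:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical file
with open(path, "w") as f:
    f.write("this is a test\n")

try:
    with open(path) as f:
        first = f.readline()
        raise ValueError("simulated failure")  # leaves the block abruptly
except ValueError:
    pass

print(f.closed)  # True: the with statement closed the file
```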

Regular Expressions

We have seen regular expressions with grep and sed. We briefly review how regular expressions can be used in Python:

match

match matches a regular expression at the start of the string. It returns a Match object on success and None otherwise.
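A minimal sketch with made-up sample strings:

```python
import re

# The pattern 'ab+' must match starting at position 0.
m1 = re.match(r"ab+", "abbbc")
m2 = re.match(r"ab+", "cabbb")  # the pattern occurs, but not at the start

print(m1)  # a Match object
print(m2)  # None
```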

In case we have a match we can query the Match object using the following methods:

  • group() Return the string matched by the RE
  • start() Return the starting position of the match
  • end() Return the ending position of the match
  • span() Return a tuple containing the (start, end) positions of the match
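The methods listed above can be tried on a simple match (the sample string is made up):

```python
import re

m = re.match(r"[a-z]+", "hello world")

print(m.group())  # hello
print(m.start())  # 0
print(m.end())    # 5
print(m.span())   # (0, 5)
```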

To search in the middle of a string we use search:
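For example, with a made-up sample string:

```python
import re

text = "abc 123 def"

m_match = re.match(r"[0-9]+", text)    # fails: the string starts with 'abc'
m_search = re.search(r"[0-9]+", text)  # scans the string for the first match

print(m_match)           # None
print(m_search.group())  # 123
print(m_search.span())   # (4, 7)
```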

findall and finditer

We can use findall to obtain the list of all matching strings and finditer to obtain the iterator of all match objects:
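A short sketch on a made-up sample string:

```python
import re

text = "a1 b22 c333"

numbers = re.findall(r"[0-9]+", text)  # list of matching strings
print(numbers)  # ['1', '22', '333']

spans = []
for m in re.finditer(r"[0-9]+", text):  # iterator of Match objects
    spans.append(m.span())
    print(m.group(), m.span())
```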

groups

We can specify groups using parentheses and pass an argument to the methods group, start, end and span to retrieve information about a specific group, as shown in the following example:
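A sketch with two groups. The pattern and the address are made up for illustration and are not meant as a real e-mail validator:

```python
import re

m = re.match(r"([a-z]+)@([a-z.]+)", "user@example.org")

print(m.group(0))  # user@example.org (group 0 is the whole match)
print(m.group(1))  # user
print(m.group(2))  # example.org
print(m.span(2))   # (5, 16)
print(m.groups())  # ('user', 'example.org')
```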

sub

Substitution can be done with sub by specifying a replacement string, possibly referring to groups as \1, \2, etc. Notice that the backslash must be escaped, or the string must be written as a raw string by prepending an r:
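The sketch below swaps two groups, once with a raw replacement string and once with escaped backslashes (pattern and input are made up):

```python
import re

pattern = r"([a-z]+)@([a-z.]+)"
text = "user@example.org"

swapped_raw = re.sub(pattern, r"\2:\1", text)        # raw string
swapped_escaped = re.sub(pattern, "\\2:\\1", text)   # escaped backslashes

print(swapped_raw)      # example.org:user
print(swapped_escaped)  # example.org:user
```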

greedy and non greedy

The * operator is greedy: it tries to match as much as possible. If we need to match as little as possible we can use the non-greedy variant *?, as shown in the following example:
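A minimal comparison on a made-up string:

```python
import re

text = "<b>bold</b>"

greedy = re.match(r"<.*>", text).group()       # extends to the last '>'
non_greedy = re.match(r"<.*?>", text).group()  # stops at the first '>'

print(greedy)      # <b>bold</b>
print(non_greedy)  # <b>
```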

Interaction with processes

We show some examples of interaction with other processes. Suppose we want to execute a process from Python and process its output. check_output, from the subprocess module, runs the program and returns its output as a bytes object if execution is successful:
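A sketch that runs ls, assuming a POSIX system where ls is available. The temporary directory and the file name are hypothetical, and cwd is used only to make the output predictable:

```python
import os
import subprocess
import tempfile

# Hypothetical directory containing a single file.
tmpdir = tempfile.mkdtemp()
open(os.path.join(tmpdir, "test.txt"), "w").close()

# Run 'ls' inside tmpdir and capture its standard output.
out = subprocess.check_output("ls", cwd=tmpdir)
print(out)  # b'test.txt\n'
```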

If the program exits with a non-zero status, a CalledProcessError exception is raised:
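As an illustration we run false, a standard POSIX utility that always exits with a non-zero status (its use here is our choice, not necessarily the original example):

```python
import subprocess

error = None
try:
    subprocess.check_output("false")
except subprocess.CalledProcessError as e:
    # The exception carries the exit status of the failed program.
    error = e
    print("command failed with exit status", e.returncode)
```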

Parameters can be passed by specifying a list instead of a string. For example, to execute ls -l we can do the following:
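A sketch assuming a POSIX ls. The directory and file are hypothetical and created only for the example:

```python
import os
import subprocess
import tempfile

tmpdir = tempfile.mkdtemp()
open(os.path.join(tmpdir, "test.txt"), "w").close()

# Each element of the list is passed to the program as a separate argument.
out = subprocess.check_output(["ls", "-l", tmpdir])
print(out.decode())
```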

We cannot pass arguments in a single string unless we specify shell=True, which causes the string to be interpreted by the shell. However, this is dangerous in general. Read the security considerations about shell=True carefully before using it!
The following example shows how to execute ls -l as a string by specifying shell=True:
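A sketch assuming a POSIX system. The command string is a fixed literal here; a string built from untrusted input should never be run this way:

```python
import os
import subprocess
import tempfile

tmpdir = tempfile.mkdtemp()  # hypothetical directory with one file
open(os.path.join(tmpdir, "test.txt"), "w").close()

# Without shell=True the whole string is taken as the program name.
error = None
try:
    subprocess.check_output("ls -l", cwd=tmpdir)
except FileNotFoundError as e:
    error = e
    print("no program called 'ls -l'")

# With shell=True the string is parsed by the shell.
out = subprocess.check_output("ls -l", shell=True, cwd=tmpdir)
print(out.decode())
```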

Popen

Interaction is implemented via the powerful and flexible Popen class which, by default, executes a program that takes input from the stdin of the Python process and sends output to its stdout. For example, we can execute ls -l as follows:
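A minimal sketch, assuming a POSIX ls. The output goes directly to the terminal, so the program only collects the exit status:

```python
import os
import subprocess
import tempfile

tmpdir = tempfile.mkdtemp()  # hypothetical directory with one file
open(os.path.join(tmpdir, "test.txt"), "w").close()

# The child inherits stdin and stdout, so the listing is printed directly.
p = subprocess.Popen(["ls", "-l"], cwd=tmpdir)
returncode = p.wait()  # wait for termination and get the exit status
print("exit status:", returncode)
```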

Input and output can be made available to the Python program by specifying them as subprocess.PIPE. In the following example, input and output are redirected from/to a pipe, so it is possible to communicate with the process by invoking communicate():
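A sketch assuming a POSIX ls, run in a hypothetical temporary directory:

```python
import os
import subprocess
import tempfile

tmpdir = tempfile.mkdtemp()  # hypothetical directory with one file
open(os.path.join(tmpdir, "test.txt"), "w").close()

p = subprocess.Popen(["ls", "-l"],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     cwd=tmpdir)

# communicate() closes stdin, reads all the output and waits for the process.
out, err = p.communicate()
print(out)
print(err)  # None, since stderr was not redirected
```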

We see that communicate returns a pair corresponding to stdout and stderr (the latter is None because we didn't redirect it).
We can refer to the stdin and stdout of processes in order to simulate a shell pipeline.
For example, let us simulate ls | grep txt:
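A sketch assuming POSIX ls and grep. The directory content is hypothetical:

```python
import os
import subprocess
import tempfile

tmpdir = tempfile.mkdtemp()  # hypothetical directory with two files
open(os.path.join(tmpdir, "test.txt"), "w").close()
open(os.path.join(tmpdir, "notes.md"), "w").close()

# The stdout of ls becomes the stdin of grep.
p1 = subprocess.Popen(["ls"], stdout=subprocess.PIPE, cwd=tmpdir)
p2 = subprocess.Popen(["grep", "txt"],
                      stdin=p1.stdout,
                      stdout=subprocess.PIPE)
p1.stdout.close()  # the parent no longer needs its copy of the pipe end

out = p2.communicate()[0]
p1.wait()
print(out)  # b'test.txt\n'
```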

We can also pass data to the process via communicate, in the form of a bytes object. The following example runs cat and sends b'hello'. The effect is to receive b'hello' as output, which is the expected behaviour of cat:
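A minimal sketch, assuming a POSIX cat:

```python
import subprocess

p = subprocess.Popen(["cat"],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)

# The argument is written to the stdin of cat, which is then closed.
out, err = p.communicate(b"hello")
print(out)  # b'hello'
```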

In order to interact with a program it is possible to use read and write but it is important to observe that:

  • read reads until EOF so if we want to interact with a program we need to either limit the number of characters read by issuing read(MAX), or use readline() that reads until a newline
  • write is buffered, and we need to invoke flush() to force emptying the buffer

The following example interacts 10 times with cat:
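One way to write such a loop, assuming a POSIX cat that echoes each line as soon as it receives it. The message text is made up:

```python
import subprocess

p = subprocess.Popen(["cat"],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)

replies = []
for i in range(10):
    p.stdin.write(("message %d\n" % i).encode())
    p.stdin.flush()                  # force the buffered data out to cat
    reply = p.stdout.readline()      # read one line, up to the newline
    replies.append(reply)
    print(reply)

p.stdin.close()  # EOF on stdin makes cat terminate
p.wait()
p.stdout.close()
```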

Exercise

Write a Python program that implements an echo server by executing and attaching (using Popen) to the command nc -l portnum which, in turn, runs nc as a server listening on port portnum (see the nc manpage for more detail). Once the server is running you should be able to connect using nc localhost portnum and the server should send back whatever you write. You can implement simple modifications to the text. For example, you can capitalize words as follows:

Notice that nc simply takes input from stdin and sends it over the network connection, while input from the network is sent to stdout. It can be thought of as a networking version of cat. You can run nc with -l on one terminal and connect to it from another terminal to implement a rudimentary chat.

References

  1. Reading and writing files
  2. Regular expressions HOWTO
  3. subprocess