Unix shell

The Unix shell makes it quick to automate interaction with processes and data. In this class we revise basic Unix shell concepts.

Basic commands

The following simple commands are used very often:

  • ls: shows the content of the current directory. Option -l displays the content in long format; option -a also displays hidden files (those whose names begin with a .);
  • file filename: shows the type of the file named filename;
  • pwd: (print working directory) shows the path of the current working directory;
  • mkdir name: creates a new directory name in the current working directory;
  • cd path: (change directory) changes the working directory to the specified path;
  • cat file: shows the content of file. If more than one file is specified, their contents are concatenated;
  • echo "hello": prints the string “hello”;
  • grep word file: looks for word in file and prints all lines that contain it;
  • man command: shows the man page of command. Arrows up and down navigate, q exits, / searches (n next hit, N previous hit);
  • find path expression: looks for files under path (recursively) matching the specified expression. For example: find / -name "*.c" -print prints all files whose names end with .c;
  • sort file: sorts the lines of a text file;
  • strings file: finds printable strings in a (binary) file.
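
As a minimal sketch, several of the commands above can be tried together in a throwaway directory. The directory name notes, the file fruit.txt and its content are invented for illustration, and mktemp is assumed to be available (it is on common Linux, macOS and BSD systems). The > used to create the file is redirection, covered in the Redirection section.

```shell
# Work in a throwaway directory so that no real files are touched.
# Assumption: mktemp is installed; all names below are illustrative only.
scratch=$(mktemp -d)
cd "$scratch"

mkdir notes                  # create directory notes
cd notes                     # make it the working directory
pwd                          # print its full path

# Create a small text file (> is explained under Redirection).
printf 'pear\napple\nshell script\n' > fruit.txt

ls -la                       # long listing, including the hidden entries . and ..
cat fruit.txt                # print the three lines as stored
echo "hello"                 # print the string hello

# $(...) stores the output of a command in a variable.
matched=$(grep shell fruit.txt)        # the only line containing "shell"
sorted=$(sort fruit.txt)               # the lines in alphabetical order
found=$(find . -name "*.txt" -print)   # files below . whose names end with .txt

echo "$matched"
echo "$sorted"
echo "$found"

# Remove the throwaway directory.
cd /
rm -rf "$scratch"
```

Running the commands one at a time at the prompt, instead of as a script, makes it easier to see which output belongs to which command.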

For the commands above that read a file, such as cat, grep and sort, input is taken from the terminal when no filename is specified (ctrl-D at the start of a line signals end of input and terminates the command). For example:

$ grep work
I'm checking what happens when grep is run
without specifying a filename! 
How does this work?
How does this work?
ah: matching lines are printed out as expected!
(ctrl-D terminates)
$ 

Redirection

Redirection is a fundamental Unix shell mechanism to redirect program input and output from/to a file. When the program output is redirected to a file (symbol >) any output from the program will be written to the file instead of the terminal; similarly, when program input is redirected from a file (symbol <) the content of the file will be sent as input to the program, in place of what the user writes on the terminal.

The following examples illustrate redirection:

  1. ls > tmpfile: writes the listing of the current directory into file tmpfile. Check with cat tmpfile;
  2. grep shell < tmpfile: grep shell, with no file specified on the command line, looks for the word shell in the input typed on the terminal, as explained above. Adding < tmpfile feeds the content of the file to grep instead. The result is the same as that of grep shell tmpfile;
  3. date >> tmpfile: appends current date to file tmpfile (notice that > overwrites the file instead). Check with cat. Note: overwriting is done silently so be careful when using redirection with a single >.
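
The difference between >, >> and < can be checked with a short script. This is only a sketch: mktemp is assumed to be available, the file name tmpfile follows the examples above, and the strings one and two are arbitrary.

```shell
# Assumption: mktemp is installed; everything happens in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"

ls > tmpfile             # the shell creates tmpfile before ls runs, so the listing includes tmpfile itself
echo "one" > tmpfile     # > silently replaces the listing with the single line "one"
echo "two" >> tmpfile    # >> keeps "one" and adds "two" after it

# $(...) stores the output of a command in a variable.
content=$(cat tmpfile)          # both lines, in the order they were written
matched=$(grep two < tmpfile)   # grep reads the file through its standard input

echo "$content"
echo "$matched"

# Remove the throwaway directory.
cd /
rm -rf "$scratch"
```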

Pipe

Pipes are a fundamental mechanism for process communication in Unix. They are similar to redirection but work between two programs. They constitute a communication channel between processes: a process can write to the pipe and another one can read from it.
In the Unix shell, pipes are specified using |. In particular, cmd1 | cmd2 | … | cmdn executes all the commands, and the output of each command i is given as input to the next command i+1. The output of the last command is printed on the terminal. This is very handy for combining commands and making them operate on data as a pipeline.

A few examples follow:

  1. ls -l | grep shell: shows the long-format entries that contain the word shell. Since grep examines the whole line, a match anywhere in the line (not only in the file name) is enough;
  2. ls | grep shell | sort -r: shows all file names that contain the word shell, sorted in reverse alphabetical order (option -r). Notice that in this case we have three programs cooperating;
  3. ls | grep shell | grep txt: shows all file names that contain both shell and txt.
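
The pipelines above can be reproduced on a known set of files. In this sketch the three file names are invented for the purpose, and mktemp is assumed to be available.

```shell
# Assumption: mktemp is installed; the file names are illustrative only.
scratch=$(mktemp -d)
cd "$scratch"

# Create three small files to list.
for name in shell_notes.txt shell_history.log readme.txt; do
    echo "placeholder" > "$name"
done

ls | grep shell              # the two names containing "shell"

# $(...) stores the output of a whole pipeline in a variable.
reversed=$(ls | grep shell | sort -r)   # the same names, in reverse alphabetical order
both=$(ls | grep shell | grep txt)      # names containing both "shell" and "txt"

echo "$reversed"
echo "$both"

# Remove the throwaway directory.
cd /
rm -rf "$scratch"
```

Each grep in the chain only sees the lines that the previous command let through, which is why the last pipeline behaves like a logical "and" of the two patterns.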

Regular expressions

Regular expressions are patterns representing sets of strings. They are very useful to perform advanced searches in which it is necessary to find strings with a particular structure. Command grep allows for specifying regular expressions.

  1. ^ is the beginning of line. ls -al | grep '^d' matches all directory files in the current directory (d is the flag that indicates a directory file). If we omit the ^ symbol, grep will match all lines containing a d, not necessarily in the first position;
  2. Analogously, $ indicates the end of a line;
  3. . represents a single character. For example grep '.ino' will match names such as Nino, Pino, Gino, …
  4. c* represents a possibly empty, arbitrary number of occurrences of character c. For example, grep 'smart *card' will match smartcard, smart card, smart  card and so on (the blank space is repeated an arbitrary number of times). Of course, it is possible to use .* to match an arbitrary number of arbitrary characters;
  5. Similarly, c\+ represents one or more occurrences of c and c\? represents zero or one occurrence of c. Notice that in the default (basic) syntax of grep the characters + and ? need a preceding backslash \ to acquire this special meaning. These two operators are GNU extensions; grep -E accepts them without the backslash;
  6. To match a special character such as . or * literally, it is enough to protect it with a backslash \ character. Conversely, for characters that acquire their special meaning only with a backslash, such as \+ and \?, removing the backslash matches the literal character + or ?;
  7. [0123456789] or equivalently [0-9] represents all digits from 0 to 9. For example, [0-9]\+ is a decimal number of arbitrary length;
  8. [^0-9] represents any character that is not a digit. Notice the use of ^ for negating the content of a set in square brackets (which is different from the previous usage representing the beginning of a line). For example grep '^[^0-9]*$' filename finds all lines that do not contain digits in file filename;
  9. There exist predefined sets of characters (character classes). For example: ^[[:alnum:][:blank:]]*$ matches all lines composed only of alphanumeric characters, spaces and tabs.
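
The patterns above can be tried on a small sample file. This is a sketch only: the file sample.txt, its four lines and the directory sub are invented for illustration, and mktemp is assumed to be available. The portable form [0-9][0-9]* is used in place of [0-9]\+ so that the script does not depend on GNU grep.

```shell
# Assumption: mktemp is installed; the sample lines are illustrative only.
scratch=$(mktemp -d)
cd "$scratch"

printf '%s\n' 'Nino has a smartcard' 'smart  card 42' 'no digits here' 'price: 3.50' > sample.txt
mkdir sub

ls -al | grep '^d'               # entries whose long-format line starts with d: ., .. and sub
grep '.ino' sample.txt           # any character followed by ino: the Nino line
grep 'smart *card' sample.txt    # smartcard and "smart  card": the first two lines
grep '[0-9][0-9]*' sample.txt    # one or more digits; GNU grep also accepts '[0-9]\+'

# grep -c prints the number of matching lines instead of the lines themselves.
no_digits=$(grep -c '^[^0-9]*$' sample.txt)               # lines without any digit
plain=$(grep -c '^[[:alnum:][:blank:]]*$' sample.txt)     # lines made only of letters, digits, spaces and tabs
literal=$(grep '3\.50' sample.txt)                        # \. matches a literal dot

echo "$no_digits"
echo "$plain"
echo "$literal"

# Remove the throwaway directory.
cd /
rm -rf "$scratch"
```

The single quotes around each pattern matter: they stop the shell from interpreting characters such as *, $ and \ before grep sees them.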

References

  1. http://www.thegeekstuff.com/2011/01/regular-expressions-in-grep-command/
  2. https://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/unixscripting/unixscripting.html