Graham Kemp > Teaching > Programming Tools


Practical: UNIX 2

Aims

Objectives

After this practical you will:

Exercises

  1. Copy the file /users/mdstud/kemp/ptools/html/publications.html into your own file space. Write UNIX commands (using sed and/or grep) that perform the following tasks:

    1. Change all occurrences of "Kemp" to "KEMP".

    2. Delete all blank lines.

    3. Display the URLs referenced in the file (hint: look for "href" and strip out everything on those lines that is not between the double quotes).

    4. Remove all HTML tags (e.g. "<head>").

    5. Remove all HTML tags and then delete all blank lines.

    6. Remove all HTML tags and then delete all blank lines, apart from those that originally contained the tag "<li>", which should appear as blank lines in the final output.
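    One possible set of answers to tasks 1 to 6 is sketched below, assuming a POSIX shell and sed. The function names (upcase_kemp, drop_blank, list_urls, strip_tags, strip_tags_drop_blank, strip_tags_keep_li) are hypothetical wrappers introduced only for illustration. Each one reads HTML on standard input. Task 6 is read here as "keep a line that contained <li> even when stripping its tags leaves it empty"; other readings of the task are possible.

```shell
#!/bin/sh
# Sketches for exercise 1. Each hypothetical function reads HTML on
# standard input and writes the result to standard output.

# 1. Change every "Kemp" to "KEMP" (g = all matches on a line, not just the first).
upcase_kemp() { sed 's/Kemp/KEMP/g'; }

# 2. Delete blank lines, treating whitespace-only lines as blank too.
drop_blank() { sed '/^[[:space:]]*$/d'; }

# 3. Print the double-quoted value of an href attribute (HREF in any case).
#    The greedy ".*" means only the last href on each line is printed.
list_urls() {
  sed -n 's/.*[Hh][Rr][Ee][Ff][[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p'
}

# 4. Remove tags. A tag split across two lines is not matched by this pattern.
strip_tags() { sed 's/<[^>]*>//g'; }

# 5. Remove tags, then delete the lines left blank (both steps run on each line).
strip_tags_drop_blank() { sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'; }

# 6. As 5, but a line that contained the tag <li> (any case) skips the delete:
#    its tags are stripped, it is emptied if only whitespace remains, and the
#    branch "b" jumps to the end of the script, where sed prints it.
strip_tags_keep_li() {
  sed '/<[Ll][Ii]>/{
s/<[^>]*>//g
s/^[[:space:]]*$//
b
}
s/<[^>]*>//g
/^[[:space:]]*$/d'
}
```

For example, `strip_tags_keep_li < publications.html` prints the text of your copy of the file. Note that `grep href publications.html` alone would show the whole matching lines; the sed substitution in list_urls is what strips everything outside the quotes.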

  2. The Gene Ontology Consortium are developing three controlled vocabularies of recommended terms for describing molecular function, biological processes and cellular components. The terms are contained in three separate text files that have been downloaded from the Gene Ontology Consortium web site and placed in the directory /users/mdstud/kemp/ptools/gene_ontology.

    Examples of Gene Ontology terms in the file component.ontology include "cellular_component", "cell", "cytoskeleton", "actin filament", "prospore membrane" and "spore wall (sensu Fungi)".

    Write a command that extracts all terms from a Gene Ontology file and writes a sorted list of terms (one term per line, without duplicates) to standard output.
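    A minimal sketch of one approach follows. It rests on an assumption about the flat-file format that you should check by looking at the files first: comment lines start with "!", and each term name follows a "$", "%" or "<" marker and is terminated by " ; " (for instance a line such as " %cell ; GO:0005623"). The function name go_terms is hypothetical.

```shell
#!/bin/sh
# Sketch for exercise 2. ASSUMPTION: comment lines begin with "!", and every
# term name follows a "$", "%" or "<" marker and ends at the next ";".
# The sed step keeps the text before the first ";", minus surrounding spaces.
go_terms() {
  sed '/^!/d' "$@" |           # drop comment lines (they may contain "$")
    tr '$%<' '\n\n\n' |        # start a new line at every term marker
    sed -n 's/^[[:space:]]*\([^;]*[^;[:space:]]\)[[:space:]]*;.*/\1/p' |
    LC_ALL=C sort -u           # sort bytewise and remove duplicates
}
```

Usage: `go_terms /users/mdstud/kemp/ptools/gene_ontology/component.ontology`. Splitting on every marker, rather than only the first on a line, also picks up terms that appear as additional parents later on a line; those repeat terms defined elsewhere, which is one reason `sort -u` is needed.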

  3. File /users/mdstud/kemp/ptools/latex/table.tex is a LaTeX source file that contains two tables. Copy this file, replace my name with your own, then produce a PostScript file from it and view the result.

    How many ampersand characters are in table.tex?
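    The sketch below assumes that latex and dvips are installed. The helper names make_ps and count_ampersands are hypothetical, and the PostScript viewer depends on your system (gv is one common choice).

```shell
#!/bin/sh
# Sketch for exercise 3, assuming latex and dvips are on the PATH.
# make_ps is a hypothetical helper; $1 is the base name, e.g. "table".
make_ps() {
  latex "$1.tex" && dvips -o "$1.ps" "$1.dvi"
}

# Count ampersand characters read from standard input. tr -cd deletes every
# byte that is NOT "&", so wc -c counts exactly the ampersands left over.
count_ampersands() {
  tr -cd '&' | wc -c
}
```

Usage: `make_ps table && gv table.ps`, then `count_ampersands < table.tex`. Counting characters rather than lines matters here: `grep -c '&' table.tex` reports the number of lines containing an ampersand, which undercounts table rows with several columns. Escaped ampersands (`\&`) are also counted, since the question asks about characters.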

  4. File /users/mdstud/kemp/ptools/html/StatReport.html gives, for each year from 1973 to mid-1998, the number of structure files deposited in the Protein Data Bank and the number released. Use UNIX commands to create two files, each containing two columns of numbers:

    1. file "deposited" which contains the year and the number of entries deposited in that year;

    2. file "released" which contains the year and the number of entries released in that year.
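    The layout of StatReport.html is not described here, so the following sketch rests on an assumption you should verify by viewing the file: each data row sits on one line and, once its tags are removed, reads "year deposited released" in that order. If the order differs, change the awk field numbers. The helper names year_rows and make_year_files are hypothetical, and awk is used alongside sed and grep.

```shell
#!/bin/sh
# Sketch for exercise 4. ASSUMPTION: one table row per line, and after the
# tags are stripped each row reads "year deposited released".
year_rows() {
  # Replace each tag with a space so adjacent cells do not run together,
  # then keep only lines that start with a four-digit year (19xx or 20xx).
  sed 's/<[^>]*>/ /g' "$1" | grep -E '^[[:space:]]*(19|20)[0-9]{2}[[:space:]]'
}

make_year_files() {
  # awk splits on runs of whitespace, so the extra spaces are harmless.
  year_rows "$1" | awk '{ print $1, $2 }' > deposited
  year_rows "$1" | awk '{ print $1, $3 }' > released
}
```

Usage: `make_year_files StatReport.html` writes "deposited" and "released" to the current directory. Header rows such as "Year Deposited Released" are skipped because they do not begin with a year.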

Supplementary Material

You can read more about the Gene Ontology (GO) project on the Gene Ontology Consortium web site. Several GO browsers have been implemented, and some of them can be accessed from the GO Consortium home page. Try using some of these.