Graham Kemp


Computing Science Graduate Course

Protein Shape and Protein Databases

21-25 October 2002


What is Bioinformatics? Why is it important? What are proteins? Why are they important? Why should computing scientists be interested in Bioinformatics?

Bioinformatics is one of the major growth areas in biotechnology and the biological sciences. It is also an area that presents challenging new opportunities for computing science research. The aim of this course is to introduce some of the current challenges in Bioinformatics.

Main topics: protein geometry, protein modelling, protein shape representation, database management and data modelling issues. There will also be opportunities for discussions on other areas of bioinformatics.

No previous biochemical knowledge is assumed!


Schedule

Date                        Time             Place    Activity
Monday 21 October 2002      10:00-11:45      S4       Lecture: Protein Geometry
    This lecture is based on lectures on Protein Structure that were given as part of the course "Bioinformatics I". The web pages for those lectures contain further information: lecture 1 and lecture 2.
Monday 21 October 2002      15:15-17:00 [1]  B and C  Practical: Molecular Graphics 1
Tuesday 22 October 2002     15:15-17:00 [2]  S4       Lecture: Protein Modelling
    This lecture is based on one that was given as part of the course "Bioinformatics I". The web page for that lecture contains further information.
Wednesday 23 October 2002   10:00-11:45      S4       Lecture: Protein Shape Representation and Docking
    This lecture is based on one that was given as part of the course "Bioinformatics III" last year. The web page for that lecture contains further information.
Wednesday 23 October 2002   13:15-15:00      MD2      Practical: Molecular Graphics 2
Thursday 24 October 2002    10:00-11:45      S1       Lecture: Databases and Data Models for Protein Structure
Thursday 24 October 2002    13:15-15:00      B and C  Practical: P/FDM
Friday 25 October 2002      10:00-11:45      S4       Lecture: Federated Databases and Advanced Data Modelling in Bioinformatics
Friday 25 October 2002      13:15-15:00      B and C  Practical
Friday 25 October 2002      15:00-15:30      MD8      Discussion

[1] The practical on Monday afternoon has been scheduled at 15:15 to avoid a clash with Peter Gennemark's Licentiate defence at 13:00.

[2] On Tuesday there is an introduction for new employees at the School of Computer Science and Engineering from 10:00 to 15:00.


Monday: Protein Geometry

Protein structure determines protein function and, thus, is central to all biological processes. The aim of this lecture is to introduce the beauty and complexity of three-dimensional protein structures, and to describe the basic principles of protein conformation. The lecture will be followed by a practical session using molecular graphics software.

Tuesday: Protein Modelling

Knowledge of a protein's three-dimensional structure is vital to a full understanding of the molecular basis for its biological function. Since we want to understand the function of all proteins encoded by a genome, we would like to know all of their 3-D structures. However, experimental techniques for determining protein structure are relatively slow and expensive, so we look to modelling as a way of extending the set of 3-D structures.

Building a 3-D protein model has been compared to solving a 3-D jigsaw puzzle, with the extra complications that the pieces can change shape and there is no "picture on the box" to help. The solution is somewhere in a vast conformational search space and various computing techniques (e.g. constraint logic programming) can be used to help find it. This lecture will describe several of the sub-problems in the modelling process, and approaches to their solution.
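
To give a feel for searching a conformational space under constraints, here is a minimal sketch in Python. It is not the method presented in the lecture: it uses plain backtracking with pruning as a stand-in for constraint logic programming, and it replaces a real protein with a chain of points on a 2-D square lattice. The function name `conformations` and the form of the constraints (a maximum lattice distance between two chain positions) are assumptions made for illustration.

```python
# Unit steps on a 2-D square lattice.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def conformations(n, constraints):
    """Yield self-avoiding lattice chains of n points that satisfy all constraints.

    constraints maps a pair of chain indices (i, j), with i < j, to the
    maximum allowed Manhattan distance between points i and j.
    """
    def consistent(path):
        # Check only the constraints that involve the newest point, so that
        # a violated constraint prunes the search as early as possible.
        j = len(path) - 1
        return all(
            abs(path[i][0] - path[j][0]) + abs(path[i][1] - path[j][1]) <= limit
            for (i, later), limit in constraints.items()
            if later == j
        )

    def extend(path):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt in path:          # the chain may not cross itself
                continue
            path.append(nxt)
            if consistent(path):
                yield from extend(path)
            path.pop()               # backtrack

    yield from extend([(0, 0)])

# Count four-point chains, first unconstrained, then with the two ends
# required to be lattice neighbours.
unconstrained = sum(1 for _ in conformations(4, {}))
constrained = sum(1 for _ in conformations(4, {(0, 3): 1}))
print(unconstrained, constrained)
```

Even in this toy setting the number of unconstrained chains grows rapidly with chain length, which is why constraints that prune partial solutions early matter.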

Wednesday: Protein Shape Representation and Docking

Given the three-dimensional structures of two proteins, can we predict whether they will associate and, if so, in what way? Since knowledge about molecular interactions is fundamental to understanding biological function, this is an important question. However, even if we assume that the two interacting structures are rigid, there is still a vast search space to be explored as we look for the relative orientation of two molecules that gives the best fit in terms of shape and chemical complementarity. This lecture will describe alternative ways of representing molecular shapes in a computer, and an approach to solving the protein docking problem.
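
The rigid-body search described above can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the approach covered in the lecture: the two "molecules" are sets of occupied cells on a 2-D grid, only 90-degree rotations and whole-cell translations are tried, and the score is simply the number of touching cell edges, with overlapping poses rejected. All names (`dock`, `score`, `receptor`, `ligand`) are hypothetical.

```python
from itertools import product

NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def rotate90(cells):
    """Rotate a set of (x, y) grid cells by 90 degrees about the origin."""
    return {(-y, x) for x, y in cells}

def translate(cells, dx, dy):
    """Shift every cell by (dx, dy)."""
    return {(x + dx, y + dy) for x, y in cells}

def score(receptor, ligand):
    """Return the number of edge contacts, or None if the shapes overlap."""
    if receptor & ligand:
        return None
    return sum(
        (x + dx, y + dy) in receptor
        for x, y in ligand
        for dx, dy in NEIGHBOURS
    )

def dock(receptor, ligand, reach=4):
    """Exhaustively search rotations and translations of the ligand.

    Returns (best_score, (rotation_index, dx, dy)); the first pose that
    reaches the best score is kept.
    """
    best = (-1, None)
    pose = set(ligand)
    for rot in range(4):
        for dx, dy in product(range(-reach, reach + 1), repeat=2):
            s = score(receptor, translate(pose, dx, dy))
            if s is not None and s > best[0]:
                best = (s, (rot, dx, dy))
        pose = rotate90(pose)
    return best

# A U-shaped receptor with a one-cell pocket at (1, 1), and a one-cell ligand.
receptor = {(0, 0), (1, 0), (2, 0), (0, 1), (2, 1)}
ligand = {(0, 0)}
print(dock(receptor, ligand))  # the ligand ends up in the pocket: (3, (0, 1, 1))
```

Even this tiny search evaluates 4 x 81 poses; moving to three dimensions, finer rotational sampling and chemical terms in the score is what makes the real search space so large.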

Thursday: Databases and Data Models for Protein Structure

The biological sciences are data-rich areas, and good data management is a prerequisite for addressing many scientific questions. This lecture will describe the functional data model (FDM) and P/FDM, a database management system based on the functional data model and implemented mainly in Prolog. Examples combining geometrical calculations with data retrieval will be given to demonstrate how a database can be used as a tool for exploring hypotheses about protein structure. Query optimisation will also be discussed. The lecture will be followed by a practical session using the P/FDM database management system.
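
In the functional data model, attributes and relationships are both viewed as functions applied to entities. The following Python sketch mimics that style to show how a geometrical calculation can be combined with data retrieval. It does not use P/FDM or its query language; the schema (`Chain`, `Residue`), the function names, the coordinates and the 7.0 angstrom cutoff are all invented for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Residue:
    position: int
    name: str
    ca: tuple  # hypothetical alpha-carbon coordinates (x, y, z) in angstroms

@dataclass(frozen=True)
class Chain:
    chain_id: str
    residues: tuple

# Stored functions: each maps an entity to a value or to other entities.
def has_residues(chain):
    return chain.residues

def name(residue):
    return residue.name

def position(residue):
    return residue.position

# Derived function: computed from stored data rather than stored itself.
def ca_distance(r1, r2):
    return math.dist(r1.ca, r2.ca)

def close_cysteine_pairs(chain, cutoff=7.0):
    """Retrieve cysteine residues, then keep pairs whose alpha carbons are close."""
    cys = [r for r in has_residues(chain) if name(r) == "CYS"]
    return [
        (position(a), position(b))
        for i, a in enumerate(cys)
        for b in cys[i + 1:]
        if ca_distance(a, b) <= cutoff
    ]

# Made-up data for a single chain.
chain = Chain("A", (
    Residue(1, "CYS", (0.0, 0.0, 0.0)),
    Residue(2, "ALA", (3.8, 0.0, 0.0)),
    Residue(3, "CYS", (5.0, 0.0, 0.0)),
    Residue(4, "CYS", (20.0, 0.0, 0.0)),
))
print(close_cysteine_pairs(chain))
```

Because stored and derived functions are called in the same way, a query can mix retrieval and calculation freely, which is the property the lecture examples rely on.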

Friday: Federated Databases and Advanced Data Modelling in Bioinformatics

"Data integration and management is an area with less glamour than high-performance computing but, probably, more practical relevance for the biotech industry. Researchers need to organise and integrate information about genes and proteins from many different sources, in many formats and file types, so that they can uncover patterns and associations." (Financial Times Survey: Biotechnology, November 27 2001)

Scientists' ability to use bioinformatics data resources effectively to explore hypotheses in silico is enhanced if it is easy to ask precise and complex questions that span several different kinds of data resource. This lecture will describe the P/FDM Mediator, a program whose tasks include determining which external databases are relevant in answering users' queries, dividing queries into parts that will be sent to different external databases, translating these subqueries into the language(s) of the external databases, and combining the results for presentation.
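
The four mediator tasks listed above can be illustrated with a small Python sketch. This is not the P/FDM Mediator: the two in-memory "external databases", their field names, and the functions `plan` and `answer` are hypothetical, and "translation" is reduced to renaming an attribute into each source's local vocabulary.

```python
# Two hypothetical external sources with their own local field names.
SEQUENCE_DB = {"P1": {"seq": "MKT"}, "P2": {"seq": "GAV"}}
STRUCTURE_DB = {"P1": {"res_angstrom": 1.8}}

SOURCES = {
    "sequence_db": {
        "provides": {"sequence": "seq"},  # global attribute -> local field
        "lookup": lambda key, field: SEQUENCE_DB.get(key, {}).get(field),
    },
    "structure_db": {
        "provides": {"resolution": "res_angstrom"},
        "lookup": lambda key, field: STRUCTURE_DB.get(key, {}).get(field),
    },
}

def plan(attributes):
    """Find the relevant source for each attribute and group into subqueries."""
    subqueries = {}
    for attr in attributes:
        for source, info in SOURCES.items():
            if attr in info["provides"]:
                subqueries.setdefault(source, []).append(attr)
                break
        else:
            raise KeyError(f"no source provides {attr!r}")
    return subqueries

def answer(keys, attributes):
    """Run each subquery against its source and combine results per key."""
    results = {key: {} for key in keys}
    for source, attrs in plan(attributes).items():
        info = SOURCES[source]
        for attr in attrs:
            local = info["provides"][attr]  # translate to the source's own name
            for key in keys:
                results[key][attr] = info["lookup"](key, local)
    return results

print(plan(["sequence", "resolution"]))
print(answer(["P1", "P2"], ["sequence", "resolution"]))
```

The user asks one question in a single global vocabulary, and the combined answer hides which source supplied which value. Here a missing value, such as the resolution for `P2`, is returned as `None`.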

Some data modelling issues relating to managing hierarchical biological data will also be described.