Graham Kemp


Computing Science Graduate Course

The Functional Approach to Data Management:
List Comprehensions and Data Integration

24-28 May 2004


Practical 1


List Comprehension Exercises

Start up Jython2.1 (or Python if preferred) You need a script listmod.py in which to put your definitions You can consult it (or reconsult it) by

>>> from listmod import *
  1. Define a function factors(n) in your script, using a comprehension that returns all factors greater than 1 of integer n. They should be in ascending order (no duplicates). Test it for n = 10, 20, 50 etc..

  2. Define a function primes(m), using a comprehension to compute the list of primes between 2 and m inclusive. HINT - define a function isprime(ps,p) using a comprehension, which returns True `(1 in Python) if integer p is prime to all integers in ps and False(0) otherwise.

    2A) Harder - for a potential prime p you only need to test divisibility by integers n such that n*n < p+1. If you assume your list ps in Q2 is sorted in ascending order, can you make a more efficient version?

    2A*) Can you make it work even faster by ensuring that list ps contains only primes? In Miranda this is possible elegantly by lazy evaluation, but Jython doesn't have it, (nor does it have partial application), so it may not be possible.

  3. Use comprehensions to define intersect(s1,s2) and setdiff(s1,s2) which return intersection and set difference of sets of integers represented as unordered lists without duplicates. NOTE (to test set membership use "X in set", converse is "not in").

  4. Use comprehensions to compute a natural join between sets of tuples of a given type (lists of column names supplied as parameters). Start by assuming there is only one common column. Generalise to deal with a join of two sets of tuples of the same type - behaves as intersection - or two sets with no common column - behaves as cartesian product.

  5. Jython has a Dictionary type which behaves like a set of key-value pairs, indexed on key. e.g.

    >>> dict = {'a': 1, 'c': 55, 'd': 4]
    >>> dict.get('c')   # direct access
    55
    >>> dict.update({'d': 7, 'e': 23}) #update in place
    >>> dict
    {'a': 1, 'c': 55, 'd':7, 'e': 23}
    >>> dict.items()   # converts to list of tuples
    [('a', 1), ('c', 55) ...]
    

    Try representing a relation as a dictionary with a string prime key paired with a tuple of values. Again assume the relation type is given as a tuple of column names - key first.

    Now re-implement natural join using dictionaries for fast access:

    1. Joining two relations of same type on common prime key
    2. Joining prime key in one relation to foreign key in another.

    5*) If time, consider what happens if prime key spans the first N columns, and foreign key columns with matching names need not be adjacent or ordered!

  6. Experiment with Flattening comprehensions inside others. For example:

    [d for d in factors(10) if d > 7]  can be expanded and flattened to
    [d for d in range(2,10) if (10 % d) == 0 and (d < 7)]
    

    Can you generalise this to other cases where one generates from a comprehension? What happens if the filter includes a test on a comprehension? Can you simplify it?


Longer Projects

  1. Suppose you have functions of no arguments, representing entity types such as student(), person(), course(), that generate lists of relational tuples. Using a comprehension syntax as in this course, you could express a query such as:

     "[(name(s),title(c)) | s <- student(); c <- course(); e <- enrolled();
                            name(s) = sname(e); cname(c) = ename(e); level(c) > 3]"
    
    1. Write a Jython/Python program to generate a Jython List comprehension for such a query, working on pre-defined functions delivering sets of tuples and other functions (like title) on these tuples delivering the appropriate tuple offset. Execute and test it.

      HINT - you need some facility to parse string input, like a String Tokeniser in Java. Consider the builtin string class method split(separator,[maxsplits]):

      >>> abc-de-fgh'.split('-')
      ['abc','def','gh']
      
      Otherwise there is a DOM parser in a jython library ...
      import org.w3c.dom as dom   # may need org.apache.xerces
      
    2. Write a program in Jython using its string processing facilities to generate an equivalent Select-From-Where SQL query. e.g.
            select name, title
            from   student, course, enrolled
            where  student.name = enrolled.sname
                   and ... 
      
  2. Use the JDBC or zxJDBC modules to access an existing MySQL database and send your generated SQL queries and get results back as lists.

  3. Similar to A) but generate a P/FDM query in Daplex syntax. You need to create a dictionary structure to hold schema information. You also need to use relationship functions such as takes(s), which delivers a set of courses taken by a student, in place of join predicates (name(s) = sname(e)).

    You use takes(s) as a generator. You should be able to generate something like:

        for each s in student
           for each c in takes(s) such that level(c) > 3
              print(name(s), title(c));