MASSACHVSETTS INSTITVTE OF TECHNOLOGY
Department of Electrical Engineering and Computer Science
6.001--Structure and Interpretation of Computer Programs
Spring Semester, 1997
Problem Set 5
Indexing and Databases
Issued: Tuesday, March 4, 1997
Due: In recitation on Wednesday, March 12 for recitations held at
9, 12, 1 and 2;
and on Friday, March 14 for recitations held at 10 and 11.
Tutorial Preparation: For the week of March 10.
Reading Assignment: Through the end of Chapter 2 of SICP.
REMINDER: Quiz 1 is on Wednesday, March 19.
In this assignment, you will work with a simple personnel data base. The goal of this problem set is to explore issues in symbolic data representation, and to explore the tradeoffs associated with different representations of data, such as computational complexity and ease of programming.
In this data set, we are going to explore a simple database, different ways of structuring the information in such a database, and the implications for such choices on the speed and ease of use of such data bases.
In our simple case, each entry of the data base will contain some basic information, both numeric and symbolic. For example, in the personnel database we will examine, we store name, age and position information about each of the people in the data base.
Each structure in our database will be built with the constructor:
(define (make-struct name-st age-st posn-st) (list 'entry name-st age-st posn-st))
which makes each entry a simple list, marked with a label at the front.
Associated with this constructor are a set of selectors:
(define name-struct cadr) (define age-struct caddr) (define posn-struct cadddr)
Each element of an entry is itself a structure, with associated constructors and selectors:
(define (make-name sur given) (list 'name sur given))(define surname cadr) (define given-name caddr)
(define names cdr)
(define (make-age age) (list 'age age))
(define age cadr)
(define (make-posn p) (list 'posn p)))
(define position cadr)
The first organization for our database will simply build on our existing list structure:
(define empty-db? null?) (define empty-db '()) (define next-db car) (define rest-db cdr) (define add-struct cons)
In other words, we have a base database which is simply the empty list, and we can test for an empty data base by testing for the empty list. We can get the next entry in the database by using the procedure for extracting the next element of a list, and we can add an element to the database by utilizing the constructor for adding an element to a list.
The second organization for our database will also build on the list structure, but will impose the condition that the elements of the list are ordered in some manner (in our case, we will do the ordering by imposing a lexicographical ordering on the name structures of each of the entries, much like in a phone book). Thus the basic constructors and selectors can remain mostly the same.
The third organization for our database will use a binary tree structure, similar to what is discussed in the notes. Here the data abstraction is built around a tree entry:
(define tree-entry car) (define left-branch cadr) (define right-branch caddr) (define (make-tree entry l r) (list entry l r)) (define empty-tree? null?) (define empty-tree '())
In other words, an node in a tree consists of the entry at that node and a left and right branch below that node. We will again utilize an empty list to correspond to an empty node.
To test the procedures you will create in this problem set, we have built a somewhat random database, whose name is linear-test-data. A sample entry from this data base is given below:
(entry (name davis richard) (age 65) (posn (associate professor)))
Since we have chosen to represent our database as a list, note that at least initially, you can use simple list procedures to examine elements of the database.
Because we are using lists to represent our basic databases, we have some generic operations on lists that will be convenient:
(define (accumulate op init items) (cond ((null? items) init) (else (op (car items) (accumulate op init (cdr items))))))(define (filter predicate items) (cond ((null? items) '()) ((predicate (car items)) (cons (car items) (filter predicate (cdr items)))) (else (filter predicate (cdr items)))))
(define (map proc items) (if (null? items) '() (cons (proc (car items)) (map proc (cdr items)))))
You should prepare the following exercises for oral presentation in tutorial. They cover material in sections 2.2 and 2.3 of the text, and they also test your understanding of the material to be used in manipulating our databases, in preparation for doing this week's lab in. You may wish to use the computer to check your answers to these questions, but you should try to do them (at least initially) without the computer.
Turn in listings of your procedures and examples of their evaluation.
To compare symbols, we have provided two procedures:
You should try some examples to determine the details of how these predicates perform.
The problem with which we are left is that entries in the position field (or the name field for that matter) are presented as lists of strings.
Turn in a listing of your procedures and examples of their evaluation.
We are ready to create a procedure to let us test membership in the database, which is a common procedure that we will want to use in many other procedures. Of course, we have to decide what we want to use to determine membership, and for simplicity, we will assume that all the names of people in the database are unique. Using the procedure you created in Lab Exercise 2 for testing equality of lists of symbols, write a procedure called member? with the property that evaluating a form such as
(member? '(grimson eric) linear-test-data)
will return false if there is no entry in the database with that name, and will return the full entry if there is one. Turn in a listing of your procedure, as well as examples of its evaluation on:
(member? '(grimson eric) linear-test-data) (member? '(rodriguez gerry) linear-test-data) (member? '(Sussman Ellen) linear-test-data) (member? '(sussman ellen) linear-test-data)
What is the order of growth associated with using member??
Once you have created member? you can use it to build other useful database procedures. For example, suppose we have two databases that need to be merged. This means that we want to create a single database with the property that no entry is duplicated. Write a procedure that achieves this, and turn in a listing. What is the order of growth associated with your procedure? Create a small database and merge it with our test database, and turn in a listing showing that your procedure works correctly. (Note that the TAs don't really want to see an copy of the original database! Choose your test so that it clearly indicates that your procedure works correctly, without bogging the reader down in lots of irrelevant details.)
Now we want to consider converting our simple database into one with a bit more structure. In particular, we want our new database to again have a list as a top level structure, but now we want our entries to be ordered lexicographically by name.
To do this, we will need a new membership predicate, member-ordered? which, as with member?, either returns false if the element is not a member of the database, or returns the actual entry associated with that name in the database. You should be able to take advantage of the ordering of the database to make a more efficient predicate. Write member-ordered?, remembering that eq? can be used to check equality of symbols, and alpha<? can be used to check ordering of symbols. Turn in a listing of your procedure.
Similar to member-ordered?, we can create a procedure for inserting an entry into an ordered database. Create such a procedure insert-ordered, with calling pattern (insert-ordered elt ordered-db) where elt is a structure made with make-struct and ordered-db is an ordered database. Insert-ordered should create a new version of the database with the element inserted in the appropriate position. Turn in a listing of your procedure.
Finally, create a procedure convert-to-ordered-db which takes a standard unordered database as input, and returns an ordered database. Turn in a listing of your procedure. Test by examining the first few entries of your database.
Suppose we wanted to use the operation of merging in this new ordered data base. What changes do we need to make? What is the order of growth of the new version of merge? Is this new ordered database an improvement over the original form?
The third version of our database will be built using a binary tree structure. Each element of a tree is itself a tree (hence the nice recursion) and is either an empty tree (which we represent as an empty-db) or consist of three elements: an entry, a left branch, and a right branch. The entry is a standard entry as in the previous cases, while the left and right branches are trees. The tree has a nice ordering property, which is that all the elements of the left branch of the tree are ``less than'' the entry of the tree, which is ``less than'' all of the elements of the right branch (where we are again using lexicographical ordering on the name to determine ``less than'').
Similar to Lab Exercise 6, create procedures member-tree?, insert-tree, and convert-to-tree for testing membership in a tree structured database, inserting an element into a tree structured database and converting a list structured database into a tree-structured database.
Turn in listings of your procedures.
What is the order of growth associated with member-tree? and with insert-tree?
How much time did you spend on this homework assignment? Report separately time spent before going to lab and time spent in the 6.001 lab.
If you cooperated with other students working this problem set please indicate their names on your solutions. As you should know, as long as the guidelines described in the 6.001 Policy on Cooperation handout are followed, such cooperation is allowed and encouraged.
This document was generated using the LaTeX2HTML translator Version 96.1-f (May 31, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -split 0 database.
The translation was initiated by Jeremy Daniel on Sun Mar 16 02:47:58 EST 1997