HST.950 /6.872 Problem Set 2

Due 9/23/2004


o          All answers need to be printed and not handwritten.

o          This homework provides some SQL exercises on a small subset of the Children's Hospital database, from the mid-1990's. These data have been "scrubbed", so that specifically identifying information has been altered or removed. Nevertheless, please treat the data with sensitivity, because you will learn from our discussion of medical data confidentiality that even innocuous-seeming data can help re-identify. This CWS database can be downloaded in MS Access Version ,or MySQL version, together with a brief description of the database. To download these files, use the same username/password for class readings. MySQL is available for free non-commercial use.  To use Access, you need to be running Windows and have installed Microsoft Office Professional. 

o          Additional information about the CWS database is in the following documents:

o          Database description

o          Table relationships 

               To download these files, use the same username/password for class readings.

o          Tutorial information about SQL is widely available on the Web.  Some sites are linked from the 6.872 Schedule page.

1.      SQL practice on CWS database: please use as few as possible queries.


a) Please list the query to retrieve all doctors in the database.


b) Suppose you are doing medical research on DIABETES-INSIPIDUS and need related patient documents. Try to list the query for retrieving documents of patients with DIABETES-INSIPIDUS.


c) Give three different queries, each of which will estimate the total number of patients being tracked in the database.  If they result in different numbers, discuss why.  (Ignore the possibility that the same patient is entered several times but with different identifiers.)



2.      a) We learned in class three different paper-based medical record systems: doctor's journal, patient's journal (e.g. Mayo), and problem oriented medical records (e.g. organized around problem lists, with progress tracked in SOAP notes). Please list the pros and cons of each system while comparing with the others.


b) Consider a simplified example of patient's journal. Suppose each doctor will record:


Visit date

Chief complaint (free text)

Results of exams if performed, which includes the following but other exams may be added later:

Physical exam

            Pulse (beats per minute)

            Respiration rate (times per minute)

            Blood pressure (systolic pressure and diastolic pressure in mmHg)

Total blood count

            Red blood cell count

            White blood cell count

Diagnosis (ICD9 code)

Plan (free text)

Provider Name and signature

Design a relational database to store such data. List all tables and their fields, and mark the key field(s). Please make sure your tables satisfy the third normal form.

c) Johnson's "Generic Data Modeling" paper suggests that you could use an alternative design for the relational data base, in which the attributes of an entity such as a visit are represented not all as distinct columns in the data, but as different properties of the entity, stored in a table with fewer columns but many more rows.  Give a description of how you might transform your design in part b to such a representation, and list the tables and fields.

d) Discuss the pros and cons of the two designs in b and c.


3.   What's the difference between genomics and genetics? What are their major foci?