Spreadsheets vs. Relational Databases: Bridging the Gap

For non-programmers, spreadsheets are usually the option of choice when it comes to keeping track of non-trivial amounts of structured data. This is seen in all kinds of settings ranging from the business world to public administration and academic research. Spreadsheets, however, can only capture one kind of data structure: separate tabular views of the data. This is a significant constraint for the user, who arguably thinks of the data, and needs to navigate it, in a more hierarchical manner (e.g. “each student takes a number of courses, each which has a number of TAs”). In the “Hierarchical Spreadsheet” project we tried to extend the spreadsheet paradigm to include some useful features usually found only in the relational database world. Some potentially novel concepts included:

1) Strongly typed worksheets with “advisory” error checking. For instance, the user can designate a particular column to hold numbers only, and maybe proceed to enter a date, but would then see an Excel-style warning dot in the cell in question.

2) Transparent many-to-many or one-to-many relationships between worksheets in a workbook (think foreign key relationships in database-speak). The user can designate a particular column to hold references to rows in another worksheet, or lists of such. The other worksheet will then automatically have a corresponding column added containing references going in the other direction.  (E.g. if each row in the “Departments” worksheet has a column referencing “Courses”, then “Courses” has a column referencing the corresponding rows in the “Departments” worksheet.

3) Hierarchical presentation of relationships between worksheets in the workbook. Columns that reference other worksheets may be configured to show any subset of columns from the referenced worksheet, and so on.

User testing with multiple prototypes showed that the user interface needed to be very similar to that of a traditional spreadsheet (e.g. Excel) to be usable by most users in the target population. Significant features hypothesized to make the interface more efficient (e.g. automatic report layout management) proved only to confuse the users and make it harder to design consistent editing affordances. Nevertheless, we did manage to integrate the key high-level features of the application (relationships between worksheets and the presentation of resulting hierarchical data on screen) into a prototype bearing very much of a resemblance to Excel.

(This project was done by Paul Grogan, Yod Watanaprakornkul, and me.)

Our application includes several novel concepts, including: transparent many-to-many or one-to-many relationships between worksheets (relations) in a workbook, hierarchical presentation of relationships between worksheets in the workbook, and strongly typed worksheets with advisory error checking.

One Response to “Spreadsheets vs. Relational Databases: Bridging the Gap”

  • [...] The Haystack Blog is published by faculty and grad students in the MIT Computer Science and AI Lab (CSAIL)–specifically those in the Haystack group. Principal Investigator (and occasional dance instructor) David Karger is its most prolific blogger–you might have read some of his SIGIR 2009 posts or his debate with Stefano Mazzocchi about how to properly use RDF. But other people’s posts are just as interesting–check out the most recent post by Eirik Bakke about bridging the gap between spreadsheets and relational databases. [...]