Database-Backed Web Sites

Beyond using the World Wide Web to present static (hyper-)textual data, the most common use currently is to present different views of data from a database.  Examples include catalog shopping sites such as Amazon, community bulletin boards with threaded user-addable conversational threads, and medical records systems.  In these types of systems, "pages" on a web site are not statically represented as HTML, but are dynamically constructed at the time the user accesses a page.

There are various technologies for implementing this type of dynamic web site.  In this assignment, we will explore one such approach that tries to find a reasonable compromise between generality and simplicity.  Some of the usual approaches:

  1. CGI programs.  From the earliest Web servers, it has been possible to define certain URL's to mean "call the following program, which will return the HTML to display."  This approach is obviously arbitrarily general, but provides little guidance on how to organize the computation.  Often (though not necessarily) the programs are written in scripting languages such as PERL, PHP, PYTHON, etc.
  2. Servlets.  Many web servers now support embedded programming languages as an alternative to calling external CGI programs.  The Apache Tomcat server, for example, allows one to write programs in Java to generate the content of the requested page.
  3. {Active, Java, ...} Server Pages.  Recognizing that often the content of many dynamically generated Web pages varies only in certain parts of the page, there are now a number of hybrid systems that allow the developer to write pages that contain constant content as HTML but also include specially-marked sections that are executable code.  In addition, it is possible to insert computed text into parts of the page.  The common embedded programming languages include Visual Basic, JavaScript, and Java.  We will be exploring this method, using Java as the programming language.  Hence our pages are called Java Server Pages (JSP).
  4. Frameworks.  To reach higher levels of abstraction, various developers and vendors have built different frameworks that simplify and systematize the programming parts of X server pages.  There is a bewildering array of these, ranging from large-scale commercial approaches to standards efforts to research efforts.  For example, the Jakarta Struts project implements technology that makes it relatively easy to implement for Web interfaces a Model-View-Controller style of interaction that is common in building non-Web interactive programs.  In this approach, (to oversimplify greatly), the Model is a data structure representation of the persistent objects being manipulated in an interface (e.g., a shopping cart or a book), the View is its appearance in the interface (e.g., the HTML being displayed to the user), and the Controller, which responds to actions in the user interface to change the model and which responds to changes in the model to alter the interface display (e.g., clicking on "order a book" adds the representation of that book to the shopping cart and the change in the cart causes an update of the display).  The difficulty with many of these is that using them involves a very steep learning curve and a lot of superstructure.  Furthermore, they tend to hide many of the details of the database structure, thus requiring developers to learn an additional level of representation.

Here, we present a homegrown simple framework that tries to achieve a reasonable compromise between the "arbitrary programming in JSP" and the "framework" approaches.  It focuses on making relatively simple the following aspects of dynamic web programming:

  1. Database pooling.  Opening a connection to a database is relatively complex and slow.  Many systems implement database pooling, in which they retain a number of open database connections to re-use when the code on pages needs database access.
  2. Entity modeling.  If an entity is represented by a set of attributes and relationships, these can be passed to a page in a variety of ways:
  3. Form-based display and update of data.

To support these, we have implemented the following Java classes, in the DBWeb package:

(Click here to read the automatically generated javadoc.)

Entity

An object with an arbitrary number of property/value pairs.  Both properties and values are represented by Strings. Instances of Entity can be created from ResultSets from a database retrieval, in which case every column represents a property.  They can also be created from HttpServletRequests, in which case every parameter passed to the HttpServletRequest becomes a property.  Property/value pairs can also be added by the put method and retrieved by get, getS, getH, getQ, and various getDxxx methods.  These retrieve the value corresponding to a property as

get
Whatever the value is, possibly including null
getS
The value as a String, where null is the empty String ""
getH
The value transformed to be safe to include in HTML.  I.e., <, >, &, " become &lt;, &gt;, &amp; and &quot;
getQ
The value transformed to be safe to include in SQL.  This just doubles '.
getDSql
The value should be a date; it is transformed into the form required by SQL: yyyy-mm-dd. (e.g., 2003-04-01)
getDmdy
The value should be a date; it is transformed to the form mm/dd/yyyy (e.g., 4/1/2003)
getDmy
The value should be a date; it is transformed to the form mmm yyyy (e.g., Apr 2003)
getDy
The value should be a date; it is transformed to the form yyyy (e.g., 2003)

Entities also may contain multi-value properties, accessed and set by the following:

getP
The value is an ArrayList of String values.
add
Adds an additional String value to the multiple values of this property.
putP
Makes the given ArrayList of values into the multiple values of this property.

Entities also may contain an ErrRecord, which can hold warnings and errors that can in turn generate HTML data for display.

SqlAccess

An object that encapsulates access to a database backing the web site.  The database is named by a JNDI-style name, which is most commonly defined in the configuration file of the Web server (e.g., in resin.conf for the Resin server).

These objects need to be closed; otherwise, many web serves will, at least under load, exhaust their database pool resources.  Thus, the appropriate way to use a SqlAccess object is like this:

SqlAccess s = null;
try {
    s = new SqlAccess("jdbc/foobar");
    ...
} finally {
    if (s!=null) s.close();
}

The simplest use of a SqlAccess object is just to retrieve a single row or all rows of a relation:

retrieve(sql_statement)
Returns an Entity that corresponds to the first row retrieved by the sql select query.
retrieveAll(sql_statement)
Returns an ArrayList of Entities that correspond to every row retrieved by the sql select query.

More sophisticated uses of SqlAccess require a declarative specification of how an entity is mapped to the relational database.  We describe the language of such descriptions below.  They make it simpler to develop Web pages containing forms that support display, inserting, deleting and updating the corresponding content in the database.

A simple EntityModel is specified as

table:key:field1,field2,...:condition

It assumes that the entities of interest are stored in table, which is indexed by a primary key, and we want to retrieve and store the listed fields.  The condition, if given, must be a legal sql where clause that restricts attention to only those rows of the table that satisfy the condition.  If the table name is immediately followed by "=n", its fields are interpreted to specify (perhaps among other things) a person's name, and on retrieval additional properties fullname and fullFNF are stored in the retrieved Entity(ies).  The FNF version is First-Name-First.  For this to work, we adopt the BibTeX model of names: last, first, middles, vons, and suffix.  If any field name is followed by "=d", then it is considered to store a date, and it is converted to the appropriate format for sql dates before we perform the database insert or update operations.

We will consider more complex EntityModels below.

With an EntityModel, a SqlAccess object supports the following operations:

retrieve(entity_model, id)
Retrieves the Entity whose primary key is id.
insert(entity_model, entity)
Inserts the given Entity into the database.
delete(entity_model, entity)
Deletes the table row whose key is given in the Entity.
update(entity_model, entity)
Updates the fields in a row of the table whose key (and updated values) are given in the Entity.

Form-Based Data Display and Update

One typical way to use database-backed web sites is to allow the user to see a form that corresponds to all the properties of an Entity.  Such a form can then be used to enter new entities, to update the values of an existing Entity, or to delete an existing Entity.  This is implemented in our framework by a jsp page that does the following:

  1. Based on the type of information passed to the page, it first does database manipulation (unless we are using the page to define a new Entity) to insert, update or delete the Entity.
  2. It then generates an HTML form that displays and allows the user to fill in (if new) or modify properties of the Entity.  The action parameter of this form specifies that it is to be processed by exactly the same page that we are on.
  3. That HTML form is created also to contain the appropriate names for the Submit button(s) and, through their values, a specification of what is the appropriate database action to take when the form is submitted.

We provide a framework for supporting this via a SqlAccess object's process method.  The HttpServletRequest or Entity passed to the process method should contain a parameter submit (often from the value of form Submit button(s)), which determines the action taken by process:

null
The request was via an HTTP GET. If a record with the ID specified in the EntityModel exists, we are to update it.  Otherwise, we are setting up to enter a new record.  For example, this may be invoked by following a hyperlink to a URL such as http://foo.org/person.jsp (to define a new person) or http://foo.org/person.jsp?id=317 (to edit the data corresponding to the person whose primary key is 317).
Enter
The data in the request should be the content of a new record to be added according to EntityModel. It is inserted and returned.
Update
The data in the request should be the revised content of an existing record. It is updated and returned.
Delete
The record should be deleted. It is, but the data that were in it are nevertheless returned, in case the user wants to re-enter them.

For Enter, Update or Delete, the values of the Entity's properties are passed to the page as HTTP POST parameters.  In any of these cases, an Entity is returned.  In addition to the properties corresponding to the Entity data, it also contains the following properties:

action-type
The appropriate name of the Submit button that will be in the generated form.
head-msg
An English text message that describes the result of the database operation that has been performed.

The returned Entity also contains errors and warnings that should be displayed if present.

More Complex Entity Models

Object-oriented systems make it possible to represent properties with multiple values (e.g., languages spoken by a person) and to represent 1-1, 1-n, n-1 and n-n relationships (e.g., family relationships, where the child-to-parent relationship is n-1, the parent-child relationship is 1-n, and the sibling relationship is n-n).  In relational models, we often have to do this via use of foreign keys, additional tables that represent the relationship itself, and other such methods.  Because we always view these relationships from the vantage point of one entity, we need not worry about displaying n-n relationships.

The simplest way to display information from a 1-1 or n-1 relationship is just to join it with the table representing the Entity, which simply gives us additional properties from the joined relation.  If the underlying database does not permit updates to views, then the process method of SqlAccess may only be used for certain conventional relationships, not for general update of jointed parameters.

For 1-n and n-n relations, we use Entity's ability to represent multi-valued parameters to represent the relationship.  For example, if one kind of Entity in the database represents a class, it may have a 1-n relationship to people, which represents the faculty member(s) who teach that class.  Visually, we probably wish to show these faculty as multiple possible values of a select element in a form.  The relationship between classes and their professors may be in the teaches table, where each row contains a class identifier and a teacher (person) identifier.  We represent such relationships via a DetailModel, which can be specified in the EntityModel specification in the form:

table: key: field: name: c1=v1,c2=v2,...ck=vk

where table is the table representing the 1-n or n-n relationship, key is the column name in this table that holds the identifier of the base Entity, field is the value of the related entity, and name (if given) is the property under which to store the corresponding values of field.  It defaults to field.  The conditions, if given, should each be valid sql where clauses (without the word "where") to select only the appropriate subset of table entries.

For the current example, such a DetailModel would be specified as

teaches:classid:teacherid

Then, whenever a class is retrieved according to its main EntityModel, the Entity that is filled in for that class will have the additional multi-valued property teacherid, whose values are the identifiers of the teachers for that class.

To make such multi-valued properties easy to use in forms, we also define a Java class HtmlSelector that conveniently generates HTML select statements.  In our example, we might have the following code:

SqlAccess s = new SqlAccess("jdbc/foobar");
EntityModel classM = new EntityModel("classes:id:name,time",
                          new String[]
                              {"teaches:classid:teacher"});
EntityModel peopleM = new EntityModel("people=n:id:"
                                      +"last, first, middles, vons, suffix");
ArrayList ppl = s.retrieveAll(peopleM); 
HttpSelector sel = new HtmlSelector(ppl,"id","fullname");

and in the jsp for the form, we might have something like:

<%= sel.toHtml("teacher", classEntity.getP("teacher"), 5, true, true, "") %>

which says to create and insert an HTML <select> statement that lists the names of all the teachers in people, with those items selected that are paired with the current id/classid in the teaches table.  We ask for the selector to show five items, we specify that the selected items should be shows at the beginning of the list, and that multiple teachers may be selected for the class.

Example

The following example maintains a database of classes and professors, and which professor teach which classes.  The database consists of three tables:

  1. classes has three columns: id, title, and offered.  Id is the class number, title its name, and offered is a short text string that is the null string ('') if the class is not offered, or 'ON' if it is.  This choice is made because of the default values used by HTML check boxes.  Note that no two classes entries may have the same id.
  2. profs has columns for the conventional components of a person's name (last, first, middles, vons and suffixes), a unique id that auto-increments as we define new profs, and a dob that is a date field.
  3. teach has two columns, cid and pid.  Each entry says that the professor identified by pid teaches the class identified by cid.

Here are listings of the .jsp files that allow us to enter or update information about profs and classes.  Note that the teach information is updated through the class form.

  1. prof.jsp
  2. class.jsp

The page index.jsp simply lists all the classes and allows the user to click on one to edit its information.