
Straw

Now that we have a basic understanding of the document domain, we can return to developing a data model for Haystack. Abstractly, our goal is to map what we have learned about the document model into usable data structures.

When discussing the query model previously (see Section ), we found two components that were part of every query: the information need and hints. Hints narrow down the results that the information system produces to satisfy the information need. In Haystack we would like to handle three distinct types of hints:

Hints based on document content
Hints based on document metadata
Hints based on relationships between documents

These hints map naturally to the three components of document state described previously. They also map to the three information system types: information retrieval, databases, and hypertext, respectively. Table 1 describes the relationships between all the models we have described so far. From left to right, the models become increasingly concrete. The document model represents ideas that are instantiated in the Straw model described below. Our motivation is to build an efficient and useful implementation that maps our abstract models into reality as accurately as possible.


Document Model                    Straw Model          Information System     In Haystack
Internal state                    Term ties            Information Retrieval  Any IR system
External state                    Intra-document ties  Database               Lore
Relationships to other documents  Inter-document ties  Hypertext              Lore

Table 1: The relationship between the different models

Now that we have a goal in mind, we can build a data structure representing the document model. We also want to develop the data structure within an object-oriented environment. We define an object to be an instantiation of a particular class. A class is a definition of an object, containing both data members and methods, which correspond to variables and functions respectively. Classes can be extended hierarchically, where an extension inherits certain public data members and methods of its ancestors.

The Haystack data model is essentially a graph structure in which the nodes (vertices) and ties (edges) are first-class objects. First-class means that an object can be named, can exist on its own, and can be passed to and returned by functions. Both ties and nodes extend a specific class. In keeping with our namesake, we call this class Straw. Straw will be discussed in great detail in Chapter , but for now regarding Straw objects as nodes will suffice.
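To make the idea of first-class nodes and ties concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the Haystack implementation: the class names Node and Tie, the fields label, straw_id, source, and target, and the helper describe are all hypothetical. Only the name Straw comes from the text.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)  # hypothetical ID source, so that every straw can be named


@dataclass(eq=False)
class Straw:
    """Common base class: every object in the graph is a Straw."""
    label: str
    straw_id: int = field(default_factory=lambda: next(_ids))


@dataclass(eq=False)
class Node(Straw):
    """A vertex. It can exist on its own, with or without ties."""


@dataclass(eq=False)
class Tie(Straw):
    """An edge that is itself a Straw, so it can be named and passed around."""
    source: Optional[Straw] = None
    target: Optional[Straw] = None


def describe(straw: Straw) -> str:
    """Accepts any Straw, node or tie alike."""
    return f"{type(straw).__name__}#{straw.straw_id}:{straw.label}"


doc = Node("document")
author = Node("author")
tie = Tie("author-of", source=author, target=doc)

# Because a tie is a Straw, another tie may point at it.
note = Tie("asserted-by", source=tie, target=Node("user"))
```

Because ties share a base class with nodes, a function such as describe needs no special case for edges, and a tie can itself be the endpoint of another tie.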

Figure: The Straw Model in Haystack

The graph structure in Figure is a possible (but simplistic) representation of Straws. This model shows Document straws connected to a variety of other objects: publisher, title, author, date, and other documents. Between straws there are three types of connections: inter-document ties, intra-document ties, and term ties. Although the implementation of Haystack does not make these distinctions explicitly, this categorization will help in the present discussion.

Intra-document ties represent a mapping to what we have previously called the external state, or metadata, of an object. For example, we can connect an author node to the document node by means of an intra-document tie. To partition different documents, we also draw a virtual boundary around each cluster of straws connected by intra-document ties and call this a document cluster. In the figure we see one central document node for each document cluster. In Haystack we call this document node a HaystackDocument object. A node that has data associated with it we call a Needle. For example, we can have an author node/Needle that contains the data "David" (with the obvious implication that David is the author). We will return to both of these objects in Chapter when we describe the implementation of the Haystack data model in full.
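The notion of a document cluster can be sketched as a graph traversal that follows only intra-document ties. The Python below is a simplified illustration, not Haystack's code. HaystackDocument, Needle, and Straw are names from the text; the constructor arguments, the tuple representation of ties, the explicit "intra"/"inter" tags, the cluster function, and the sample data are assumptions made for the example. As noted above, the real implementation does not tag ties this way.

```python
class Straw:
    """Hypothetical minimal base class for every object in the graph."""
    def __init__(self, label):
        self.label = label


class HaystackDocument(Straw):
    """The central node of one document cluster."""


class Needle(Straw):
    """A node that carries data, such as an author's name."""
    def __init__(self, label, data):
        super().__init__(label)
        self.data = data


def cluster(document, ties):
    """Collect the straws reachable from `document` over intra-document ties."""
    members = {document}
    frontier = [document]
    while frontier:
        current = frontier.pop()
        for kind, a, b in ties:
            if kind != "intra":
                continue  # inter-document ties cross the cluster boundary
            for near, far in ((a, b), (b, a)):
                if near is current and far not in members:
                    members.add(far)
                    frontier.append(far)
    return members


paper = HaystackDocument("paper")
memo = HaystackDocument("memo")
author = Needle("author", "David")
title = Needle("title", "Example title")  # placeholder data

ties = [
    ("intra", author, paper),
    ("intra", title, paper),
    ("inter", paper, memo),  # connects two clusters, so it is not followed
]

paper_cluster = cluster(paper, ties)
```

Here paper_cluster contains the paper node and its two Needles, while memo stays outside the boundary because it is reachable only through an inter-document tie.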

Inter-document ties represent associations between different document clusters. For example, if one document cites another, we can draw the cites connection between the two. Alternatively, if one document is a web page that links to another, we can use the relation, links to. This type of linking represents the hypertext aspect of the system.
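As a rough illustration, inter-document ties can be modeled as labeled, directed edges between document clusters. In this Python sketch the relation names "cites" and "links to" come from the text; the InterDocumentTie type, the helper functions, and the document names are hypothetical.

```python
from typing import List, NamedTuple


class InterDocumentTie(NamedTuple):
    relation: str  # e.g. "cites" or "links to"
    source: str    # name of the document cluster the tie starts from
    target: str    # name of the document cluster the tie points to


# Hypothetical example data.
ties = [
    InterDocumentTie("cites", "thesis", "paper-1"),
    InterDocumentTie("cites", "thesis", "paper-2"),
    InterDocumentTie("links to", "homepage", "thesis"),
]


def outgoing(doc: str, relation: str) -> List[str]:
    """Documents that `doc` points to through the given relation."""
    return [t.target for t in ties if t.source == doc and t.relation == relation]


def incoming(doc: str, relation: str) -> List[str]:
    """Documents that point to `doc` through the given relation."""
    return [t.source for t in ties if t.target == doc and t.relation == relation]
```

Following ties in both directions is what gives the hypertext flavor: from a document we can reach what it cites, and also find what links to it.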

Finally, term ties allow us to connect terms held within the document to the document cluster. That is, if the word "dog" appears in a document, we connect a node holding the word "dog" to the document by means of a term tie. Note that we are not interested in the word "dog" alone, but rather in the concept of a "dog." The word in the document may be "dog," but the user may remember "canine" or "pooch." This implies some semantic understanding of the content of a term straw.
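One very simple way to approximate term ties that attach to concepts rather than literal words is a synonym table. The Python sketch below assumes such a table; the text does not say how Haystack obtains its semantic understanding, so CONCEPTS, concept_of, term_ties, documents_about, and the sample documents are all hypothetical.

```python
import string

# Hypothetical synonym table mapping surface words to one concept.
CONCEPTS = {"dog": "dog", "canine": "dog", "pooch": "dog"}


def concept_of(word):
    """Normalize a word, then map it to its concept if one is known."""
    word = word.strip(string.punctuation).lower()
    return CONCEPTS.get(word, word)


def term_ties(document, text):
    """Return one (concept, document) tie per distinct concept in the text."""
    concepts = (concept_of(w) for w in text.split())
    return {(c, document) for c in concepts if c}


def documents_about(word, ties):
    """Find documents tied to the concept behind `word`."""
    wanted = concept_of(word)
    return {doc for concept, doc in ties if concept == wanted}


ties = term_ties("memo", "The dog barked.") | term_ties("note", "Feed the cat.")
```

With these ties, a user who remembers "pooch" still finds the memo that contains "dog", because both words resolve to the same concept before the lookup.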

With this model, we are now capable of describing documents in a way that appropriately maps the memory and document models. We propose that it is now possible both to describe objects and to query for them in a powerful manner that is consistent with the way information exists in the "real" world.
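To show how the three kinds of ties could support the three kinds of hints, here is a small Python sketch of a query that combines them. It is an assumption-laden toy, not Haystack's query interface: the Doc record, the query function, its parameters, and the sample documents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    terms: set = field(default_factory=set)       # stands in for term ties
    metadata: dict = field(default_factory=dict)  # stands in for intra-document ties
    links: dict = field(default_factory=dict)     # inter-document ties: relation -> names


def query(docs, term=None, metadata=None, related=None):
    """Return names of documents that satisfy every hint that was supplied."""
    results = []
    for d in docs:
        if term is not None and term not in d.terms:
            continue  # content hint failed
        if metadata is not None and any(
            d.metadata.get(k) != v for k, v in metadata.items()
        ):
            continue  # metadata hint failed
        if related is not None:
            relation, target = related
            if target not in d.links.get(relation, set()):
                continue  # relationship hint failed
        results.append(d.name)
    return results


# Hypothetical example data.
docs = [
    Doc("thesis", {"dog", "search"}, {"author": "David"}, {"cites": {"paper"}}),
    Doc("paper", {"dog"}, {"author": "Ann"}),
    Doc("memo", {"budget"}, {"author": "David"}, {"cites": {"paper"}}),
]

narrowed = query(
    docs, term="dog", metadata={"author": "David"}, related=("cites", "paper")
)
```

Each hint on its own matches two documents in this sample; supplying all three narrows the result to a single document, which is the role hints play in the query model.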
