Functionally, we can distinguish between two extremes of input into
the system. The first of these we will call the
exploratory or browsing search. This type of search
involves the use of a ``probe'' into the information space followed by
an iterative refinement of the search until satisfactory information is
found. An example of this type of search is, ``show me documents that
have information on the crime rates in northern California cities.''
The second type of search we will call the explicit search.
Users will perform an explicit search when they are aware of the
existence of some document or object and wish to locate it. An
example of this is, ``find the web page on the MIT site dealing with
patent licensing.'' Between these two extremes of exploratory and
explicit lies the information search spectrum [11]. What
distinguishes one end from the other is the set of hints the user
applies to the query. Users of information systems begin to formulate
better hints either by becoming adept users of the systems or by using
the feedback loop from the result sets.
But what exactly distinguishes exploratory from explicit? To answer this, we introduce two new terms: confidence and specificity. Imagine an omniscient user. When making a query, this user can cause the system to return a perfect match to the information need (assuming the system provides adequate retrieval mechanisms). We say that this query has high specificity (we get exactly what we want). We can also assume that the user has high confidence that his query will return the right thing. When the user has low confidence and low specificity, we say that they are performing an exploratory search. As the user begins to understand the information space, they are able to pose more specific queries with greater confidence, in essence pinpointing the document(s) that match their information need. Unfortunately, queries of higher confidence and specificity tend to be more complex and to contain hints that a given information system may not be able to understand or use. This matters for the system we are developing here. Naturally, the user of a personal information system will be able to pose queries with higher specificity and confidence than a user of a general information system. We therefore want an information system that can take advantage of the more precise and useful hints a user generates.
Just as we categorized the types of queries, we can categorize the
different I functions. The I function is the
implementation of the information system, and we devise three
categories for it. The first category is
associative search tools. For the most part, these
correspond to hypertext systems and allow explicit, sometimes
non-trivial, connections to be made between documents. The second category is
unstructured or fuzzy search tools. Unstructured search
addresses the questions a user has about the (generally) implicit
meaning of a document. These tools generally correspond to
information retrieval (IR) systems, which were designed to work
well for indexing and searching text-based
collections. The last category consists of structured or
deterministic search tools. Traditionally, these have been database
systems. Structured search is not necessarily concerned with the
structure of the document; it is usually about querying data in
a predefined schema. Examples of this type of query are ``return
all documents of type e-mail'' and ``return all
documents dated 9/9/97.''
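To make the three categories of I concrete, the following Python sketch implements a toy version of each over a small in-memory collection. Everything here is an illustrative assumption rather than part of the system described: the `Document` fields, the function names, the sample documents, and especially the term-overlap scoring, which is only a crude stand-in for a real IR ranking model.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    doc_type: str   # structured field, e.g. "e-mail"
    date: str       # structured field, e.g. "9/9/97"
    text: str       # free text used by the fuzzy search
    links: list[str] = field(default_factory=list)  # explicit hypertext links


def associative_search(docs, start_id):
    """Associative I: return the documents explicitly linked from start_id."""
    by_id = {d.doc_id: d for d in docs}
    return [by_id[t] for t in by_id[start_id].links if t in by_id]


def fuzzy_search(docs, query, k=2):
    """Unstructured I: rank documents by query-term overlap (a crude IR stand-in)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in docs]
    # Keep only documents sharing at least one term; best scores first.
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [d for _, d in hits[:k]]


def structured_search(docs, **criteria):
    """Structured I: exact match on every schema field given in criteria."""
    return [d for d in docs if all(getattr(d, f) == v for f, v in criteria.items())]


# Hypothetical toy collection
docs = [
    Document("a", "e-mail", "9/9/97", "crime rates in northern california", ["b"]),
    Document("b", "web page", "1/2/98", "patent licensing at mit"),
    Document("c", "e-mail", "3/4/97", "california weather report", ["a", "b"]),
]

print([d.doc_id for d in associative_search(docs, "c")])               # ['a', 'b']
print([d.doc_id for d in fuzzy_search(docs, "crime california")])      # ['a', 'c']
print([d.doc_id for d in structured_search(docs, doc_type="e-mail")])  # ['a', 'c']
print([d.doc_id for d in structured_search(docs, date="9/9/97")])      # ['a']
```

The same collection answers different kinds of questions depending on which I is applied: explicit links for associative search, graded term overlap for fuzzy search, and exact field matches for structured search.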
The final component that requires some discussion is the post-processing function, F. Imagine a query, ``find the average grade for student x for this year,'' applied to a database of student records. This can be broken down into two tasks: ``find all the grades for student x for this year,'' and ``average the result set.'' The first task is already handled by the function I. The second task, which involves some post-processing, must then be applied to the result set. This is where the filter function F comes in. There is a variety of filters we may wish to implement. We can design F to let all documents pass through unchanged; borrowing from linear algebra, we call this the identity function. Alternatively, F can compute a functional summary of the data; we call this function the aggregate. Many database systems provide aggregation as a built-in feature, and for certain applications it serves a very practical purpose.
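The student-grade example can be sketched as the composition F(I(Q)). In this minimal Python sketch, assuming a hypothetical record schema of student, year, and grade with made-up sample data, `find_grades` stands in for I, while `identity_filter` and `make_aggregate` correspond to the identity and aggregate forms of F. The names are illustrative only, not an API from the original system.

```python
from statistics import mean

# Hypothetical student-record schema: one row per grade.
records = [
    {"student": "x", "year": 1997, "grade": 88},
    {"student": "x", "year": 1997, "grade": 92},
    {"student": "x", "year": 1996, "grade": 70},
    {"student": "y", "year": 1997, "grade": 75},
]


def find_grades(collection, student, year):
    """Plays the role of I: all grade records for a student in a given year."""
    return [r for r in collection if r["student"] == student and r["year"] == year]


def identity_filter(result_set):
    """F as the identity: every result passes through unchanged."""
    return result_set


def make_aggregate(fn, field):
    """Build an aggregate F that reduces one field of the result set to a value."""
    def aggregate_filter(result_set):
        return fn([r[field] for r in result_set])
    return aggregate_filter


average_grade = make_aggregate(mean, "grade")

# "Find the average grade for student x for this year" = F(I(query))
hits = find_grades(records, "x", 1997)
print(identity_filter(hits) == hits)  # True
print(average_grade(hits))            # 90
```

Building the aggregate from an arbitrary reduction function (`mean`, `max`, `len`) mirrors how SQL aggregates such as AVG or MAX are parameterized by the column they summarize.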
Finally, we call the complete system (Q + I + F) the class of the system. As will become apparent, different classes of systems are better suited to different types of problems, because each system is designed and optimized for a particular kind of problem. Systems with an information retrieval core are generally intended for full-text fuzzy matching. Databases are (usually) intended for highly constrained, exact searches. Hypertext systems are (usually) intended for building complex inter-document associations. We believe that at different times users have different information needs and constraints, and will require different types of systems to satisfy them. So while a single unified solution would be optimal, it may not be achievable.
In chapter we will review why current systems that
fit our search model all have both useful and negative characteristics. An
ideal solution would augment each kind of information system with
the features of the others. A system composed of
complementary sub-systems is, in our opinion, a far better
solution, one that satisfies the full spectrum of user needs. We call this
system Hybrid-Search.