Query Model System Components

In discussing the system components, we can follow the same route a human would and consider the query formulation process first (functions Q₁ and Q₂). Abstractly, users perform many different kinds of searches for different information needs. These include searches for data to meet analytical needs, searches for documents the user created, and searches for documents related to a current research thread. Different information systems are designed to handle different types of queries. We will call the computable query submitted to the information system the input.

Functionally, we can distinguish between two extremes of input into the system. The first we will call the exploratory, or browsing, search. This type of search involves a "probe" into the information space followed by iterative refinement of the search until satisfactory information is found. An example of this type of search is "show me documents that have information on the crime rates in northern California cities." The second type we will call the explicit search. Users perform an explicit search when they are aware of the existence of some document or object and wish to locate it. An example is "find the web page on the MIT site dealing with patent licensing." Between these two extremes of exploratory and explicit lies the information search spectrum [11]. What distinguishes one end from the other is the set of hints the user applies to the query. Users of information systems learn to formulate better hints either by becoming adept with the systems or by using the feedback loop provided by the result sets.

But what exactly distinguishes exploratory from explicit? To answer this we introduce two new terms, confidence and specificity. Imagine an omniscient user. When making a query, this user can cause the system to return a perfect match for the information need (assuming the system provides adequate retrieval mechanisms). We say that such a query has high specificity (we get exactly what we want). We can also assume that the user has high confidence that the query will return the right thing. When the user has low confidence and low specificity, we say they are performing an exploratory search. As the user begins to understand the information space, they are able to pose more specific queries with greater confidence, in essence pinpointing the document(s) that match their information need. Unfortunately, queries of higher confidence and specificity tend to be more complex and to consist of various hints that a given information system may not be able to understand or use. This matters for the system we are attempting to develop here: the user of a personal information system will naturally be able to pose queries with higher specificity and confidence than the user of a general information system. We therefore want an information system that can take advantage of the more precise and useful hints such a user generates.
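The confidence/specificity distinction can be made concrete with a small sketch. It is purely illustrative: the numeric scores in [0, 1], the equal weighting (a simple mean) and the 0.5 cut-off are hypothetical choices, since the text describes both dimensions only qualitatively and names only the two extremes of the spectrum.

```python
from dataclasses import dataclass

# Hypothetical scale: both dimensions are scores in [0, 1]. The text only
# distinguishes "low" from "high", so these numbers are an assumption.
@dataclass
class QueryProfile:
    confidence: float   # how sure the user is that the query returns the right thing
    specificity: float  # how exactly the query pins down the information need

def spectrum_position(q: QueryProfile) -> float:
    """Place a query on the exploratory (0.0) to explicit (1.0) spectrum.

    Assumption: both dimensions contribute equally, so the position is their mean.
    """
    for value in (q.confidence, q.specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return (q.confidence + q.specificity) / 2

def search_kind(q: QueryProfile, threshold: float = 0.5) -> str:
    # Hypothetical threshold; the text gives only the two extremes.
    return "explicit" if spectrum_position(q) >= threshold else "exploratory"

# "crime rates in northern California cities": the user is still probing.
probe = QueryProfile(confidence=0.2, specificity=0.3)
# "the web page on the MIT site dealing with patent licensing": the user knows it exists.
known_item = QueryProfile(confidence=0.9, specificity=0.8)

print(search_kind(probe))       # exploratory
print(search_kind(known_item))  # explicit
```

In practice neither score is directly observable; a system would have to infer them from the hints present in the query.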

Just as we categorized the types of queries, we can categorize the different I functions. The I function is the implementation of the information system, and we devise three categories for it. The first category is associative search tools. For the most part, these correspond to hypertext systems and allow explicit, sometimes non-trivial, connections to be made between documents. The second category is unstructured, or fuzzy, search tools. Unstructured search addresses the questions a user has about the (generally) implicit meaning of a document. These tools generally correspond to information retrieval (IR) systems, which were designed for indexing and searching text-based collections. The last category is structured, or deterministic, search tools. Traditionally, these have been database systems. Structured search is not necessarily about the structure of the document; it usually means querying data held in a predefined schema. Examples of this type of query are "return all documents that are of type e-mail" and "return all documents that are dated 9/9/97."
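To make the three categories of I concrete, the following sketch implements one toy I function of each kind over the same small collection. Everything here (the `Document` fields, the sample documents and the function names) is hypothetical, and the term-overlap ranking is only a crude stand-in for the weighting schemes real IR systems use.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    doc_type: str  # structured attribute, e.g. "e-mail"
    date: str      # structured attribute, e.g. "9/9/97"
    links: list[str] = field(default_factory=list)  # explicit hypertext links

# Toy collection; all documents are invented for illustration.
DOCS = {
    "d1": Document("d1", "patent licensing office policy", "web page", "9/9/97", links=["d2"]),
    "d2": Document("d2", "licensing agreement draft for review", "e-mail", "9/9/97"),
    "d3": Document("d3", "crime rates in northern California cities", "report", "1/2/97", links=["d1"]),
}

def associative_search(start: str) -> list[str]:
    """Hypertext-style I: follow the explicit links out of a starting document."""
    return list(DOCS[start].links)

def fuzzy_search(query: str) -> list[str]:
    """IR-style I: rank documents by the number of query terms they share."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d.doc_id) for d in DOCS.values()]
    # Best score first, ties broken by id; drop documents with no matching term.
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

def structured_search(**criteria: str) -> list[str]:
    """Database-style I: exact match on attributes of a predefined schema."""
    return [d.doc_id for d in DOCS.values()
            if all(getattr(d, attr) == value for attr, value in criteria.items())]

print(associative_search("d3"))              # ['d1']
print(fuzzy_search("patent licensing"))      # ['d1', 'd2']
print(structured_search(doc_type="e-mail"))  # ['d2']
print(structured_search(date="9/9/97"))      # ['d1', 'd2']
```

The fuzzy search returns a ranking that tolerates partial matches, while the structured search either matches an attribute exactly or excludes the document; this is the practical difference between the second and third categories.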

The final component that requires some discussion is the post-processing function, F. Imagine the query "find the average grade for student x for this year," applied to a database of student records. This can be broken down into two tasks: "find all the grades for student x for this year" and "average the result set." The first task is already handled by the function I. The second task, which involves post-processing, must then be applied to the result set. This is where the filter function F comes in. There are a variety of filters we may wish to implement. We can design F to let all documents pass through unchanged; borrowing from linear algebra, we call this the identity function. Alternatively, F can compute some functional summary of the data, which we call the aggregate. Various database technologies provide this facility as a feature, and for certain applications it serves a very practical purpose.
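The student-grade example decomposes naturally into an I step followed by an F step. The sketch below assumes a hypothetical in-memory table of records (field names and values are invented) and shows F both as the identity and as an aggregate (the arithmetic mean).

```python
from statistics import mean

# Hypothetical student-record table; field names and values are illustrative only.
RECORDS = [
    {"student": "x", "year": 1997, "grade": 4.0},
    {"student": "x", "year": 1997, "grade": 3.0},
    {"student": "x", "year": 1996, "grade": 5.0},  # different year: excluded by I
    {"student": "y", "year": 1997, "grade": 2.0},  # different student: excluded by I
]

def retrieve_grades(student: str, year: int) -> list[float]:
    """I: find all the grades for the given student in the given year."""
    return [r["grade"] for r in RECORDS if r["student"] == student and r["year"] == year]

def identity_filter(results: list[float]) -> list[float]:
    """F as the identity function: every result passes through unchanged."""
    return results

def aggregate_filter(results: list[float]) -> float:
    """F as an aggregate: reduce the result set to one summary value (its mean)."""
    return mean(results)

grades = retrieve_grades("x", 1997)
print(identity_filter(grades))   # [4.0, 3.0]
print(aggregate_filter(grades))  # 3.5
```

Note that `statistics.mean` raises `StatisticsError` on an empty result set, so a real aggregate filter would also have to decide what an empty selection should produce.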

Finally, we refer to the complete system (Q + I + F) as the class of the system. As will become apparent, different classes of systems are better suited to different types of problems, because each is designed and optimized for particular problems. Systems with an information retrieval core are generally intended for full-text fuzzy matching. Databases are (usually) intended for highly constrained, exact searches. Hypertext systems are (usually) intended for building complex inter-document associations. We believe that at different times users have different information needs and constraints, and will require different types of systems to satisfy them. So while a single unified solution would be ideal, it may not be achievable.
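One way to picture a system class is as the composition F(I(Q(request))). The sketch below, using hypothetical names and toy data, assembles two classes from interchangeable parts: an IR-flavoured class whose F is the identity, and a database-flavoured class whose aggregate is a simple count. It only illustrates that swapping Q, I or F changes which problems the system handles well; it does not describe any particular system.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SearchSystem:
    """A system class as the composition of Q, I and F (names are illustrative)."""
    q: Callable[[str], Any]   # Q: turns the user's request into a computable input
    i: Callable[[Any], list]  # I: the retrieval engine
    f: Callable[[list], Any]  # F: the post-processing filter

    def run(self, request: str) -> Any:
        return self.f(self.i(self.q(request)))

# Toy collection of (document text, document type) pairs, invented for illustration.
DOCS = [
    ("meeting notes on patent licensing", "e-mail"),
    ("crime rates in california cities", "report"),
    ("licensing fees spreadsheet", "e-mail"),
]

# IR-flavoured class: Q tokenizes, I keeps documents sharing any term, F is the identity.
ir_class = SearchSystem(
    q=lambda text: set(text.lower().split()),
    i=lambda terms: [doc for doc, _ in DOCS if terms & set(doc.split())],
    f=lambda results: results,
)

# Database-flavoured class: Q normalizes an exact type, I filters on it, F counts (aggregate).
db_class = SearchSystem(
    q=lambda text: text.strip().lower(),
    i=lambda doc_type: [doc for doc, t in DOCS if t == doc_type],
    f=len,
)

print(ir_class.run("Licensing"))  # ['meeting notes on patent licensing', 'licensing fees spreadsheet']
print(db_class.run("e-mail"))     # 2
```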

In chapter we will review why current systems that fit our search model all have both useful and undesirable characteristics. An ideal solution would augment each kind of information system with the features of the others. A system composed of complementary sub-systems is, in our opinion, a better solution, one that satisfies the full spectrum of user needs. We call this system Hybrid-Search.
