SIGIR09: a comparison of query and term suggestion features for interactive searching
Diane Kelly of UNC Chapel Hill presented some interfaces for helping users refine their queries. Lots of situations arise where people need to enter a series of queries to home in on what they are looking for. The literature shows people can exhaust their ideas for good search queries. IR has explored techniques for both term suggestion and whole-query suggestion.
The problem with term suggestion is that it’s based on terms that are common in the top-ranked documents, but these may be the unwanted or distracting documents. How do you suggest ways deeper into the corpus? Also, there are problems with terms being presented out of context, and there are basic low-level UI annoyances. With query suggestion, there’s the problem of finding a good corpus of queries and figuring out which ones are related or similar to the user’s query. Kelly proposes using automatic term selection techniques as a way to generate whole queries: first extract some terms, then suggest ways of combining them to make a good query. To generate terms, they clustered the documents, took the 5 largest clusters, then selected “good” terms from each of those clusters. They considered offering these terms individually to users, but also just offering a “new query” consisting of the old query with the top terms appended. They also considered user-generated suggestions.
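To make the cluster-then-append idea concrete, here is a rough Python sketch of how I picture it. This is my own illustration, not the method from the paper: the talk did not say how “good” terms were chosen, so the scoring rule below is an assumption, as are the function name `suggest_queries` and its parameters. It also assumes the clustering has already been done and each document is just a list of tokens.

```python
from collections import Counter


def suggest_queries(query, clusters, n_clusters=5, terms_per_query=2):
    """Return (terms, new_query) pairs, one per large cluster.

    clusters: list of clusters; each cluster is a list of documents;
    each document is a list of lowercase tokens. (Hypothetical input
    shape; the clustering step itself is assumed to be done elsewhere.)
    """
    query_terms = set(query.lower().split())

    # Keep only the largest clusters, measured by document count.
    largest = sorted(clusters, key=len, reverse=True)[:n_clusters]

    # Document frequency of every term across the whole clustered set.
    overall_df = Counter(t for c in clusters for doc in c for t in set(doc))

    suggestions = []
    for cluster in largest:
        cluster_df = Counter(t for doc in cluster for t in set(doc))

        # ASSUMED scoring: in-cluster document frequency, weighted by the
        # share of the term's occurrences that fall inside this cluster.
        # Ties are broken by the term itself so the output is deterministic.
        def score(term):
            return (cluster_df[term] ** 2 / overall_df[term], term)

        candidates = [t for t in cluster_df if t not in query_terms]
        top_terms = sorted(candidates, key=score, reverse=True)[:terms_per_query]

        if top_terms:
            # The "new query" is the old query with the top terms appended.
            suggestions.append((top_terms, query + " " + " ".join(top_terms)))
    return suggestions


if __name__ == "__main__":
    toy_clusters = [
        [["solar", "panel", "energy"], ["solar", "panel", "cost"]],
        [["wind", "turbine", "energy"]],
    ]
    for terms, new_query in suggest_queries("energy", toy_clusters):
        print(terms, "->", new_query)
```

Each returned pair covers both interface styles: the list of terms could be offered individually as term suggestions, and the joined string offered as a whole-query suggestion.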
Diane can always be trusted to carefully work out a good user study protocol, so I won’t describe details of the corpora (TREC Robust track, with queries of different levels of difficulty) or user conditions or metrics (“Session-Based Normalized Discounted Cumulative Gain”). That’s all well done, but the details are in the paper.
As queries became more difficult, users made more queries, and also used more query suggestions. Users saw term suggestions as a way to modify their query, but query suggestions as a way to make a whole new query. A bit funny, as the query suggestions were in fact modifications of their original query.
Qualitative feedback was that people liked the flexibility of term suggestion, and its use to refine the query. They didn’t like the jumbling together of terms, and said it was too much effort to use. People said it was hard to see how terms related to their search. People liked the query suggestions for their “all in one” approach. They liked the specificity and focus of the query: the query made a more meaningful semantic unit than the individual term suggestions. They liked that the queries suggested ways of manually changing their query. On the downside, they wished they could click on individual query terms, and felt many queries were redundant.
In a followup study they found that people used lots of query suggestions. Query suggestions were generally rated higher than terms. In particular, those who got user-generated suggestions preferred the whole queries to having them chopped up into term suggestions; perhaps the user suggestions did not make a lot of sense as individual terms. They’d like to go back and develop a hybrid that lets people get the whole queries but then manipulate pieces of them.
I quite liked this talk, but it made me think back to our old work on Scatter/Gather. Just like Diane’s, our system clustered the document collection, then picked important terms from each cluster. But instead of presenting “queries”, we just presented each cluster, through its descriptive terms and also through titles of some “representative documents”. Scatter/Gather offered the user less flexibility to mix and match terms for a new query; on the other hand, I think there is an interesting difference between the “cluster” metaphor and the “query” metaphor. I bet that presenting terms in clusters would fix some of the complaints about terms not making sense in isolation.
Followup: this talk was much tweeted and blogged about elsewhere: