I’m at the 2009 Conference on Innovative Data Systems Research (CIDR), an interesting workshop that lives up to its name by taking work that’s a little too “innovative” for the regular database conferences.
The opening plenary talk is by Jeffrey Heer (who built the Flare visualization toolkit and is now a professor at Stanford) on how data analysis can be a social, collaborative activity. He started by outlining some work he did with danah boyd on Vizster. Back when Friendster was the first/only social networking site, they built a tool, Vizster, for visualizing the friend graph. It offered a typical force-directed graph layout of your friends, but added representations of various attributes to support richer data analysis, clustering, etc. They were interested in how people would make use of it. One of the first places they demoed it was a kiosk at a big blogger party. Unexpected result: hundreds of people used it, but groups spent much more time with it than individuals. Friends would encourage each other to find relationships, probe boundaries, and engage in data verification.
Then they ran into Martin Wattenberg’s baby name explorer, a big hit on the internet that offered a graphical visualization of baby name usage over time.
So Jeff got together with Martin and made sense.us, supporting social analysis of 100 years of census data. It offered various hand-built visualizations, let users pick the data sets and parameters, and also let them add commentary on the visualizations. They identified two kinds of users:
- Voyager: focused on the data, chasing down hypotheses, serendipitous comment discovery. Eventually tired of it.
- Voyeur: focused on the comment listings. Investigated others’ explorations. Found people and topics of interest, which got them excited and turned them into voyagers.
Lots of services have sprung up. Hans Rosling’s TED talks have raised awareness of visualization as a storytelling medium. There’s Google Maps. Spotfire allows publishing visualizations to the web. Similarly Tableau Software, Swivel (which lets people upload their own tabular data sets), and ManyEyes, which was a followup to sense.us.
Jeff has come to talk about what the visualizers want from the database community.
Discussion and debate. People see things in data, comment on them, and propose hypotheses; others come in and offer parameter changes to the visualizations to support or refute those hypotheses. Debate about data can actually force data authors to go back and fix their data. They did an analysis of the comments on the data visualizations and labelled various roles. Most comments were observations about the data, but there were lots of questions and hypotheses as well. A virtuous cycle. But right now the comments are just streams of text. Question: how can they better structure these conversations and reduce the cost of synthesizing contributions? What structure can be given to conversation to help aggregate and summarize? Can the data sets being built, the visualizations, and the social activity be represented in a unified data model? (We need to talk to him about RDF!)
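Since RDF came up, here’s a minimal sketch of what a unified triple-based model might look like; this is my own illustration with made-up identifiers, not anything from the talk:

```python
# Hedged sketch: data sets, views, and social activity unified as
# RDF-style (subject, predicate, object) triples. All names are invented.
triples = [
    ("census1900", "rdf:type",   "DataSet"),
    ("view42",     "visualizes", "census1900"),
    ("comment7",   "annotates",  "view42"),
    ("comment7",   "text",       "look at that spike"),
]

# A query over the unified model: which comments annotate views of census1900?
views = {s for (s, p, o) in triples if p == "visualizes" and o == "census1900"}
comments = [s for (s, p, o) in triples if p == "annotates" and o in views]
print(comments)
```

The point is that one query language can then traverse data, views, and conversation together, which is exactly the aggregation/summarization problem Heer raises.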
Text is data too. Univariate data is the most popular. For numbers, bar graphs are the winner. By far the most popular visualization at ManyEyes is the tag cloud; together with word trees, they account for a third of ManyEyes visualizations. They want better tools for dealing with text, both analysis tools and visualization tools: tools that engage users in entity extraction, or aggregate and compare texts. Right now on ManyEyes people take documents, aggregate them in their editors, then post to ManyEyes. Can we do better? Can the tools turn around and analyze the actions of the community?
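For concreteness, the raw computation behind a tag cloud is just word-frequency counting; a minimal sketch (hypothetical, not ManyEyes’ actual code):

```python
from collections import Counter
import re

def tag_cloud_weights(text, top_n=5):
    """Word frequencies -- the weights a tag cloud renders as font sizes."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

print(tag_cloud_weights("data data visualization text data text analysis"))
```

The hand-aggregation Heer describes is users doing this (and the document concatenation feeding it) in their text editors before uploading.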
Data integrity and cleaning: when someone creates a visualization, it can make data errors very obvious and lead to them being fixed. The New York Times’ visualization lab uses ManyEyes to build its visualizations. Data integrity was 16% of all comments on sense.us. We need better data cleaning tools. How can people do “in situ” data cleaning that engages users in helping the data get better _after_ it has been uploaded to the visualization?
Data integration in context: people start hand-annotating data, laying extra info on the timeline of the data. Others will bring in additional data sets to enrich existing data. A game formed around a visualization of Harry Potter library holdings, as individuals started annotating it with which books they’d read.
Stanford’s Vispedia. Utah’s VisTrails. Google’s WebTables.
Problems of pointing and naming. E.g., a “look at that spike” comment needs to be associated with some element of the visualization: named reference, reference by position, by shape, etc. This is the general deixis problem. So far tools mainly annotate the pixels; how do you instead annotate the underlying data? Model selections as declarative queries over the data that can be stored (and still work when the data changes, and even as the visualization changes). Can there be meta-queries linking annotations to views, or ways to annotate data aggregates? Again he wants a unified model to facilitate references. Can the annotations be made machine-readable?
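A toy sketch of the “selections as declarative queries” idea (my own example, not Heer’s implementation): the annotation stores a predicate over the data rather than pixel coordinates, so it can be re-evaluated after the data or the view changes.

```python
# Toy data; the 1919 row is the "spike" a comment might point at.
rows = [
    {"year": 1918, "births": 950},
    {"year": 1919, "births": 1500},
    {"year": 1920, "births": 960},
]

annotation = {
    "comment": "look at that spike",
    # The selection is a stored query (here a predicate), not a pixel region;
    # it still resolves if the chart is re-rendered or new rows are added.
    "selector": lambda r: r["year"] == 1919,
}

selected = [r for r in rows if annotation["selector"](r)]
print(selected)
```

A real system would store the selector as a serializable query (e.g. SQL) rather than a closure, so annotations survive across sessions and views.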
A lot of people create visualizations and then host them elsewhere for discussion. They often see people taking data from different sources, taking stuff out, computing elsewhere, and then re-uploading the analysis results.