Seeking Users for Two Knight News Challenge Projects

We’ve gotten the news that both our Knight News Challenge proposals have made it through the first round.   Both projects propose to deploy systems that we think will be useful to bloggers of all sorts, but particularly of a journalistic bent.  I’m continuing to seek users interested in alpha testing these tools.  That means you get to run into all the problems nobody’s seen before, but also means that you get to impact our work by telling us which problems bug you the most.

The first project, Datapress, is a WordPress plugin that brings our Exhibit framework into your blog, so you can add rich data visualizations to your blog posts using the same old WYSIWYG editor you’ve been using so far. I blogged about it here. The second, Tipsy, is a tool for collecting voluntary micro-donations from the people who consume content from your blog. Blog post here. Please follow the links to take a look at the proposals (and comment upon them favorably!). And if you’re interested in trying out either tool, send me an email. Please also spread the word to others who might be interested.

Efficient Allocation of Conference Reviewers

I just finished submitting my reviews for WWW and, as has happened several times before, not one of the papers I reviewed had even a remote chance of acceptance.   Nonetheless, each such paper got three reviewers carefully writing down the (same) reasons why it wouldn’t get in.

I wrote in a post last year about why I think three reviewers per paper is overkill, but even if we do put three reviewers per paper on average, is it really best for our field that each paper get exactly three reviews? Wouldn’t it be better for us to direct more of our reviewing effort towards papers that will actually be published and read by the community? I propose that a sequential reviewing mechanism would work better. Let one reviewer take a preliminary read of the paper. If it is clearly out of scope, or clearly below the bar, they can say so. Then let the next reviewer take a turn. They can simply “concur” with the first review or choose to write another. Ditto for the third reviewer. Finally, once a paper has been accepted, sic a fourth reviewer on it who can be as harsh as they like, reviewing purely to improve the paper, because they know it won’t affect acceptance.

If I knew that I could cut my effort on weak papers, I’d have more cycles to spend on the boundary papers, where a close look actually matters to outcomes, and on accepted papers, where my time would produce a better paper for the community.  In fact, I’d also be willing to spend more time on the weak papers where I was first reviewer, since I’d know that’s the only review they’re going to get.  In general, knowing whether I was reading a weak, borderline, or strong paper would completely change the way I read it.
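The sequential mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Review` record, the "accept"/"reject" labels, and the rule that a reviewer concurs whenever they agree with every earlier verdict are all hypothetical simplifications, and the post-acceptance fourth reviewer is left out.

```python
from dataclasses import dataclass


@dataclass
class Review:
    reviewer: str
    verdict: str             # hypothetical labels: "accept" or "reject"
    concurred: bool = False  # True if the reviewer just endorsed earlier reviews


def sequential_review(paper, reviewers):
    """Ask reviewers one at a time; each one sees the reviews written so far."""
    reviews = []
    for name, judge in reviewers:
        verdict = judge(paper)
        # A later reviewer who agrees with every earlier verdict can simply
        # concur instead of writing a full review of their own.
        concurred = bool(reviews) and all(r.verdict == verdict for r in reviews)
        reviews.append(Review(name, verdict, concurred))
    return reviews


def full_reviews_written(reviews):
    """Count the reviews that required real writing effort."""
    return sum(1 for r in reviews if not r.concurred)


# Hypothetical example: three reviewers who all agree a weak paper is a reject.
def judge(paper):
    return "accept" if paper["score"] >= 5 else "reject"


weak_paper = {"title": "A weak paper", "score": 2}
panel = [("R1", judge), ("R2", judge), ("R3", judge)]
print(full_reviews_written(sequential_review(weak_paper, panel)))  # 1
```

In this toy run only the first reviewer writes a full review of the weak paper; the other two concur, which is the effort I would rather spend on borderline and accepted papers.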

Obviously this proposal flies in the face of the tradition of “independent” reviews. But I just don’t believe that reviewers are so spineless. I’ll certainly have no problem expressing my opinion just because it’s different from someone else’s. Keeping reviewer identities anonymous is also a good way to allow that kind of debate. I’ve also heard concerns about scheduling. But I think they’re overblown. We offer incredibly long reviewing timelines in order to give our reviewers flexibility. If that timeline is segmented, they still have flexibility: a reviewer can choose to review only in a particular segment if that’s all they’ve got free, or to spread their reviews over multiple segments if they’d rather.

I hope some PC takes a stab at experimenting with this. We’re scientists; we should be willing to test our own hypotheses about what makes a good program committee process.

Submitted Knight News Challenge proposal

As forewarned in my last blog post, I’ve submitted a first draft of our Knight News Challenge proposal at their site; you can read and, more importantly, comment upon the proposal here.  I welcome your feedback.  And as I mentioned last time, if you’re interested in being one of our Guinea pigs, step right up—especially if I can use your name in the challenge proposal!

A Knight News Challenge Application

The Knight News Challenge is an ambitious undertaking to fund innovation in tools that can help digital journalism.  I’m planning to submit a proposal around our Datapress data-blogging plugin for WordPress and am seeking some early-adopter WordPress bloggers who’d be interested in experimenting with our tools on their sites.

I believe that publishing rich interactive data visualizations is a powerful way to get your story across.  If you’re the New York Times you can build your own Visualization Lab to create these sorts of presentations.  But most of us have tighter budgets and less skill.   I’m interested in tools that the rest of us—bloggers, not programmers, and with tight budgets—can use to share information.

We’ve created a WordPress plugin called Datapress that lets you WYSIWYG author—not program—interactive visualizations of any data you like.  You can drop maps, timelines, tables, charts, lists, thumbnail grids, and graphs into your article the same way you drop in an image.   You can include widgets that let your readers sort and filter the data by the criteria you specify.  The data you’re presenting can be in a file uploaded to your blog or can live in a google spreadsheet or a wiki where you can maintain it over time—your article will automatically incorporate your changes.     All these pieces are incorporated in the standard WordPress blog-post editor.

Datapress uses the Exhibit framework, which has been used to create several hundred interesting data visualizations on the web, including some by the San Francisco Chronicle, the Star Tribune, and the St. Petersburg Times.  But with Datapress we’ve tried to make it even easier to author these views and incorporate them in your blogs.  A couple of brave bloggers at Factory Portland and Quantnet have already used it successfully for music and finance.

You can see how Datapress works on our demo site, watch a tutorial on the Datapress blog, or just download the plugin from the WordPress plugins site.

I’m happy to help anyone try the tool, but if you’re a journalist with a WordPress blog I’d particularly love to sign you up as a Guinea pig for our Knight News Challenge application. If we’re funded, you get to be the one dictating the additional features you’ll need to make the tool work for you, and we might even listen!

The Semantic Web needs a MySQL

One thing was clear in the comments of many industry-facing participants of ISWC 2010: a big impediment to adoption of semantic web technologies is the lack of an off-the-shelf triplestore that “just works.”

There are many other problems, of course: RDF is an awkward format when it comes to real-world programming because the graph model doesn’t align with the object-dictionary model of OO programming; JavaScript favors JSON instead of RDF; URIs and namespaces can be a burden to craft the first time around. But these problems can be lessened, or eradicated, with good development frameworks.

Underlying these surface problems is a deployment one: even if a company wanted to, there’s no clear hassle-free solution to getting a triplestore up and running with the same ease, access, and reliability that relational solutions such as MySQL and Postgres provide. And as long as this is the case, otherwise semantic-web savvy individuals are going to continue to live in the relational world. When people are spread thin, and want to focus on user experience instead of database administration, they’ll pick the database product that allows them to focus on other things.

So what gives? Do we wait for a Mike Stonebraker of the triplestore world to come around? Or do we try to bolt our technologies onto non-relational databases that are gaining momentum, such as MongoDB or CouchDB?

The Toothpaste Problem & Choosing the “right” data to publish

People who visit a toothpaste aisle with only 4 products walk away much happier than those who visit the typical supermarket aisle crammed with 40 variants of Colgate. Why? Because they don’t get overwhelmed by a tsunami of possibilities that leaves them wondering if they made the wrong choice.

When it comes to a large organization publishing data, perhaps a similar problem arises. Given all the information in the world that we could publish in structured form, how are we to know which important bits to address first?

Hans-Jörg Happel proposed an interesting way to solve this problem in the Social Semantic Web track at ISWC 2010 today. If we can quantify the need for a particular morsel of information, we can prioritize our efforts to structure and publish data. The question, then, becomes how to quantify information need.

Happel’s idea is to do this by examining missing values from query results. When someone performs a query, they’re stating that they need a particular data set. When one of the items in the query result is empty (such as a missing 2010 GDP value for Mexico), that’s a known piece of information that someone needed and didn’t get. If we count up the number of times each of these NULL values occurs, we can begin to keep a priority queue of desired, but missing, data.

So if Mexico’s 2010 GDP is missing from Wikipedia, is that a problem? Well, count up the number of queries that returned a NULL for this item and judge quantitatively. If the number is comparatively high, maybe we should prioritize the addition of Mexican economic stats.

He’s created a plugin for Semantic MediaWiki, called Semantic Need, which does exactly this. The list of prioritized information is called the “Extended Knowledge Base” — those things that we want to know, but don’t. As a programmer, I find this project very clever. Developers usually think of NULL values in query results as mere annoyances. But this work turns that around and makes them useful.
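To make the NULL-counting idea concrete, here is a minimal Python sketch. It is not the Semantic Need plugin’s code or API; the function `record_missing`, the row format, and the sample values are all assumptions made up for illustration, with `None` standing in for a NULL in a query result.

```python
from collections import Counter


def record_missing(need, rows, wanted):
    """Count each (entity, property) pair a query asked for but got no value for."""
    for row in rows:
        for prop in wanted:
            if row.get(prop) is None:
                need[(row["entity"], prop)] += 1


need = Counter()

# Two hypothetical query results; the numeric value is a placeholder, not real data.
record_missing(
    need,
    [{"entity": "Mexico", "gdp_2010": None},
     {"entity": "Canada", "gdp_2010": 123.0}],
    wanted=["gdp_2010"],
)
record_missing(
    need,
    [{"entity": "Mexico", "gdp_2010": None, "population": None}],
    wanted=["gdp_2010", "population"],
)

# The most frequently requested missing values come first.
for (entity, prop), count in need.most_common():
    print(entity, prop, count)
```

Sorting the counter by count gives the priority queue of desired but missing data: here Mexico’s 2010 GDP was wanted twice and so tops the list.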

One of the themes of the Haystack Group is that focusing on user needs can direct research toward results that are immediately useful. On the semantic web, picking an explicit user goal (helping users communicate effectively using data) can be more effective than picking an abstract goal (building a web of linked data). Our project DataPress attempts to follow this philosophy by helping users add interesting visualizations to their blogs, and as a side effect, showing those users the value of structuring their data. Semantic Need follows this philosophy in another way: it attempts to quantify an existing, realized need for pieces of data so that we know which data is actually useful for structuring right now.

While the presentation didn’t address it, the idea behind this talk could be incredibly useful for government data. What if a government provided not links to data sets (as data.gov does) but rather an ontology and a query interface, then sat back to see what users query for? Using an approach like this, the “what data should we publish” problem solves itself: the queries people ask will tell you what data to prioritize for publishing.

Here’s a link to the paper: Semantic Need: Guiding metadata annotations by questions people #ASK

Why All Your Data Should Live in One Application

A couple of days ago Adam Pash at Lifehacker posted a criticism of “everything buckets”—applications aimed at gathering every kind of information you work with into a single place.   I can’t resist responding as the article touches on some of the issues that have framed my past 15 years of research into information management.  It gives me a chance to talk about what’s wrong with today’s application model and about how to create a truly effective everything bucket.

Adam was initially excited to use Evernote as a “universal capture tool” but has since become disenchanted. He builds on a presentation on perfecting digital filing systems by Lifehacker founder Gina Trapani and an even earlier anti-everything-bucket post by Alex Payne. A few quotes from that post summarize Alex’s (and Adam’s) take on everything buckets, though I encourage you to read the entire post:

An Everything Bucket, since you’re probably wondering, is what I call applications that encourage the user to throw anything and everything into them. They’re virtual scrapbooks, applying a lightweight organization system to (often) unrelated data of varying types.

Computers work best with structured data. Everything Buckets discourage the use of structured data by providing a convenient place to commingle “structureless” data like RTF and PDF documents. Rather than forcing the user to figure out the rhyme and reason of their data (for example, by putting receipts in a financial management application and addresses in an address book), Everything Buckets cry: “throw it all in here! Search it!”

This proposition should not sound great. If you think you’re going to save time in the long run by throwing your data into a big bucket now, then sifting through it later, you are mistaken. There are better ways.

Adam and Alex think everything buckets reflect a Faustian bargain: for the sake of short-term convenience, you give up on the data structuring that makes your applications useful information managers.

Below, I respond to this position, arguing that

  1. Taking the Faustian bargain is often rational, because
  2. Using current structured-data apps is just accepting a different Faustian bargain, and
  3. There is a way to escape these bargains and create best-of-all-worlds information management tools

Current Apps

As Alex points out, “when you need to store some data, there are so many wonderful applications to pick from.” So you have to wonder at the perversity of people who not only avoid them, but don’t even put their information into a computer at all! We did wonder, so my two students Michael Bernstein and Max Van Kleek, along with me and my frequent collaborator mc schraefel, carried out an extensive interview-based study to determine what drives people to put information (sometimes copied out of the computer) on pads on their desk, sticky notes attached to their monitor, scraps of paper in their wallet, paper calendars, or the backs of their hands. The results were presented in ACM TOIS. We found several recurring themes.

The first is quick capture. Adam and Alex highlight the benefits of retrieving your information from a structured repository, but ignore the cost of putting it there. Launching an application, navigating through its screens and menus, and thinking about where in the organizational structure to put your new information, or even worse about how to modify the organization so your new information fits, is a significant cost that might easily outweigh the benefits of structure at retrieval time. Indeed, recent analysis of use of our list.it “everything bucket” suggests that much of the information people file is never retrieved; thus, the benefit of any structured organization must be discounted by the likelihood of retrieval, which may drive it below the cost of careful filing.
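The discounting argument above is simple arithmetic, and a tiny Python sketch shows how it plays out. The function name `worth_filing` and all the numbers are hypothetical; they are not measurements from our study.

```python
def worth_filing(filing_cost, retrieval_benefit, p_retrieval):
    """True when the expected retrieval benefit exceeds the up-front filing cost."""
    return p_retrieval * retrieval_benefit > filing_cost


# Hypothetical numbers, in seconds: 20 s to file carefully,
# 60 s saved later if the item is ever looked up again.
print(worth_filing(20, 60, p_retrieval=0.5))  # True: expected saving is 30 s
print(worth_filing(20, 60, p_retrieval=0.1))  # False: expected saving is about 6 s
```

The same filing cost and the same retrieval benefit give opposite answers once the chance of ever retrieving the item drops, which is why a rarely consulted note can be rationally tossed into a bucket.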

The second is rigidity. If you adopt a particular application, you adopt its schema. You can store only the data that application is prepared to store, and only in the form that application is prepared to store it. You can only look at it the way that application wants to show it to you. If I use Thunderbird’s address book, I can store addresses and phone numbers, but where do I put the dietary preferences of the contacts I invite to dinner? I can store a contact’s birthday, but where do I put their anniversary? I use Winamp for my music collection, but Winamp thinks I’m managing plain old songs, when in fact I’ve got music that I play at the folk dance session I run every week. Where do I put (and use) information about dance choreographer, tempo, style, difficulty, and date of teaching? I can always use the “custom” or “miscellaneous” fields that many apps provide, but these are often limited in number and do not offer the same organization or visualization benefits offered by built-in fields.

A closely related problem is fragmentation. Applications exist to gather up and relate different pieces of information. But they only gather what’s in their own purview; linkages between different applications are extremely sparse. I’ve written about this at length in a CACM article (copy here) later expanded to a book chapter (copy here). This means that to do one task, you often have to open up several different applications, searching in one for information you’ve already found in another, or even retyping it (since the application schemas don’t match, you can’t copy and paste). I know many of the choreographers shoehorned into my music application, but if I’m looking at the song and want to ask them a question, I have to go search for their entry in the address book.

In summary, while unstructured note tools may demand a Faustian bargain to give up on effective retrieval, structured tools often impose a different Faustian bargain around your ability to record and use information the way you want.

Thus, depending on the balance of costs and benefits, I believe that there’s a great role for everything buckets in the short term. We built our own, list.it, based on the insights we gained from our user interviews. It’s a lightweight Firefox extension with about 16,000 users who’ve taken about 120,000 notes and, judging by the Mozilla reviews, seem very enthusiastic about the quick capture and all-in-one-place aspects of the tool. We published a paper in CHI 2009 that studied initial usage and supported the arguments I made above.

Future Apps

Looking further ahead, I agree with Alex and Adam on the dominant role of structured data. But to get there, we need a way around these Faustian bargains. I believe there is an approach that can satisfy our need for structure (to help management) and our need for flexibility and data linking. The solution is to build structured data applications where the users themselves define and adapt the structure to meet their own needs. We’ve built a series of tools aimed at exploring this vision. The first was the Haystack desktop application, a structured everything-bucket. Eytan Adar implemented the first version in Perl (CIKM 1999), while Dennis Quan created the follow-on Java implementation (ISWC 2003). Instead of the textual data models used in today’s everything buckets, Haystack started with a universal structured data model, holding objects of arbitrary types with arbitrary properties and connected to other objects by arbitrary relationships. On top of this generic data model, Haystack provided user-configurable views of subsets of the data, as well as user-configurable operations that could be applied to those items.

The Haystack Client

The Haystack Email Client

The Haystack Brain Client

With Haystack’s framework, you could build a reasonable mail-reading application over a set of types such as people and email messages, but you could equally easily (using an “application editor” created by Karun Bakshi) build an application for editing a neuroscience research paper, managing your brain data, relevant publications, and your coauthors.  Because the underlying data was integrated, the on-screen view of your coauthors provided access both to their bibliographies and to their address book entries so you could email them.
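As a rough illustration of what a universal data model with configurable views looks like, here is a toy Python sketch. It is not Haystack’s implementation or API; the `Store` class, its methods, and the sample objects are all hypothetical, chosen only to show typed objects with arbitrary properties, arbitrary relationships between them, and a view defined by picking a type and some fields.

```python
class Store:
    """A tiny generic store: typed objects, arbitrary properties, arbitrary links."""

    def __init__(self):
        self.props = {}   # object id -> {property name: value}
        self.links = []   # (subject id, relationship, object id)

    def add(self, oid, type_, **props):
        self.props[oid] = {"type": type_, **props}

    def relate(self, subj, rel, obj):
        self.links.append((subj, rel, obj))

    def related(self, subj, rel):
        return [o for s, r, o in self.links if s == subj and r == rel]

    def view(self, type_, fields):
        """A user-configurable view: choose a type and the fields to show."""
        return [{f: p.get(f) for f in fields}
                for p in self.props.values() if p["type"] == type_]


store = Store()
store.add("alice", "person", name="Alice", email="alice@example.org")
store.add("paper1", "publication", title="A brain study")
store.add("msg1", "email", subject="Draft comments")
store.relate("paper1", "author", "alice")
store.relate("msg1", "from", "alice")

# Starting from a publication, reach the coauthor's address-book entry.
for author in store.related("paper1", "author"):
    print(store.props[author]["email"])

# The same data seen through a different, user-chosen view.
print(store.view("person", ["name"]))
```

Because people, messages, and publications live in one store, following a link from a paper to its author’s email address needs no second application and no retyping.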

I’m still a believer in the Haystack vision, but in practice we found it difficult to convince people to abandon their long-cherished PIM tools in favor of a half-baked research tool. Products like Filemaker and Bento have found a niche but still feel more like database tools than applications. David Huynh wisely recognized that the Web might offer an easier environment for deploying new tools, and developed the Exhibit framework.

Exhibit of U.S. Presidents

Exhibit lets you author (not program) the kind of rich interactive data visualizations you can create with structured data, without pushing you through all the hassle of installing, programming, or operating a database, a templating web server, or Ajax-y JavaScript code. You just publish a (structured) data file, such as a spreadsheet, and stick a few special tags in your HTML document to describe how you want it to be displayed (as a map, a timeline, a scatterplot, a list, a table) and with what kinds of interactive sorting and filtering. Then you just link to the Exhibit JavaScript, which takes care of making everything happen in the client browser. The data can follow whatever schema you choose, and you’re also in charge of the look and feel of the visualization. People have used it to publish information about restaurants, court cases, pollution, chemical compounds, political scandals, disease outbreaks, bridge safety, classical music, linguistics, legal databases, publications, and much else (you can see others on the project web site and here). Lifehacker’s Gina Trapani, mentioned above, created an exhibit of all the Broadway shows she’s seen.

Exhibit is a tool for publishing data, but we’ve recently looked at repurposing it into a tool for managing data.  With Dido, your exhibit becomes an in-place editable structured-data document.   You get all the rich visualization and interaction, but you can WYSIWYG edit the data you’re looking at (as well as its visualization), then save the document to persist your changes.  Like Exhibit, Dido leaves it entirely up to you to decide what kind of data you want to manage and how it should look.

Exhibit and Dido solve the rigidity problem but don’t really address fragmentation.  We’re starting to see sites like Freebase that offer to become everything-buckets in the cloud, unifying every imaginable information entity into a single richly structured data model.  This is great for data of public interest, but we’re going to need a personal version to store our own esoteric data.  And individuals need to create their own flexible visualizations of appropriate slices of their data for specific tasks.

Ultimately, there’s even hope of combining these affordances with quick capture: natural language processing and other machine learning techniques can be used to take information that users jot down quickly with tools like list.it and infer the implicit structured meaning needed to incorporate that information into a structured repository.  We’ve already seen baby examples of this in tools like Google’s quick-add feature for their calendar.

Conclusion

The everything-bucket is here to stay. Today’s version provides the quick capture that is often the most important feature of an information management tool; in its absence, information may not get recorded in the computer at all. Looking ahead, a structured everything bucket is the right way to cope with information fragmentation, letting you link together all the different kinds of information you need to tackle different tasks, instead of replicating or partitioning it among different applications. To effectively use that information, people will be able to author their own task-specific information visualizations that draw appropriate slices from the everything-store.

The Exact Average UIST Paper

Seeing the Future by Mining the Past: The Exact Average UIST Paper
by Bernstein, M.S., with the largest contributions from eigenauthors Hudson, S., Balakrishnan, R., Myers, B.A., Feiner, S., Hinckley, K., Rekimoto, J., et al.

History repeats itself: computer science research continuously reinvents past work. Thus, to predict the future of UIST, we trained an n-gram language model on twenty years of previous UIST abstracts. We thus present the machine-generated exact average UIST paper, with small edits made for clarity and maximum humor:

Edgewrite is a big problem in current handheld browsing: we describe some common problems experienced by users that are hit. We describe the architecture we have implemented and deduce a general framework that provides on-demand, persistent prototype applications. We discuss how people-tagging is a low-cost off-the-shelf electroencephalograph (EEG) system. We hope to challenge the audience to creatively consider ways that would otherwise result in small thumbnails that are placed away from systematic noise sources which can identify 100% of dishwasher usage. Unlike previous work, we present examples that represent the structure of toolkits. The user glances at the possibilities of olfactory output devices: ubiquitous computing environments need to work together. The result in a user study shows that after the stimulus, the location of the user responds to an automatic graph layout system. Finally, we introduce three new interaction techniques.
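For the curious, an n-gram generator of this kind can be sketched in a few lines of Python. This is not the script used for the abstract above: it is a minimal bigram (n = 2) version, and the function names and the two toy "abstracts" are made up for illustration.

```python
import random
from collections import defaultdict


def train_bigrams(texts):
    """Map each word to the list of words observed immediately after it."""
    model = defaultdict(list)
    for text in texts:
        words = text.split()
        for current, following in zip(words, words[1:]):
            model[current].append(following)
    return model


def generate(model, start, length, seed=0):
    """Walk the model from `start`, sampling each next word by observed frequency."""
    rng = random.Random(seed)
    out = [start]
    # Stop at the length limit or when the last word has no known successor.
    while len(out) < length and model.get(out[-1]):
        out.append(rng.choice(model[out[-1]]))
    return " ".join(out)


# Two made-up sentences standing in for twenty years of abstracts.
abstracts = [
    "we present a new interaction technique",
    "we describe a new toolkit for interaction",
]
model = train_bigrams(abstracts)
print(generate(model, "we", 8))
```

Every adjacent pair of words in the output was seen somewhere in the training text, which is why the result sounds locally plausible and globally absurd.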

—–

I hope that someday I can give the exact average UIST talk.

Is it better to be messy or neat? An etiology of messiness

When someone describes a person as “messy” or “tidy”, we can instantly guess something about their appearance, their personality, and the way they organize their physical and digital artifacts – around the house, office, or on their computer(s). There is little disagreement around these definitions, and many stereotypes (both positive and negative) are commonly associated with each. Messiness is often associated with artistic, creative and scientific or mathematical genius, spontaneity, but also with carelessness, eccentricity, madness and unreliability. Neatness is associated with preparedness, confidence, attention to self-presentation, efficiency, and stability, but also with hierarchy, rigidity and mundanity [1].

Famous examples of each personality type abound. In the messy camp, Albert Einstein, Francis Bacon, Alexander Fleming, and Alan Turing were all notoriously messy geniuses whose unkempt appearances matched their chaotic laboratories and studios. Meanwhile, David Beckham, the English footballer, and Martha Stewart, both notorious neat freaks, could stand for the personification of Confidence and Pristine Order respectively.

But from where are such tendencies derived? Is it evolutionary, cognitive, or occupational? That is, are people born tidier or messier than others, do they grow into it as they develop, or do they adapt based on the demands of their roles, jobs or environments? What drives people to be messy or tidy? Is there a scientific basis showing that it is better to be one or the other? Why does “messiness” or “tidiness” in one aspect of our lives tend to correlate strongly with messiness or tidiness in others?

More relevant to the discussion of Personal Information Management tools, how does an individual’s personal organizational messiness shape the way they keep and manage information? More importantly, what does it imply for the design of new tools that could better support people’s needs? Should digital PIM tools be made to facilitate the unique needs of tidy and messy people, or should they try to shape people into one or the other?

Why are people messy?

The simplest theory is that messy individuals perceive the cost of tidying to be greater than the potential benefits. (We all probably know one or two computer scientists who use this as an excuse!) If only the benefit gained in facilitating later retrieval is measured, this is true in many cases; a quantitative analysis of e-mail use has shown that constantly foldering e-mail is a waste of effort because these messages are so rarely needed later (an effect compounded by the availability of search tools, which we discuss later). But tidying has additional benefits, mentioned below.

A second theory is more attentional: that messy people are perpetually distracted with things that are “more interesting” or important than tidying up. This is a subconscious choice that happens instinctively, as things grab their attention: “as soon as we have finished with the coffee cup, it is invisible to us. We simply don’t see it. It’s like that stage of a baby’s development at which, if something leaves its grasp, it ceases to exist” [Abrahamson]

A third theory points to innate cognitive reasons: tidy people might think more hierarchically or categorically than messy people. From “The Etiology of Messes” (Abrahamson, 2002):

Simon (1962) raises the possibility that hierarchically-structured systems only appear to be ubiquitous because the human cognitive apparatus is itself a hierarchical-ordered categorization scheme (Rosch, 1978). Such hierarchical categorization schemes allow us to parse out, encode, reason with, and remember hierarchical-ordered information, causing us to perceive hierarchies as ubiquitous in natural, social, and symbolic systems. These schemes might also obscure alternate, non- hierarchical forms of order, revealing them as the deviations from order, which can to them be perceived of as messes. [1]

Thus to those who are hierarchical thinkers, anything other than a hierarchy might be perceived as a mess, and those who are messy are merely non-hierarchical (e.g., relational) thinkers.

Why are people tidy?

For tidy people, however, the aesthetic or emotional value of maintaining their system makes it worthwhile to dedicate time and attention towards maintaining it. Those who appreciate order and neatness see it as an essential part of dealing with the complexities of the world; as summarized by Martha Stewart, “Life is too complicated not to be orderly.”

There are well-known advantages to order and organization. Hierarchy has been used for ages to combat complexity and build up resilience to failure, disruptions and interruptions. Complex human-designed systems, from organizations to computer software programs, are organized hierarchically so that the roles and relations among parts are clear, and communication is restricted to those with whom contact matters most. Hierarchical organization makes lookup efficient because only a linear number of steps (one per level) is needed to pinpoint a location among a number of items that grows exponentially with the depth of the hierarchy.
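The arithmetic behind the linear-versus-exponential point is easy to check. In this small Python sketch the function names and the assumption of a perfectly balanced hierarchy (every folder holding the same number of subfolders) are mine, added for illustration only.

```python
def hierarchy_capacity(branching, depth):
    """Number of items a full hierarchy with this fan-out and depth can hold."""
    return branching ** depth


def steps_to_locate(items, branching):
    """Levels to descend (one choice per level) to pin down one item among `items`."""
    steps, capacity = 0, 1
    while capacity < items:
        capacity *= branching
        steps += 1
    return steps


print(hierarchy_capacity(10, 4))    # 10000
print(steps_to_locate(10_000, 10))  # 4
```

With ten folders per level, four choices are enough to single out one item among ten thousand, and each extra level multiplies the capacity by ten while adding only one more step.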

Beyond making it easier to get at things by reducing the entropy of collections, the act of tidying serves a number of important purposes for those who are neat-inclined, including reducing stress, producing a feeling of “in-controlness”, and making one feel more situationally aware. In addition, the process of tidying itself involves processes often referred to as external cognition, which can bring insight and understanding: category formation, refinement, prioritization and sense-making. The mere act of serendipitous (re-)discovery of an item or information during tidying may help one remember or realize something important that would have otherwise been forgotten.

What does this mean for Personal Information Management?

Studies in the field of Personal Information Management over the past two decades have repeatedly revealed tidy vs. messy differences in the ways that different people organize their information digitally and on paper, including how they organize the paper in their offices, e-mail, digital calendars, to-do lists, and computer filesystems. [3] [4] Despite these insights, a long-standing unanswered question is how to support such differences in personality: how and whether tools should be made to support messy individuals in their piling practices or to encourage them to be more tidy, for example, or to help tidy individuals, who are naturally organized already, to stay organized. These studies have also revealed, however, that most people are somewhere between the two extremes: either messy in some ways and neat in others, or alternating between messy and neat over time through periodic tidying. This, unfortunately, makes the problem of PIM tool design even more difficult, as it becomes less clear how and when to support these tendencies.

One way forward is to look beyond these tendencies and instead at the ways these strategies affect how the individual creates, manipulates, and manages their information. For example, messy individuals deal with new items by not dealing with them at all (i.e., leaving things where they land). This suggests that tools might support users who wish to spend little or no time or effort in dealing with new things, such as through low-cost or instantaneous “capture” of new information without categorization. Tidy individuals, meanwhile, try to organize information as it arrives, but sometimes get overwhelmed (which causes them to “sweep up” afterwards). This suggests that tools might proactively support such “queueing up” and prioritization of backlogged items so that tidy individuals can gracefully defer and deal with bursts with ease. Examining the structures that tidy people use to organize (flat, hierarchical or otherwise) should lend significant insight into what sort of categorization/organization these systems should support.

We will examine our experiments with supporting such needs in a later post.

[1] Abrahamson, E. “Disorganization Theory and Disorganizational Behavior: Towards an Etiology of Messes”, Research in Organizational Behavior, Volume 24, 2002, Pages 139-180, ISSN 0191-3085, DOI: 10.1016/S0191-3085(02)24005-8.

[2] Barrowclife, M. “Messy? I’m an artist!”, The Times Online, Jan 29, 2009.

[3] Malone, T. W. 1983. How do people organize their desks?: Implications for the design of office information systems. ACM Trans. Inf. Syst. 1, 1 (Jan. 1983), 99-112. DOI= http://doi.acm.org/10.1145/357423.357430

[4] Whittaker, S. and Hirschberg, J. 2001. The character, value, and management of personal paper archives. ACM Trans. Comput.-Hum. Interact. 8, 2 (Jun. 2001), 150-170. DOI= http://doi.acm.org/10.1145/376929.376932

Rich Visualizations in your WordPress Blog

Datapress is a WordPress plugin that makes it easy to enhance your WordPress blog posts with rich interactive visualizations such as maps, timelines, various charts, and sortable lists, all with interactive filtering and faceted browsing.   Datapress uses the Exhibit framework to offer a collection of rich structured data visualization elements that can be dropped into a blog post the same way as an image or a table; once they’re inserted they connect to and display the data (spreadsheet or other tabular data) that you upload or link to in the blog post.

Our plugin has finally been posted at the WordPress plugin site, so you can go there to download it. If you use it, feel free to post your reviews, ideas, and kvetches there (or here). If you haven’t been using it, give it a try; a few sites have already done so. Besides our own demo site, some examples that might inspire you include:

My students Ted Benson and Adam Marcus have done a great job integrating our new visualization elements into the standard WordPress post editor, so if you’re comfortable writing blog posts, you’ll have no trouble incorporating the new elements.

Datapress reflects my belief, articulated in previous blog posts, that the biggest obstacle to effective use of structured data on the web is the lack of good authoring tools.  I’m hoping you’ll try the tool and prove me right, or fill the comments with some nice arguments about why I’m wrong.