Ever since returning from the 2009 International Semantic Web Conference last week, I’ve been bursting to discuss a panel that took place there on the topic “Does the Semantic Web need Ontologies?”. But the WWW2010 deadline was today and we had 3 papers to write. With that deadline now 10 minutes past, I can finally post! When the panel was first proposed, I was concerned because panels need controversy to be fun, and I didn’t think there’d be debate on this topic. However, the organizer was confident that he’d be able to arrange different viewpoints on the panel.
When I attended the panel I was sorry to discover that the panelists did in fact all agree. Far worse, they all said “yes” and wanted to debate what kind of ontologies were needed. Those who’ve followed my slow conversation with Stefano Mazzocchi won’t be surprised at my reaction: a jump to the audience microphone to voice a strong “no!” I asserted that a bunch of data presented in spreadsheets was already a big step forward over our current unstructured web. This led to some interesting discussion that helped me clarify some points in my mind that I’ll try to lay out here.
The panelists’ general reaction was amazement that I could be opposed to ontologies. Without ontologies, how could any tool actually use the data? What good would that data be without an explanation of what it meant?
Tim Berners-Lee tried to mediate by suggesting that I did support ontologies. After all, a spreadsheet has an ontology: the ontology specifies rows, columns, cells, and the relationship between them. But by this definition, any structured data necessarily has an (implicit) ontology, and saying “ontology” is just another way of saying “structured data”. And I think this diverges from the standard meaning of “ontology” in the Semantic Web community, which I would read as “an explicitly recorded, machine readable description of the ontology of the given data.” While I am a big proponent of structured data, I’m going to bet that the panelists would not consider these implicit ontologies to be ontologies in the Semantic Web sense. So we do in fact disagree.
Why then do I think we don’t need (explicit) ontologies? Because I’m focused on the ways that human beings, rather than machine agents, will consume the data being shared. And for humans, a machine-readable explanation of the data’s meaning is often unnecessary because the human who is consuming that data can figure it out in other ways. For example, the meaning of the data elements might be explained in English, a “caption” of the data I am inspecting. Even without captions, if I get a data table with column headings, I can use my comprehension of English to understand the meaning of those headings and from it infer the roles of the columns. Even if there aren’t column headings, the “shape” of the data can tell me a lot—I’ll recognize standard person names, phone numbers, addresses, prices, book titles, and such from the textual patterns or from matches to my large wetware database of known entities. And if I see enough examples I can draw conclusions about the values in the column (indeed, Google Squared suggests that you might not even need a human in the loop to make these inferences).
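To make the “shape” idea a little more concrete, here is a minimal Python sketch of guessing a column’s role from textual patterns alone. Everything in it is an assumption for illustration: the pattern table, the `guess_role` and `guess_roles` names, and the 80% threshold are hypothetical, and the regular expressions are far cruder than what a person (or a system like Google Squared) actually does.

```python
import re

# Hypothetical, deliberately crude patterns; real data needs far more robust rules.
PATTERNS = {
    "phone": re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$"),
    "price": re.compile(r"^[$€£]\s?\d+(?:[.,]\d{2})?$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def guess_role(values, threshold=0.8):
    """Return the first pattern name matched by at least `threshold`
    of the non-empty values, or None if nothing fits well enough."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return None
    for role, pattern in PATTERNS.items():
        hits = sum(1 for cell in cells if pattern.match(cell))
        if hits / len(cells) >= threshold:
            return role
    return None


def guess_roles(rows):
    """Guess a role for each column of a header-less table (list of rows)."""
    return [guess_role(column) for column in zip(*rows)]


# A small table with no column headings at all.
rows = [
    ["Ada Lovelace", "617-555-0100", "$12.50"],
    ["Alan Turing", "(617) 555-0101", "$8.00"],
    ["Grace Hopper", "617.555.0102", "$20"],
]

roles = guess_roles(rows)
```

Note that the first column comes back unrecognized: spotting person names takes something closer to the database of known entities mentioned above than a regular expression.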
So humans can understand data without (explicit) ontologies, but is the data any use to them? Sure! Just to plug some of my own group’s tools, I can use Exhibit to throw the data into a rich visualization: a map, timeline, or list with faceted browsing and sorting. Or I can combine it with another data set using Potluck, and throw the combined data into an Exhibit visualization. I can make a post on ManyEyes or throw the data into DabbleDB for further processing. These activities typically require me to match certain properties (columns) of the data set to roles in the UI (Exhibit, ManyEyes) or to properties in the other data set (Potluck, DabbleDB), which is a straightforward task. They don’t require the machine to understand the data, because I’m the one taking these actions. They do require that the data be structured, since otherwise there’s no way for me to say “which column” to the tools I’m trying to use.
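The “which column” step can be sketched in a few lines of Python. This is not how Exhibit, Potluck, ManyEyes, or DabbleDB are implemented; the `assign_roles` function, the role names, and the sample data are all hypothetical. The point is only that the human supplies the meaning and the tool needs nothing but structure.

```python
import csv
import io


def assign_roles(csv_text, role_to_column):
    """Relabel each row of a CSV table using a human-supplied mapping
    from UI roles (e.g. 'label', 'lat', 'lng') to column headings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {role: row[column] for role, column in role_to_column.items()}
        for row in reader
    ]


# Headings that mean nothing to a machine; the structure is all it gets.
csv_text = "col_a,col_b,col_c\nBoston,42.36,-71.06\nParis,48.86,2.35\n"

# The person looking at the data decides which column plays which role.
mapping = {"label": "col_a", "lat": "col_b", "lng": "col_c"}

items = assign_roles(csv_text, mapping)
```

With the mapping in hand, a map view has everything it needs to place two labeled points, and no ontology was consulted along the way.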
That’s the argument I wanted to make at the panel, but it’s a bit hard to squeeze into 20 seconds at the audience-feedback microphone. So I’m afraid the panelists instead thought that I was arguing against ontologies, asserting that they should not be deployed at all.
On the contrary, I like ontologies. But I’m convinced that ontologies are a luxury, not a necessity. They’re certainly nice to have, and there are some things you can only do if you have them; for example, they can help me understand column headings written in Russian or Spanish by connecting them to explanations in English. But I remain captivated by all the opportunities that arise just by making data easily accessible in raw form. Too often, what people want to do with information is perfectly easy to explain, but impossible to do without serious programming, for silly reasons.
And it’s that enthusiasm for open data that keeps me energetically arguing that we don’t need ontologies. If we need ontologies, then work on freeing data needs to stop until we get them. I think that’s a very dangerous perspective. It’s the one that says “there’s no point to building tools for scientists to publish their data, until we’ve figured out the right huge ontology that we’ll force them all to publish in.”
Instead, I think we should go right ahead with our research on ontologies and tools for them, but in the meantime, let the data fly!
P.S. When someone rose to support me, arguing that we should forget ontologies and concentrate on Linked Open Data, I muddied things further by asserting that we don’t really need the “Linked” part, and Open Data is useful in its own right. While it comes from the same place as my perspective on ontologies above, that’s the substance of my discussion with Stefano, and I won’t repeat it here.