Will the Namespace Traffic Jam Kill RDFa in HTML5?
One of the most exciting aspects of the (in-progress) HTML5 specification is the number of data-centric features it contains. It’s almost as if the committee is saying a big, “OK, OK! We heard you!” to all the data-heads out there and is providing not one, not two, not three, but four different ways to access and manage structured data from within the client browser:
- Data Attributes are key-value pairs that may be added to any DOM node
- Microdata provides a way to interweave objects and object-properties amidst the DOM
- RDFa provides a way to interweave RDF amidst the DOM
- Client-side Database Support provides full relational data access from JavaScript (the spec says this will be SQL compliant, but in reality it will likely just be the SQLite subset of SQL).
These are all great developments, and will no doubt inspire a lot of creativity in how data can be used on the client side, but what interests me the most is why the HTML5 working group felt the need to include Microdata alongside RDFa.
The capabilities of HTML5 Microdata and RDFa are nearly identical, albeit with slightly different terminology. Both provide a way to embed data within HTML attributes and tag contents. Both allow for named entities as well as blank nodes. And both allow for a variety of more complex constructions, such as lists and HREF property values. One of the only real differences, as far as I can tell from glancing over the specs, is that RDFa requires URIs whereas Microdata simply uses ordinary strings to reference entities and properties. And that is what worries me: one of the biggest benefits of RDF is its use of URIs, yet URIs seem to be exactly what is preventing the adoption of RDF.
One problem is probably that URIs look funny as data model elements, even to a programmer. “A person has a name” is much more natural sounding than “A http://csail.mit.edu/Contact#Person has a http://csail.mit.edu/Contact#name”. We think of our code in natural language terms, and URIs obfuscate our real world metaphors.
Far more serious a problem is the namespace traffic jam that currently exists. If I want to publish an RDF document that describes this blog, for example, best practice would have me draw class types and property types from no fewer than six ontologies!
- The RDF ontology to describe object properties
- The RDFS ontology to describe object classes and labels
- The Dublin Core (DC) ontology to describe the titles, authors, and the like
- The Friend of a Friend (FOAF) ontology to describe my contact information
- The XSD ontology to describe literal dates, strings, and numbers
- And yet another, custom, ontology to describe everything else particular to the blog
That is already 6 ontologies, and we haven’t even raised the possibility of using OWL Time, Snap, Span, and GeoOWL for things like time and space description! Even for a semantic web developer, the complexity of managing all of these ontologies, and the namespaces that go with them, becomes pretty burdensome pretty quickly.
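To make the bookkeeping concrete, here is a minimal Python sketch of the prefix table that such a description drags along. The first five namespace URIs are the published ones for those vocabularies; the `blog` namespace and the `blog:Post` and `blog:commentCount` terms are made up for illustration, as is the particular set of terms I chose to describe a post.

```python
# Prefix table for the six vocabularies listed above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "blog": "http://example.org/blog#",  # hypothetical custom ontology
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


# Terms one might plausibly use to describe a single blog post
# (the blog:* terms are invented for this example).
terms = [
    "rdf:type",
    "rdfs:label",
    "dc:title",
    "dc:creator",
    "foaf:name",
    "xsd:dateTime",
    "blog:Post",
    "blog:commentCount",
]

# Distinct namespaces an author has to keep straight for this one description.
namespaces_used = sorted({term.split(":", 1)[0] for term in terms})

if __name__ == "__main__":
    for term in terms:
        print(term, "->", expand(term))
    print(len(namespaces_used), "namespaces:", ", ".join(namespaces_used))
```

Every one of those prefixes has to be declared, spelled correctly, and remembered by whoever writes the markup, which is the burden I am complaining about.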
And that is why I worry about the future of RDFa in HTML5. It appears that the Microdata specification in HTML5 is essentially the RDF graph data model with the URIs neutered out. Given essentially the same data model, no doubt most developers will pick the easier of the two formats to implement.
In order to get more people on the RDF bandwagon, we need to make the RDF path just as easy to follow as the Microdata one. How can this be done? If you ask me, the best way is to get rid of this namespace traffic jam and cultivate a set of community-oriented ontologies.
Rather than trying to create base ontologies that address abstract universal concepts, why not try to have each community standardize a single ontology for its particular domain? Have WordPress and Blogger sponsor the Blog Ontology. Have Amazon.com and eBay sponsor the Marketplace Ontology. Have Facebook and MySpace sponsor the Social Ontology. Then, instead of reusing bits from other ontologies, such as dc:creator or foaf:name, have each of these community-focused ontologies be self-sufficient, covering all the concepts necessary for their domain. We can always apply mapping rules to distinguish between social:name and store:book-author-name later. With only a single ontology per domain area to worry about, the namespace traffic jam will disappear and it will be easier for people to get on board with RDF and RDFa.
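As a rough sketch of what the simplest kind of "mapping rule" could look like, assume each rule is nothing more than a one-to-one property rename applied after the data has been published. The `social:` and `store:` prefixes are the hypothetical community ontologies from the paragraph above, and the choice of `foaf:name` and `dc:creator` as targets is my own assumption, not an established alignment.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)

# Hypothetical alignment from community-specific properties to shared ones.
PROPERTY_MAP: Dict[str, str] = {
    "social:name": "foaf:name",
    "store:book-author-name": "dc:creator",
}


def apply_mappings(triples: List[Triple],
                   mapping: Dict[str, str] = PROPERTY_MAP) -> List[Triple]:
    """Rewrite each triple's property if a mapping exists; otherwise keep it."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


if __name__ == "__main__":
    published = [
        ("_:me", "social:name", "Ted"),
        ("_:book1", "store:book-author-name", "Ted"),
        ("_:book1", "store:price", "12.00"),  # no rule, left untouched
    ]
    for triple in apply_mappings(published):
        print(triple)
```

Renames like this are the easy case. Anything that needs conditions, value conversion, or reasoning across several triples would need a real rule language rather than a lookup table.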
All in all, it seems the good news coming out of the HTML5 spec is that we can expect rich data annotation to arrive soon in HTML content everywhere. But what we need to work on as a community is a way to make URIs, and the ontologies that give them meaning, easier for programmers to use so that the web won’t just be full of data with Microdata, but full of linked data with RDFa.
In creating a specific all-encompassing ontology for each domain, you lose something very important to the Semantic Web: a commonly understood meaning. Is blogging:author the same as marketplace:author? The more commonality there is between the data representations, the more utility can be derived from it. The less commonality there is, the more RDF becomes Flickr tags.
In my mind, the problem you’re referring to is a problem with tools. In developing a webpage with RDFa annotations, one needs a tool that can easily recommend dc:creator based on typing “author,” and foaf:name based on typing “name”.
I agree with you that both of these issues are potential problems. Taking a cursory look at the microdata section of the draft, it’s especially frustrating: it seems like it is only a header away from working, since a lookup table would be sufficient to take the ‘item’s and turn them into URIs. And the idea of referring to items within a document with vanilla text strings as opposed to #name is frustrating as well. Since the names should be global w/ respect to the DOM of a particular document, it should be possible to generate URIs for the items and values, but it is only practical to link them if they reuse the predicates.
I think Dave makes a good point. Doing away with the commonality of data representation can be problematic. You say that, “We can always apply mapping rules to distinguish between social:name and store:book-author-name later.” However, it is my feeling that the state of semantic web rules makes this a non-trivial task.
Current semantic web rule languages can be limiting in their expressiveness, and existing rule engine implementations can make it difficult to provide the mapping you need. Forward-chaining rule engines may not be appropriate for all cases (especially in the distributed, heterogeneous, massively parallel data environment of the Internet). Alternatively, backward-chaining rule engines, which may be more appropriate for executing these types of rules, don’t exist yet. And this doesn’t even address the performance issues you can run into when your mapping rules are not reversible.
I also agree with Dave that a big component of the problem seems solvable with tool support. I think that what he’s getting at is there needs to be a better way of discovering and sharing ontologies.
If, on the other hand, that problem isn’t reasonably solvable (because it’s human nature that you can’t get everyone to agree to one thing), then that means we need to step up our investigation of how we can make mapping rules that 1) provide enough expressiveness, 2) scale to the Internet, and 3) have good standard implementations.
@Dave: I think you’re right in pointing toward tools. The problem isn’t really with RDF or OWL (the ability to have multiple ontologies is a Good Thing); it is more the confusion that the practice causes for people who are not semantic web practitioners. Better tools would go a long way toward fixing this. Serialization format can be thought of as a tool, too. I wonder if there would be a way to stuff RDFa into HTML with a syntax as easy to manage as Microdata.
As for the “Flickr tags” problem, I suppose I’m making a tradeoff in the suggestion: ease of authorship in exchange for complexity of alignment.
@Andrew: Yeah, it seems like, even with Microdata, you could imply a document-local ontology based on what properties and class names were used (as strings). It would be nice if there was some way to stamp those to a published ontology though.
There’s been some research (I forget the link) that shows that auto-suggest input controls cause tag-based systems to reach tag convergence pretty quickly, thus avoiding the Flickr Tag problem. (Think of how your Google searches have probably changed since Google released autosuggest, for example).
What if there was an editing system that allowed programmers to just begin typing a property name, and then it would auto drop-down the list of suggested Ontology-Property pairings based on that local name? That might provide the ease of just “typing in a property name” but then fill out the appropriate namespace information for you.
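Here is a minimal sketch of the lookup such an editor might do, assuming it already holds a flat list of known properties. The list below is a tiny hand-picked sample (`social:name` is hypothetical), and simple prefix matching on the local name is just one possible suggestion strategy.

```python
from collections import defaultdict
from typing import Dict, List

# Tiny sample library of properties the editor knows about.
# social:name is a made-up entry; the others are real vocabulary terms.
KNOWN_PROPERTIES = [
    "dc:title",
    "dc:creator",
    "foaf:name",
    "foaf:nick",
    "rdfs:label",
    "social:name",
]


def build_index(properties: List[str]) -> Dict[str, List[str]]:
    """Index prefixed properties by their lowercased local name."""
    index: Dict[str, List[str]] = defaultdict(list)
    for curie in properties:
        _prefix, local = curie.split(":", 1)
        index[local.lower()].append(curie)
    return index


def suggest(index: Dict[str, List[str]], typed: str) -> List[str]:
    """Return every known property whose local name starts with what was typed."""
    typed = typed.lower()
    matches: List[str] = []
    for local in sorted(index):
        if local.startswith(typed):
            matches.extend(sorted(index[local]))
    return matches


if __name__ == "__main__":
    index = build_index(KNOWN_PROPERTIES)
    print(suggest(index, "na"))
    print(suggest(index, "t"))
```

The author only ever types the local name; the drop-down carries the namespace along with whichever pairing gets picked.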
I think a tool like the one you suggest would be a very good thing. It could essentially hold the mappings between ontology elements and document-scoped names in memory, and then serialize it properly so that you could get RDFa instead. The trouble is going to come in bootstrapping/configuring it with the right library of ontologies.
That said, I think something like that would be the way to go, and it would be a lot easier to see what the next step needs to be if that tool existed as a first step. I think that would be a valuable project to advance the state of the art.
re: “the HTML5 working group felt the need to include Microdata alongside RDFa.” and “we need to make the RDF path just as easy to follow as the Microdata one”
RDFa is not part of HTML5. The RDFa group is still trying to promote it for HTML5, though. Microdata is an RDFa replacement. Its benefit compared to RDFa is that it makes the RDF path (almost) as easy to follow as the HTML one. And it does indeed support URIs, just no abbreviation mechanism (which is a feature, not a bug).
Regarding the main topic of your post, an alternative approach could be to use the non-URI possibilities of Microdata for domain-specific, namespace-free HTML markup plus the ability to define a grounding namespace for all custom data items on the page level. That would combine easy authoring (no need to worry about vocab mixing) with predictable RDF extraction (via the webmaster-defined target namespace for the custom types and properties). Amazon could use a single namespace on their whole site, bloggers could use a shared blogging namespace, etc.
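A rough sketch of that extraction step, assuming items have already been parsed out of the page into a type string plus a dictionary of properties. The namespace URI, the `Post` type, and the property names are hypothetical, and treating any name containing "://" as an already-absolute URI is a simplification.

```python
from typing import Dict


def ground_name(name: str, namespace: str) -> str:
    """Map a plain-string name onto the page-level namespace.

    Names that already look like absolute URIs are passed through unchanged.
    """
    if "://" in name:
        return name
    return namespace + name


def ground_item(item_type: str,
                properties: Dict[str, str],
                namespace: str) -> Dict[str, object]:
    """Ground an item's type and all of its property names."""
    return {
        "type": ground_name(item_type, namespace),
        "properties": {
            ground_name(key, namespace): value
            for key, value in properties.items()
        },
    }


if __name__ == "__main__":
    # Hypothetical webmaster-defined target namespace for the whole site.
    SITE_NS = "http://example.com/ns#"
    item = ground_item(
        "Post",
        {"title": "Namespace Traffic Jam",
         "http://xmlns.com/foaf/0.1/name": "Ted"},
        SITE_NS,
    )
    print(item)
```

The page author writes plain names throughout, and the single namespace declaration is the only place a URI has to appear.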
[...] debate on the dynamics around RDF and one by one of his grad students, Edward Benson, about the differences between Microdata and RDFa in the HTML5 embedded data [...]
Well – remember that the document you are looking at is written in the context of HTML 4. In HTML 4 none of what you say above makes any sense. Attributes are tokens – and the token “xml:lang” is what I was talking about. In HTML 4 those attribute names are case-insensitive – I need to add something about that to the draft. Thanks for the reminder!
[...] Will the Namespace Traffic Jam Kill RDFa in HTML5? (groups.csail.mit.edu) [...]
I don’t understand where this “traffic jam” problem comes from. Whose best practice is this? Your best practice?
Look at how many namespaces are used in Tim’s FOAF http://www.w3.org/People/Berners-Lee/card .
I personally think finding all these different vocabularies for describing yourself is a fun thing. Well, for the impatient, we probably need a vocabulary indexer.
If XHTML represented the ordinary practitioners’ target under the W3C ancien regime, HTML5 represents their goal under the new revolutionary junta represented by WHATWG. Tim Berners-Lee is fixated on inducing a global textuality susceptible to predicate logic. Ian Hickson is fighting a different battle, in my opinion. He understands that the rush to replace the text-based web browser with a rich, interactive, hyper-visual, 3d gaming and learning platform obviates the ever-present danger of web browsers not talking to each other intelligibly. Has anyone studied the semantic representations commonly embedded in Flash clips? HTML5 must win the Plugin War now, and fight the War for Meaning later.
[...] Will the Namespace Traffic Jam Kill RDFa in HTML5? (groups.csail.mit.edu) Sábadosaundz: 3 de julho [...]