Semantic Web: Stop the format war!

To me, there’s a lot of confusion out there about what the “Semantic Web” is really about. Recently, I took part in a debate on whether to use JSON or RDF/XML in a project I’m working on. The arguments stayed mostly at the format level (e.g. the verbosity of RDF/XML). One member of a developer team whose code base uses JSON was trying to understand what they would gain by moving to RDF. One of his arguments was that there was no point in using RDF since we were not linking to external data.

Several interesting points came out of this debate that I believe reflect misunderstandings of what the Semantic Web is about, and I will try to discuss them in this post.

It’s not a format debate

I guess RDF and OWL also came out to “show” what the Semantic Web could “look” like and to help developers get a feel for it. But that is also what triggers the confusion that the Semantic Web is nothing more than some special kind of data format. In fact, the Semantic Web sits at a higher conceptual level. It should not necessarily change the data formats we use (XML, JSON, CSV, key-value pairs) but rather change the way we conceive a data schema. One of the major aspects of the Semantic Web, which in my opinion receives too little attention, is having standard vocabularies and reusing elements of those vocabularies whenever possible.
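To make this concrete, here is a minimal sketch in Python (my choice of language for illustration; the point itself is language-agnostic). It describes one person using property names taken from FOAF and writes the same record out as JSON and as plain key-value lines. The record contents and the helper names `to_json`, `to_key_value` and `from_key_value` are hypothetical; only the FOAF namespace and its `name`, `mbox` and `homepage` terms are real.

```python
import json

# FOAF namespace (a real, published vocabulary).
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical record: the keys are FOAF property IRIs instead of
# home-grown names such as "fullName" or "email".
person = {
    FOAF + "name": "Alice Example",
    FOAF + "mbox": "mailto:alice@example.org",
    FOAF + "homepage": "http://example.org/alice",
}


def to_json(record):
    """Serialize the record as a JSON object."""
    return json.dumps(record, sort_keys=True)


def to_key_value(record):
    """Serialize the same record as 'key=value' lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(record.items()))


def from_key_value(text):
    """Parse 'key=value' lines back into a dict (split on the first '=')."""
    return dict(line.split("=", 1) for line in text.splitlines())


if __name__ == "__main__":
    print(to_json(person))
    print(to_key_value(person))
```

Both outputs carry the same vocabulary terms; only the syntax differs, which is why the choice of format matters less than the choice of vocabulary.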

It’s not just about data exchange and inter-linking

In some sense, using standard vocabularies to define new data models is similar to using design patterns to implement new software. The job of a “data architect” (i.e. the person in charge of defining a new data model) is therefore to look up existing vocabularies and try to reuse them wherever possible. In both cases, the goal is to avoid reinventing the wheel and to benefit from some standardization. Just recall the leap in productivity gained by moving from raw home-grown data formats to XML-based (and later JSON-based) ones, where a lot of “classical” error-prone processing is already taken care of just by using such formats.

It is true that many benefits will mostly be visible “afterwards”

Just as we have gained many benefits by standardizing at the syntactic level, i.e. standardizing parsing for XML and JSON, we can hopefully expect similar benefits from introducing standards at the semantic level. Once you have defined your data model, in many cases by reusing existing vocabularies, you can expect to find a plethora of libraries that help process and reason over your data. Having standardized, you can also expect less effort in exchanging data between models, fewer future data migrations, and the possibility of future functionality that was not thought of in the first place. To me, it is on this last point that many hopes rest. To give an example, once such standards are in place, the difference in code between searching for “people” in Facebook and in Twitter will be limited to changing an endpoint URL. The formal query will be the same (the choice of query language being left to you, based on preferences and availability), and the result vocabulary and abstract model will be the same (the serialization format being your choice again).
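As a sketch of what that could look like, assume both sites exposed a SPARQL endpoint and described people with FOAF. Neither assumption holds today, and the two endpoint URLs below are placeholders invented for illustration. The query text is shared; only the URL changes. SPARQL is just one possible query language here.

```python
from urllib.parse import urlencode

# One query, written once against the FOAF vocabulary.
PEOPLE_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
LIMIT 10
"""

# Placeholder endpoints: these URLs are invented for illustration.
ENDPOINT_A = "https://sparql.site-a.example/query"
ENDPOINT_B = "https://sparql.site-b.example/query"


def build_request_url(endpoint, query=PEOPLE_QUERY):
    """Build a GET URL that passes the query in the 'query' parameter,
    as the SPARQL protocol does for queries sent via GET."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_request_url(ENDPOINT_A))
    print(build_request_url(ENDPOINT_B))
```

Everything after the endpoint is identical in the two URLs, and because both results would use the same vocabulary, the code consuming them would not need to change either.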

It is also true that it’s not easy to find vocabularies that fit

While all this seems wonderful, we still need to be realistic. It is true that it can be painful to find and use such vocabularies. First, it’s not always easy to find vocabularies other than the standard Dublin Core and FOAF. Some are incomplete, in that they don’t let us fully express our needs. Others, on the contrary, are far too detailed in terms of concepts and relationships. I would tend to push towards simple, standard, general vocabularies that normalize common concepts.

And there is real effort in inter-vocabulary mappings

One hope, though, is that Semantic Web technology should make reasoning possible. A big benefit is that, ideally, we could define our own data model today, completely agnostic of what is going on outside. Indeed, I have heard of many cases in which people define their own ontology for their domain from scratch. And the day we need to integrate, well, we just introduce a set of mappings expressed in OWL and we’re done. Well, not really yet. Building mappings is known to be a very tedious task. This is especially true if little effort was put into using standard vocabularies or, at least, common data modelling practices. So, to make it a lot easier tomorrow to map such data onto another model, there is a great incentive to put a little effort into searching for and reusing existing vocabularies. And to me, there are rarely specific domains that cannot reuse anything from other modelling efforts.
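To give a feel for what such a mapping involves, here is a small sketch, assuming a home-grown model with hypothetical `fullName`, `email` and `badgeNumber` properties under a made-up `http://example.org/my-model#` namespace. Each mapping is a triple using `owl:equivalentProperty`, and a naive function (`apply_mappings`, also hypothetical) rewrites record keys accordingly. A real OWL reasoner does far more than this key renaming.

```python
# Real terms: the OWL equivalentProperty IRI and the FOAF namespace.
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical home-grown namespace, invented for this example.
MY = "http://example.org/my-model#"

# Mappings as (subject, predicate, object) triples.
MAPPINGS = [
    (MY + "fullName", OWL_EQUIVALENT_PROPERTY, FOAF + "name"),
    (MY + "email", OWL_EQUIVALENT_PROPERTY, FOAF + "mbox"),
]


def apply_mappings(record, mappings=MAPPINGS):
    """Rename keys that have an equivalentProperty mapping; keep the rest."""
    renames = {
        subject: obj
        for subject, predicate, obj in mappings
        if predicate == OWL_EQUIVALENT_PROPERTY
    }
    return {renames.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    local_record = {
        MY + "fullName": "Alice Example",
        MY + "email": "mailto:alice@example.org",
        MY + "badgeNumber": "42",  # no standard equivalent: left untouched
    }
    print(apply_mappings(local_record))
```

Every property that was invented locally is one more line in `MAPPINGS` to write and maintain, and properties with no standard equivalent stay stranded. Starting from a shared vocabulary shrinks that list.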

Conclusion

I’ll make it short and simple. Stop debating the format and use whichever suits you best (format-level mappings exist anyway ;)). However, put some effort into looking up standard vocabularies: the day you want to integrate what you’re doing with something else out there, you’ll be happy to have made the mapping effort a lot easier.
