Why is RDF so old, complicated, unpopular and still not discarded? (semanticoverflow.com)
33 points by cosmohh on Feb 3, 2011 | hide | past | favorite | 44 comments


As a preamble, when RDF was conceived, databases drove many sites on the web, but their data tended to only be exposed as HTML, instead of a more machine-friendly format.

Now, there are two perspectives on what RDF is.

To an idealist, RDF is the universal data format. There are no semantics baked-in, and you can write arbitrary subject -> predicate -> object triplets to express any possible relationship. To an idealist, it's the perfect format for exposing all the structured data on the web in a machine readable form. The dream has always been for automatic agents to crawl the semantic web for you, understanding the meanings of the RDF triplets, and using them to reason out the solution to your query.
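The triple model the idealist describes is simple enough to sketch in a few lines of Python, treating each statement as a (subject, predicate, object) tuple. The names like `ex:alice` and `foaf:knows` below are invented for illustration, not a real dataset:

```python
# A tiny illustration of the RDF data model: each fact is a
# (subject, predicate, object) triple.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob",   "foaf:name", "Bob"),
]

def objects(graph, subject, predicate):
    """Return every object linked to `subject` via `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]

# Who does Alice know, and what are they called?
known = objects(triples, "ex:alice", "foaf:knows")
names = [objects(triples, person, "foaf:name")[0] for person in known]
```

The point of the model is that arbitrary relationships all fit the same three-slot shape, so the same query code works for any vocabulary.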

To a pragmatist, that dream has always sounded like a bunch of bull. Absent strong AI, it's a complete pipe dream that a piece of software will ever be able to infer the "semantic meaning" of interlinked RDF just because it happens to be defined by triples. At the end of the day, you're going to have a programmer writing rules against specific terms in RDF, and if that's the case, then RDF is nothing more than an extremely awkward API.

Fortunately for the web, the pragmatists won. APIs are everywhere, and RDF is nowhere.

Unless strong AI happens to be right around the corner, the web dodged a real bullet there. Personally, I'm of the opinion that any web agent that could possibly puzzle through RDF triplets should have no problem understanding our APIs, in any case.


Not quite. While RDF certainly isn't all that the idealists claim, you can still get some benefit from it without strong AI.

Mainly, it provides a consistent model for handling the notion of a "field". Non-RDF APIs typically return fielded JSON or XML, the structure of which is specified only in the documentation. To integrate two services not originally designed to inter-operate, you have to write lots of custom glue code.

RDF is at least amenable to writing generic "rules" to govern field mapping and inference, rather than one-off glue code (which usually ends up being a hacky script). So sure, if you're integrating one service, a hacky script is probably easier. But if you want a coherent system for integrating large numbers of services not originally designed to inter-operate, RDF makes things a lot easier.
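As a rough sketch of what such a generic mapping rule might look like (in Python, with one real Dublin Core URI and one invented vocabulary URI): two services use different predicates for a title, and one declarative rule table replaces per-service glue scripts.

```python
# Two services expose the same concept under different predicates.
service_a = [("a:doc1", "http://purl.org/dc/elements/1.1/title", "Intro to RDF")]
service_b = [("b:doc9", "http://example.org/vocab#name", "Graphs 101")]

# One declarative rule table, instead of one-off glue code per service:
# "these predicates all mean 'title'".
EQUIVALENT_TO_TITLE = {
    "http://purl.org/dc/elements/1.1/title",
    "http://example.org/vocab#name",
}

def titles(*graphs):
    """Apply the mapping rule uniformly across any number of sources."""
    return [o for g in graphs for s, p, o in g if p in EQUIVALENT_TO_TITLE]
```

Adding a hundredth service means adding one URI to the rule table, not writing a hundredth script.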

So there's some benefit, even if it isn't as dramatic as its proponents claim.

Plus, there's the fact that while strong AI isn't yet on the horizon, RDF is a lot easier for weak AI (inference engines, data mining, etc) to ingest, and weak AI is getting better all the time.

Actually, one of the biggest problems with RDF, to my mind, is that its structure makes it very difficult to get good performance with truly large numbers of subjects and attributes - and unfortunately that's just the area where it'd be most useful.


My claim is that your "generic rules" to govern mapping fields are actually equivalent to hardcoding the names of JSON fields. Instead of seeing "title", and deciding what to do with the data, you see:

    <!DOCTYPE rdf:RDF PUBLIC "-//DUBLIN CORE//DCMES DTD 2002/07/31//EN"
      "http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-dtd.dtd">

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">

    <dc:title>
... and decide what to do with the data.

In both cases, instead of having a machine understand the (semantic) structure of the data, you have a programmer writing a rule.

Can you provide a concrete example of where this isn't the case, and your RDF-reader is simpler than reading the equivalent (well-designed) JSON of the same data?


It's not just consuming the data, though... it's when you go beyond that and start doing inference and combining multiple databases that the RDF approach really shows its value.

If you established a standard for doing that kind of field-name exposure using JSON, then sure, you could achieve the same effect. But in the end, you'd probably just wind up with a JSON encoding of RDF anyway. Defining things as subject/predicate/object is all RDF really is... the RDF/XML encoding is just one way of expressing RDF.


Hey, I really wish that "subject/predicate/object" was all that RDF was, but I'm afraid it's a good deal more:

RDF Syntax: http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/

RDF Schema: http://www.w3.org/TR/2004/REC-rdf-schema-20040210/

RDF Semantics: http://www.w3.org/TR/2004/REC-rdf-mt-20040210/

(Those are all current W3C standards)


I agree that a lot of the standards surrounding RDF are ugly. I've always particularly disliked the RDF-as-XML serialization, which took two fairly simple ideas (triples and XML) and combined them into a complex mess. This is why I always hate parsing RSS 1.0. Also, the full generality of OWL just confuses me: It seems to be Prolog done badly.

But just as with XML, it's possible to ignore the cruft (XQuery, XLink, XML Schema, the current SOAP flavor-of-the-month), and just use the useful bits. A similar argument could be made about HTML: For every HTML 5, there's an XHTML 2.0.


Agreed. You can get all the benefits of RDF while eschewing the stupid parts. Just because something has a spec doesn't mean you have to use it.

The full XML spec, for example, is insanely complicated. But people still derive value from it by utilizing a more or less sane subset.


You don't have to use all of that stuff though. "Stuff" layers on progressively to add functionality.


It seems like the missing piece for you is that RDF is one level of abstraction higher than what you're talking about.

If you're just thinking about parsing data out of the above XML snippet, yeah, of course it's more complicated. But the point of RDF is that you don't think about the serialization format (there's libraries for that). You should be thinking about your data at a higher level of abstraction, at the level of "triples" and "inference rules".

You're expected to use a triples database and an inference engine of some kind, either a library or by rolling your own. If you're not, then I agree, you're not deriving any benefit from RDF. But if you are, then it lets you deal with your data in a more abstract, generalized way that does provide legitimate value for certain use cases.
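As a rough sketch of what such an inference engine does under the hood, here is a minimal forward-chaining pass in Python that applies one RDFS-style rule (if x has type C, and C is a subclass of D, then x also has type D) until no new facts appear. The vocabulary strings and data are illustrative, not a real reasoner's API:

```python
# A minimal forward-chaining inference step over triples, of the kind
# an RDFS reasoner performs automatically.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def infer_types(triples):
    """Apply the subclass rule repeatedly until a fixed point is reached."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(facts):
            if p != TYPE:
                continue
            for s2, p2, o2 in list(facts):
                # s has type o, and o is a subclass of o2 => s has type o2
                if p2 == SUBCLASS and s2 == o and (s, TYPE, o2) not in facts:
                    facts.add((s, TYPE, o2))
                    changed = True
    return facts

data = {
    ("ex:rex", TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}
```

Nothing here knows what a "Dog" is; the payoff is that the same generic rule works for any vocabulary that uses the subclass predicate.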


So sure, if you're integrating one service, a hacky script is probably easier. But if you want a coherent system for integrating large numbers of services not originally designed to inter-operate, RDF makes things a lot easier.

Exactly. If one thinks of RDF (and associated technologies) as having an aim of creating a Semantic Web as one big, decentralized, federated database, then you can really see the value in it.

For any one random website to expose some data for people to use, it's a fair argument that RDF is more awkward than just dumping the data out over an HTTP API as generic XML, JSON, CSV or whatever. But when you look at the bigger picture, RDF becomes desirable.


Ah, but RDF in and of itself isn't any more interoperable than XML or JSON. You still have to agree on the vocabulary of your triples. To that end, you bring in RDF Schema, OWL, and their stacks of definitions:

http://www.w3.org/2002/07/owl

(View source on that page.) Absent strong AI, defining your terms via URIs that point to other terms, themselves defined by URIs to still more terms... ad infinitum, is no more expressive or powerful than simply saying that a "title" is a "title".


Ah, but RDF in and of itself isn't any more interoperable than XML or JSON.

True, but XML and JSON are both fantastically useful technologies despite that lack of interoperability.

So when is RDF actually useful? If I have hierarchical data structures, I strongly prefer JSON. If I have structured documents, I like XML. But if I have a graph, something like RDF n-triples or Turtle is a reasonable way to serialize it.
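The n-triples serialization mentioned here really is about as simple as graph serialization gets: one triple per line, terminated by a period. A rough Python sketch (simplified - real N-Triples also handles blank nodes, language tags, and escaping; the URIs are invented examples):

```python
# Serialize (subject, predicate, object) tuples as N-Triples-style lines:
# URIs in angle brackets, literal strings in quotes, " ." at the end.
def to_ntriples(triples):
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

graph = [
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
    ("http://example.org/a", "http://example.org/name", "Alice"),
]
```

Compare that with the RDF/XML snippet upthread: same data model, drastically less ceremony.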


Let's also not forget the Linked Data idea. You find a node in an RDF graph that you're interested in? Follow the URI that is its id and get more RDF describing the node, and so on. A web of data. We're not there yet, but the number of RDF-ized resources is growing.
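The "follow your nose" pattern can be simulated in a few lines. In this Python sketch an in-memory dict stands in for the web, where a real crawler would issue an HTTP GET per URI; all the URIs and data are invented:

```python
# Linked Data in miniature: dereferencing a node's URI yields more
# triples describing it, which in turn mention more URIs to follow.
WEB = {
    "http://ex.org/alice": [("http://ex.org/alice", "knows", "http://ex.org/bob")],
    "http://ex.org/bob":   [("http://ex.org/bob", "name", "Bob")],
}

def fetch(uri):
    """Stand-in for an HTTP GET returning the RDF published at `uri`."""
    return WEB.get(uri, [])

def crawl(start, max_hops=5):
    """Accumulate triples by dereferencing every URI we encounter."""
    seen, frontier, graph = set(), [start], []
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            if uri in seen:
                continue
            seen.add(uri)
            for t in fetch(uri):
                graph.append(t)
                if t[2].startswith("http"):   # object is a URI: follow it
                    next_frontier.append(t[2])
        frontier = next_frontier
    return graph
```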


Sure, everything you can do using RDF you could conceivably do using a different suite of technologies... but my point - if there is a broad point to make here - is that you'd wind up recreating a lot of the "stuff" that is part of the RDF ecosystem anyway, to achieve the same end.

Now maybe it's possible that someone could start from scratch and build up a system that is both much simpler and does the same things... if so, fine, point me to it when it becomes available.


I've also experienced performance issues with RDF stores, but over the last few years performance has improved a lot, and I think in another 1-2 years there will be a bunch of stores able to handle reasonably large numbers of triples with good performance.

Aside from that, I think that although RDF might not be the 'holy grail', it is in fact quite usable for a lot of problem domains, and saying that it needs to be discarded is a bit harsh :)


You can also publish RDF via JSON: http://json-ld.org/


btw, that is also an issue which is addressed by the newly revived RDF working group - http://www.w3.org/2001/sw/wiki/index.php?title=RDF_Core_Work...


RDF is certainly awkward for many use cases, but it's the same API everywhere. Each custom web API needs custom code to use it.

I don't believe in automatic agents, either. But RDF as a universal data format, forming a web of data, can be useful even without agents. It allows linking, combining, loading and querying data from different sources without writing any code at all.

We're not at a point yet where this is often possible due to the lack of (good) RDF data, but the idea is strong. I work on uniprot.org, providing one of the largest free RDF data sets, and we see strong interest from our users---bioinformaticians who often spend most of their time writing import/export scripts instead of doing their actual work.


APIs are everywhere, and RDF is nowhere.

RDF is hardly ubiquitous, but it's also hardly appropriate to say that it's nowhere. RDFa in particular has seen a big surge in adoption over the last year or two, especially after Google and Yahoo announced that they would start utilizing RDFa.

http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-a...


To continue to play devil's advocate here ... if RDF is just an awkward API, RDFa is just an awkward microformat.


More specifically, it is an extensible microformat. Microformats are practically a subset of RDFa with locked-down ontologies (hCard, hCalendar etc).


In theory they are semantically equivalent (usually, anyway), so the question is: is RDFa particularly awkward? I guess it depends on how you define awkward.



There are also more and more RDF datasets (http://www4.wiwiss.fu-berlin.de/lodcloud/) being published, so at least some people seem to think it is a good format.


Personally, I'm of the opinion that any web agent that could possibly puzzle through RDF triplets should have no problem understanding our APIs, in any case.

…or native human language.


You're saying that a standardized formal model of a graph is as hard for programs to understand as human language? The people who have been working on natural language processing for decades must be really dumb, then.


[I have a client who sells RDF tools. Here's the latest version of what I've been saying to them.]

Let's look at RDF like a startup: The old RDF marketing from, say, 2003 was hopelessly out-of-touch with reality. Users were never going to publish their metadata as RDF, and even if they did, you'd need strong AI to use it. Here are two classic articles spelling out why classic RDF wouldn't work:

http://www.well.com/~doctorow/metacrap.htm

http://www.shirky.com/writings/semantic_syllogism.html

But things have been looking up in the RDF market lately. The complicated RDF XML serialization is mostly ignored in favor of simple n-triples. Google is making heavy use of RDFa metadata when searching for products, and something like 3.5% of web pages now contain RDFa. The RDF conferences are booming. There are cool projects like dbpedia that are organizing publicly-available information as RDF.

So if the RDF tool vendors are going to succeed, they need to pivot (and many of them are). They need to drop the AI hype, and focus on what their early users are telling them. Some possible sales pitches:

1) RDF is useful as a distributed, schema-free graph database. Competition: Neo4j, other NoSQL databases. There are a couple of very good sales pitches here, including the fact that RDF databases are available from multiple vendors, and that RDF inference can be used to normalize schemas between different data sources.

2) RDF is useful for embedding small amounts of data in web pages. Competition: Microformats.
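The schema-normalization pitch in (1) can be made concrete with a toy Python illustration (all identifiers and store contents are invented): a sameAs-style link declares that two stores name the same entity differently, and one generic rewrite rule merges them - the kind of rule an RDF inference engine applies for you.

```python
# Merge two datasets that use different identifiers for one entity,
# driven by declared sameAs-style links rather than custom glue code.
SAME_AS = "owl:sameAs"

def normalize(triples):
    """Rewrite every alias to its canonical id, as declared by sameAs links."""
    canonical = {s: o for s, p, o in triples if p == SAME_AS}
    resolve = lambda x: canonical.get(x, x)
    return {(resolve(s), p, resolve(o)) for s, p, o in triples if p != SAME_AS}

store_a = [("a:alice", "age", "30")]
store_b = [("b:user7", "city", "Oslo")]
links   = [("b:user7", SAME_AS, "a:alice")]

merged = normalize(store_a + store_b + links)
```

(A real reasoner treats sameAs as symmetric and transitive; this one-directional sketch only shows the shape of the idea.)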


RDF/XML serialization of RDF graphs can be painful. I completely agree that other serializations like Turtle should be used in recommendations (e.g. R2RML).

But the RDF model is a wonderful thing. The use of URIs as identifiers for objects and properties makes it possible, for the first time, to reuse knowledge and share data, linking APIs in the same way we are already linking web pages.

RDF semantics are maybe harder, but most people can start using RDF without caring about things like entailment.

I really think RDF has a future, especially given the steady growth of the LOD initiative. The revision of the standard is also a good opportunity to polish some aspects of RDF - for example, the use of named graphs.


I see a lot of RDF bashing, particularly from the "Web 2.0" crowd. It has warts, no doubt (is anything borne of the human mind without warts?), but it also has its strengths. A lot of people say it is a failed technology, but that is less a failure of the technology and more a failure of those people to properly understand what it can and does do.

Is that a failure of the technology? Because most people don't understand what they would use it for or how they would apply it? I don't think so. I think the claims that it would change the web were high-flown. I also think its creators did a bad job of explaining it. However, you'll find the people that do understand it and have a domain in which it is clearly applicable - love it.

One of my friends works for the library at UCSD, and they use RDF, RDFS, and OWL-DL extensively - I couldn't even imagine doing what she does with the library's book ontology using JSON (which some have proposed to replace XML and its vocabularies, even with a JSON "schema" language). I have another friend working for a biotech company, and he uses it there, extensively. These are only two examples, excluding the other web projects and companies out there that also use it and its higher-level vocabularies/ontologies (UMBEL, etc...).

Is it a failure for the web? (a topic in another thread a few days ago) I don't think it is, I think it is a failure on the part of developers to understand it and apply it (Drupal has applied it).


It was the right idea and almost the right implementation, but almost is a big word -- like a rocket that almost has escape velocity.

There's work being done in this space that is really exciting and I think we'll soon see how much potential the semantic web really has.


A major problem with RDF is that it is almost impossible to build applications on top of it. You'd need another layer of abstraction to handle the complexity.

I really like the approach Freebase takes. It's a proprietary format, basically. They use JSON rather than XML, and they have their own query language, MQL (also expressed using JSON). However, their graph of entities maps to RDF as well, so they use RDF (along with common ontologies) as an export format. I think that while their system is still complex under the hood, it's less verbose, less scientific, and more user-friendly w.r.t. the public interface. And most importantly, thanks to MQL it's super easy to build applications on top of it.
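As a toy illustration of the query-by-example style that MQL popularized - sketched as a small Python matcher rather than the real Freebase API, with invented example data - a JSON-shaped template matches entities sharing its concrete fields, while null-valued fields act as wildcards:

```python
def matches(template, obj):
    """True if obj agrees with every concrete (non-None) field in template."""
    return all(obj.get(k) == v for k, v in template.items() if v is not None)

def query(template, entities):
    """Query by example: None values act as wildcards to be filled in."""
    return [e for e in entities if matches(template, e)]

entities = [
    {"type": "band", "name": "The Beatles", "origin": "Liverpool"},
    {"type": "band", "name": "Kraftwerk", "origin": "Duesseldorf"},
    {"type": "film", "name": "Help!", "origin": None},
]

# "Give me bands from Liverpool; I don't care about the name."
result = query({"type": "band", "origin": "Liverpool", "name": None}, entities)
```

The appeal is that the query is the same shape as the data, which is a big part of why it feels so much friendlier than SPARQL to many developers.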

---

For me, modeling data as a graph (as RDF proposes) is a really great idea! What I always wanted to do is build client applications (single-page web apps) that can operate on a graph of data directly (instead of talking to a REST service). That's why I'm putting effort into Data.js, which features a Data.Graph that can be manipulated in JavaScript environments (like the browser or Node.js). Such Data.Graphs can be persisted (synced) at any time. There's support for CouchDB as a backend.

Well, in the README I also pointed out why I decided not to use RDF and instead took inspiration from the Metaweb Object Model (that Freebase uses).

https://github.com/michael/data

I'd enjoy some feedback btw. The lib is actually working, but the examples are out of date. Have a look at the source or ping me if you want to try it out.


Hands up anyone who uses RDF (when they could choose to do otherwise)? Anyone? Bueller?


Plenty of people (including myself) in biomedical fields (especially in bioinformatics, although I know of numerous medically-oriented users) use RDF for a variety of purposes- it's a fabulously flexible way to express and encode complex data models. With a little bit of coordination regarding semantics (OWL, etc.), it's also a great way for people in related but distinct fields to share their data with one another.

That said, RDF has historically had three main problems, IMHO. The first is that its proponents have, historically speaking, done a crappy job of explaining what it actually is, and a worse job of demonstrating what it can do.

The second problem is that it's not really the most human-readable format around, especially in its XML serialization format. This is largely because it wasn't designed for human-readability, but in the age of the Web (where human-readable formats have been shown to have major advantages over non-human-readable ones), this is a liability. They've added some more readable serialization formats over the years, but a lot of the documentation and tutorials assume that you'll be working in XML.

When I first tried to learn about RDF, years ago, all I found were tutorials full of really gnarly-looking XML with minimal explanation of what was going on. I spent so much time getting bogged down by the syntax that I missed the point entirely.

The third, and biggest, problem with RDF is that it's a little hard to "grok" if you don't have a good grounding in predicate logic, description logics, and some of its other theoretical underpinnings. Actually, wait, maybe I said that wrong - I should say instead that most of the old-school RDF people do have solid theoretical backgrounds in description and predicate logic, and have a hard time talking about RDF and other semweb technologies with people who don't have that background. So they use lots of jargon that, while useful, isn't very helpful to somebody just trying to get their feet wet.


Have you found any good introductions?


Personally, I found Antoniou & van Harmelen's "A Semantic Web Primer" to be very useful. I felt like it covered a lot of ground at just the right level- enough to explain what was going on and why, but not so much that it got bogged down in pointless detail. However, YMMV- I know some people who didn't care for it.

It's a little bit dated at this point, but Shelley Powers' "Practical RDF" was also helpful- but, again, I got a lot more out of it once I'd read Antoniou and had internalized the whole RDF thing at a "30,000 foot" level.

BTW, I just Googled for the Antoniou book to make sure I had remembered its authors correctly, and it turns out that one of the first results is a PDF version of the entire text. I don't know what it's doing up online, but grab it while it's hot. If it's helpful, Powell's has a couple of used copies:

http://www.powells.com/biblio/1-9780262012423-2


This is a good quick overview of RDF for programmers: http://rdfabout.com/quickintro.xpd


Our CMS user interface is built based on RDFa:

http://bergie.iki.fi/blog/using_rdfa_to_make_a_web_page_edit...

The content repository we use also provides RDF storage and querying:

http://www.midgard-project.org/updates/midgardcr_10-12-hrung...


I don't personally, right now, but I have a friend working for UCSD that uses RDF, RDFS, and OWL-DL quite a bit with their ontology there. I also have a friend in biotech that uses it a lot.

I've used it for past projects (NDA, can't really talk much about it here) in the knowledge representation and inference domain.


I'm working on some stuff that uses RDF, yeah.


Since you're quite an eloquent defender of RDF in this thread -- would you mind sharing a bit of RDF from your project? If it's not public or finished yet ... perhaps pasting an excerpt on gist.github.com?


I don't know that I'm particularly eloquent, but if you say so.

I'm at work at the $DAYJOB right now, so I can't really dig into this at the moment. But for what it's worth, one of the areas I'm working with is something called SKOS. http://www.w3.org/2004/02/skos/


We're using it quite gainfully on our current project.


Many people here are answering the question of why RDF sucks, but that was not the question asked. The question is why this suckage has still not managed to bury the technology.

There is a very frequent problem people suffer from, which is mistaking goals for results. When you start looking for it, you'll see it a lot. A new open source NoSQL database will pop up, post a long list of goals ("Fastest performance, maintain some integrity, transactions, trivial sharding, consistent, available, and partition-tolerant, and able to run on a TI-83 at web scale!"), put up a benchmark that shows that if you have no features and no code written to ensure you don't fall down under real load, you can put up way bigger numbers than the products with features, and suddenly you have some set of very excited people. Why are they excited? It's not the code; the code is worthless, the only thing it can do is run that benchmark. It's the promises.

You can see it in graphical programming. Graphical programming has a few modest successes, but the promises are about changing how everybody programs and how even Granny will be able to program. The fact that it has never happened despite immense effort by tons of smart people doesn't stop a certain segment of people from still being True Believers.

And, today, we talk about RDF. It promises to organize the web, it promises glorious wonderful search engines, it promises the world. It can't deliver, because merely sticking URIs on some graph nodes is only the beginning of the solution, not even remotely the end, you still have issues of agreement and accuracy and all kinds of other things. But the promise is so beguiling that some people just can't give it up, if we just try harder it'll happen, it's just that nobody has done it right yet, I'm smart enough to see what the previous hundreds of smart people haven't and I'll get it right, oh, it'll be glorious when everybody gets their heads out of their ass and listen to me and just start doing it right.

But RDF can't get us there. It's so general it's nothing at all.

There are all kinds of places where people become excessively bedazzled by promises and never notice the concrete reality before them. Another interesting example is Object Orientation. This has proved useful, IMHO, if not the be-all end-all of development methodologies, but it is interesting to contrast the promises made by OO back in, say, the late 1980s, with OO reality today. The promises were about how objects can represent things in the real world and you can model the real world with them. This turned out to be bunk. The real world has some place in OO but only carefully layered and wrapped and mixed in with a lot of other not-real-world things, iterators and factoryfactories and facades and data structs and ORMs and so on. The old promises were interesting and wrong, but also so beguiling that even today you will still hear this nonsense spouted about how this is the purpose of OO, even though it is now well understood that writing your programs with an excessively-strong tie to physical reality is asking for problems. Even as the reality is actually useful the old beguiling promises are still around screwing young developers up to this day.

(It is a tricky balance maintaining the proper level of skepticism because conditions change and sometimes wild promises become practical, and sometimes someone really does manage to pull off one of these things. The latest example would be the commercial success of the iPad, because for a long time smart money was on there being no market for tablets after numerous and repeated failures in creating the market. But in general, "show me the code" or appropriate manifestation is still the best way to avoid being trapped in one of these marketing traps, you will miss out on a few hits but pass on dozens of losers.)

(Also, I am aware there are still some True Believers using RDF. My point here is not disproved by a couple people using it, even using it in a big way. My point will only be disproved if someone brings about RDF Utopia, the actual promises. Of course you can bash RDF into submission, but that doesn't prove it was the best solution for your problem.)


Great point. I think another example is XML. It was going to free us from proprietary, binary data formats by making everything nice and understandable pointy brackets. Everyone understands things written in pointy brackets, right?

The reality is that we got OOXML and ODF, which are so enormously complex that no piece of software except Microsoft Word and OpenOffice could hope to fully implement them. If there was an ACID3 test for either it would be fully clear how bad and non-interoperable the situation is. We got XHTML, which turned out to be a horrible idea ( http://diveintomark.org/archives/2004/01/14/thought_experime... ) and is now abandoned. We got XML schema, because everyone realized that when you have structured data you want data types, instead of everything being just text.

And because XML seduces you into thinking it's much simpler than it actually is, people write their own parsers and generators all the time that have no hope of knowing what to do with a CDATA section, and you get parsers that will fail if the whitespace isn't just exactly right. The whole thing is a giant farce.

Just like you say, people got excited about the promise of XML. This is the perennial problem when people try to create standards in an area that doesn't actually have any compelling implementations yet. Good standards refine and codify existing practice. Bad standards try to invent something and standardize it at the same time.


I would agree. XML is like OO, it's actually still useful in some cases, but if you don't have marked-up textual content you're probably doing it wrong. I wonder what fraction of XML in the world is actually marked-up textual content, rather than data masquerading as marked up text?



