What is RDF and what is it good for?
Why we need a new standard for the Semantic Web

On the Semantic Web, computers do the browsing for us. The “SemWeb” enables computers to seek out knowledge distributed throughout the Web, mesh it, and then take action based on it. To use an analogy, the current Web is a decentralized platform for distributed presentations while the SemWeb is a decentralized platform for distributed knowledge. RDF is the W3C standard for encoding knowledge.

There is of course knowledge on the current Web, but it's off limits to computers. Consider a Wikipedia page: it might convey a lot of information to the human reader, but the computer displaying the page sees only presentation markup. To the extent that computers make sense of HTML, images, Flash, etc., it's almost always for the purpose of creating a presentation for the end-user. The real content, the knowledge the files are conveying to the human, is opaque to the computer.

What is meant by “semantic” in the Semantic Web is not that computers are going to understand the meaning of anything, but that the logical pieces of meaning can be mechanically manipulated by a machine to useful ends.

So now imagine a new Web where the real content can be manipulated by computers. For now, picture it as a web of databases. One “semantic” website publishes a database about a product line, with products and descriptions, while another publishes a database of product reviews. A third site, for a retailer, publishes a database of products in stock. What standards would make it easier to write an application that meshes these distributed databases together, so that a computer could use the three data sources to help an end-user make better purchasing decisions?

Nothing stops anyone from writing a program to do those sorts of things now, just as nothing stopped anyone from exchanging data before we had XML. But standards facilitate building applications, especially in a decentralized system.
Here are some of the things we would want a standard about distributed knowledge to consider:

1. Files on the Semantic Web need to be able to express information flexibly. Life can't be neatly packed into tables, as in relational databases, or hierarchies, as in XML. The information about movies and TV shows contained in the graph below is really best expressed as a graph:

Figure: Knowledge as a Graph

Of course, we can't be drawing our way through the Semantic Web, so instead we will need a tabular notation for these graphs. Compare the table below to the figure above. Each row represents an arrow (an “edge”) in the figure. The first column has the name of the “node” at the start of the edge. The second column has the label of the edge itself (the kind of edge). The third column has the name of the node at the end of the arrow.
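As a rough sketch of this three-column encoding, the snippet below writes the graph as a list of Python tuples, one per edge. The rows are taken from the three statements in the Notation 3 example in this section, with the prefixes dropped for brevity. The helper names `edges_from` and `nodes` are made up for illustration and are not part of any RDF library.

```python
# Each tuple is one edge of the graph: (start node, edge label, end node).
# The rows mirror the three statements in this section's Notation 3 example.
triples = [
    ("vincent_donofrio", "starred_in", "law_and_order_ci"),
    ("law_and_order_ci", "type", "tv_show"),
    ("the_thirteenth_floor", "similar_plot_as", "the_matrix"),
]


def edges_from(node, graph):
    """Return (edge label, end node) for every edge that starts at `node`."""
    return [(label, end) for start, label, end in graph if start == node]


def nodes(graph):
    """Collect every node name that appears at either end of an edge."""
    found = set()
    for start, _label, end in graph:
        found.add(start)
        found.add(end)
    return found
```

Calling `edges_from("vincent_donofrio", triples)` walks the rows and returns the edges leaving that node, which is the same lookup a reader does by following arrows in the picture.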
Whether we represent the graph as a picture or in a table, we're talking about the same thing. Both describe what is abstractly called a graph. More on this later.

2. Files on the Semantic Web need to be able to relate to each other. A file about product prices posted by a vendor and a file with product reviews posted independently by a consumer need a way of indicating that they are talking about the same products. Just using product names isn't enough. Two different products might both be called “The Super Duper 3000,” and we want to eliminate ambiguity from the SemWeb so that computers can process the information with certainty. The SemWeb needs globally unique identifiers that can be assigned in a decentralized way.

3. We will use vocabularies for making assertions about things, but it must be possible to mix these vocabularies together. A vocabulary about TV shows developed by TV aficionados and a vocabulary about movies independently developed by movie connoisseurs must be usable together in the same file, to talk about the same things, for instance to assert that an actor has appeared in both TV shows and movies.

These are some of the requirements that RDF, the Resource Description Framework, addresses as a standard, as we'll see in the next section. Before getting too abstract, here are actual RDF examples of the information from the graph above, first in the Notation 3 format, which closely follows the tabular encoding of the underlying graph:

Notation 3 Example
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://www.example.org/> .
ex:vincent_donofrio ex:starred_in ex:law_and_order_ci .
ex:law_and_order_ci rdf:type ex:tv_show .
ex:the_thirteenth_floor ex:similar_plot_as ex:the_matrix .

And in the standard RDF/XML format, which may have a more intuitive feel and is more explicit about hierarchical structure in the graph, but which in most cases tends to obscure the underlying graph:

RDF/XML Example
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/">
  <rdf:Description rdf:about="http://www.example.org/vincent_donofrio">
    <ex:starred_in>
      <ex:tv_show rdf:about="http://www.example.org/law_and_order_ci" />
    </ex:starred_in>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/the_thirteenth_floor">
    <ex:similar_plot_as rdf:resource="http://www.example.org/the_matrix" />
  </rdf:Description>
</rdf:RDF>

RDF was originally created in 1999 as a standard on top of XML for encoding metadata, literally data about data. Metadata covers things like who authored a Web page or what date a blog entry was published: information that is in some sense secondary to other content already on the regular Web. Since then, and perhaps even after the updated RDF spec in 2004, the scope of RDF has evolved into something greater. The most exciting uses of RDF aren't in encoding information about Web resources, but information about, and relations between, things in the real world: people, places, concepts, etc.
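Returning to requirements 2 and 3 above, here is a minimal Python sketch of why globally unique identifiers matter. Two independently published sets of triples, each using its own vocabulary, can be combined with a plain set union because both name the product by the same URI. Every URI, prefix, and value below is invented for illustration, and the sketch ignores parts of real RDF such as blank nodes and typed literals.

```python
# All prefixes and URIs here are hypothetical, made up for this example.
PREFIXES = {
    "shop": "http://shop.example.com/products/",
    "price": "http://shop.example.com/terms/",
    "rev": "http://reviews.example.net/terms/",
}


def expand(qname):
    """Expand a prefixed name such as 'shop:sd3000' into a full URI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local


# File 1: a vendor's price list, using the vendor's own vocabulary.
vendor_triples = {
    (expand("shop:sd3000"), expand("price:amount_usd"), "199.00"),
}

# File 2: independent reviews, using a different vocabulary.
# The first row names the product by the same URI as the vendor does;
# the second is a different product that might share a display name.
review_triples = {
    (expand("shop:sd3000"), expand("rev:rating"), "4"),
    (expand("shop:other_sd3000"), expand("rev:rating"), "2"),
}

# Merging two graphs is a set union of their triples.
merged = vendor_triples | review_triples


def describe(subject, graph):
    """Map each edge label to its value for a single subject URI."""
    return {label: value for s, label, value in graph if s == subject}


info = describe(expand("shop:sd3000"), merged)
```

After the merge, `info` holds both the vendor's price and the reviewer's rating for the one product, while the similarly named product stays separate because its URI differs.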