Statistical methods are arguably the most popular approach to analyzing datasets at scale right now. They’re not without their weaknesses, though: they’re ultimately heuristic, and some methods, like neural networks, require tremendous amounts of data to produce a well-fitted model.
That’s where semantics comes in. If statistical methods attempt to discover the relationships between variables, semantics lets us codify those relationships explicitly. Statistics can only tell us that a relationship exists between entities; semantics lets us define what that relationship actually means.
The Family Tree
Assume that you have a database of parents and children: Stacy is Dan’s mom, and Dan is Kendra’s dad. There’s a very simple inference that we can make as humans: Stacy must be Kendra’s grandmother. However, there is nothing in our database that codifies that fact; it’s information we need to provide through other means.
This isn’t something that statistical techniques are well-suited to. Why would we build a model to try to discover this information when it’s something we already know and can explain clearly? Instead, we’ll state the relationships explicitly and define what each one means.
Imagine we have some class called Person, and a relationship called isChildOf. Using the information we stated above, we can create a graph with three Person nodes and two edges: Kendra isChildOf Dan, and Dan isChildOf Stacy.
Great, we now have a graph with all of our known relationships. We can even use a query language to traverse this graph and answer questions. For example, say you want to know who is someone else’s child. To get the answer, just ask for any entity that is a Person and isChildOf someone else. If you want to know who Kendra’s grandmother is, query for any Person that Kendra isChildOf, and then for whoever that Person isChildOf. Pretty neat, right?
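To make the two queries above concrete, here is a minimal sketch in plain Python. It assumes the graph is stored as a set of (subject, predicate, object) triples; the helper names (objects, children, grandparents) and the "type" predicate are made up for illustration and don’t come from any particular query language.

```python
# Each edge of the graph is a (subject, predicate, object) triple.
triples = {
    ("Kendra", "type", "Person"),
    ("Dan", "type", "Person"),
    ("Stacy", "type", "Person"),
    ("Kendra", "isChildOf", "Dan"),
    ("Dan", "isChildOf", "Stacy"),
}


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def children():
    """Return every Person who isChildOf someone else."""
    people = {s for s, p, o in triples if p == "type" and o == "Person"}
    return {s for s, p, o in triples if p == "isChildOf" and s in people}


def grandparents(person):
    """Follow isChildOf twice: person -> parent -> grandparent."""
    return {
        grandparent
        for parent in objects(person, "isChildOf")
        for grandparent in objects(parent, "isChildOf")
    }


print(sorted(children()))      # ['Dan', 'Kendra']
print(grandparents("Kendra"))  # {'Stacy'}
```

Note that the second query has to spell out the two hops itself; nothing in the graph knows what a grandparent is yet.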
Inferring Relationships; or, Why You’re Reading This at All
But this still doesn’t get to the core of why semantics is powerful. All we did was create a graph, and in practical terms, we haven’t done anything that can’t be done with a table. So here’s the real value add.
Let’s say we create a new relationship, isGrandchildOf. Our rule for this property is that if Person X isChildOf Person Y, and if Person Y isChildOf Person Z, then Person X isGrandchildOf Person Z.
isGrandchildOf doesn’t appear anywhere in our existing graph. But that’s the trick: it doesn’t have to be explicitly stated, because we can infer it from the properties of relationships that are explicit.
Since Kendra isChildOf Dan, who is a Person, and since Dan isChildOf Stacy, who is also a Person, we infer that Kendra isGrandchildOf Stacy.
How about that? Can’t do that with tables!
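Here is a rough sketch of that inference step in plain Python. It is only an illustration of the rule, not how a real reasoner works: infer_grandchildren is a hypothetical helper, and for brevity it skips the check that each party is a Person.

```python
def infer_grandchildren(triples):
    """Apply the rule: X isChildOf Y and Y isChildOf Z => X isGrandchildOf Z."""
    child_of = [(s, o) for s, p, o in triples if p == "isChildOf"]
    inferred = {
        (x, "isGrandchildOf", z)
        for x, y in child_of
        for y2, z in child_of
        if y == y2  # the parent in one edge is the child in the other
    }
    # Return the asserted facts plus everything the rule lets us conclude.
    return triples | inferred


# Only the facts we were actually told.
asserted = {
    ("Kendra", "isChildOf", "Dan"),
    ("Dan", "isChildOf", "Stacy"),
}

closed = infer_grandchildren(asserted)
print(closed - asserted)  # {('Kendra', 'isGrandchildOf', 'Stacy')}
```

The new triple was never written down by anyone; it falls out of the rule and the two facts we already had.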
Maybe we want to qualify what is and isn’t a person though. After all, not everything can be a person – people are by definition human, right?
Let’s get something out of the way: everything is a Thing. Everything is. If you don’t know what something is, you have to at least assume that it’s a Thing, because something can’t be Nothing. Maybe that’s a bit philosophical, but it’s so basic that it’s baked into the very functioning of semantics. There is, paradoxically, a class called Nothing, but it’s the empty set: nothing can ever belong to it, so subclassing it wouldn’t get you anywhere. I’ll explain the math behind that soon, but for now, every class you create is, directly or indirectly, a subclass of Thing.
So let’s create a second class, Human, which is a subclass of Thing. We create one basic requirement for a Thing to be a Human: it must have a heart. Obviously, it’s more complicated in the real world, but we’ll put the blinders on for now. We’ll also create a relationship hasHeart that takes a boolean to track that.
Now let’s also modify the Person class a bit. We’ll define it as a subclass of Human. Here’s something important to understand: classes are treated as sets and follow set theory under the surface. So when we say that Class B is a subclass of Class A, we mean that B is a subset of A: every member of B is also a member of A, so whatever is required of A’s members is required of B’s members too. Therefore, when we say that Person is a subclass of Human, it must be the case that to be a Person, you need a heart.
We’ll add one more requirement for something to be a Person: they need a personality. We’ll define that requirement with another property, hasPersonality, which must be True. Therefore, it’s possible to be a Human but not a Person, but you can’t be a Person and not a Human.
We send everyone to a physician and a psychologist and they come back with their test results. We punch the results into our graph: Dan and Kendra each have a heart and a personality, while Stacy has a heart but no personality.
Since Dan and Kendra have both hearts and personalities, they’re both of type Person. Stacy, unfortunately, lacks a personality, so she’s merely Human. Everything is, at the very least, a Thing.
Dan and Stacy might have some things to hash out, but hey, the power of semantics let us figure that out.
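The classification above can be sketched in a few lines of Python. This assumes the test results are stored as boolean properties on each individual; classify is a hypothetical helper that hard-codes the two requirements from this example rather than reading them from an ontology.

```python
# Test results, recorded as boolean properties on each individual.
individuals = {
    "Dan": {"hasHeart": True, "hasPersonality": True},
    "Kendra": {"hasHeart": True, "hasPersonality": True},
    "Stacy": {"hasHeart": True, "hasPersonality": False},
}


def classify(props):
    """Return every class the individual belongs to, most general first."""
    types = ["Thing"]  # everything is at least a Thing
    if props.get("hasHeart"):
        types.append("Human")
        # Person is a subclass of Human, so only Humans can qualify.
        if props.get("hasPersonality"):
            types.append("Person")
    return types


for name, props in individuals.items():
    print(name, classify(props))
# Dan ['Thing', 'Human', 'Person']
# Kendra ['Thing', 'Human', 'Person']
# Stacy ['Thing', 'Human']
```

Nobody told the graph that Stacy is a Human or that Dan is a Person. Those types follow from the property values and the class definitions.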
Enough Pictures, How Do I Do This?
So, what is an ontology? An ontology is a collection of class and relationship definitions, like Human and isGrandchildOf, often accompanied by instances, such as Kendra and Stacy.
The most common way of creating an ontology is with the Web Ontology Language, or OWL. OWL is a standard of the World Wide Web Consortium, or W3C for short. OWL ontologies are typically exchanged as documents in the Resource Description Framework, or RDF, another W3C standard. Remember those three acronyms and you should be fine. The current version, OWL 2, is a W3C Recommendation, which is the W3C’s term for a finalized standard.
RDF itself is a data model that describes everything as subject-predicate-object statements, and it is often written out as XML, but don’t worry, you won’t need to write XML by hand. There are several environments for creating ontologies through a user interface, Protégé being the most popular. Download the editor and see if you can get cracking on the examples above!