Where Ontologies End and Knowledge Graphs Begin

Ontologies have been present in artificial intelligence research for at least forty years, coming into their own in the ‘80s on the back of a research wave that catapulted them into popularity by the mid-‘90s. However, interest in ontologies waned by the 2000s as machine learning became the hot new technology for search engines and advertising. But in the past decade, two words have pushed ontologies and semantic data back into the spotlight: knowledge graphs.

Knowledge graphs have been embraced by numerous tech giants, most notably Google, which is responsible for popularizing the term. But that new widespread attention from the research community has helped foment a significant debate among knowledge representation experts: what even is a knowledge graph?

In truth, no one is really sure – or at least there isn’t a consensus.

What are the components of knowledge graphs?

The knowledge representation experts who specialize in semantics-driven ontologies will make no bones about it: a knowledge graph is necessarily built on semantics. Semantics, they argue, is the basis for creating new inferences from the data which would otherwise go unseen. It’s the difference between something that generates new knowledge and a database laying dormant, waiting to be queried. Anything less is just a labeled graph.

There’s something to that philosophy. A knowledge graph isn’t like any other database; it is supposed to provide new insights, which can be used to infer new things about the world. If it’s just a bunch of labeled arrows, then that doesn’t comport with the concept of a knowledge graph as an artificial intelligence technique. At that point, it’s just a fancy database.
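To make the "labeled graph versus semantics" distinction concrete, here is a minimal sketch in plain Python. It is not how any production reasoner works; the triples, the predicate names `is_a` and `subclass_of`, and the `infer` function are all hypothetical and chosen only for illustration. The point is that two small rules are enough to produce facts that nobody wrote down explicitly.

```python
# Toy data: every name and predicate here is illustrative, not a real vocabulary.
triples = {
    ("Fido", "is_a", "Dog"),
    ("Dog", "subclass_of", "Mammal"),
    ("Mammal", "subclass_of", "Animal"),
}


def infer(facts):
    """Apply two simple rules repeatedly until no new facts appear."""
    facts = set(facts)
    while True:
        new = set()
        for (a, p, b) in facts:
            for (c, q, d) in facts:
                if b != c:
                    continue
                # Rule 1: subclass_of is transitive.
                if p == "subclass_of" and q == "subclass_of":
                    new.add((a, "subclass_of", d))
                # Rule 2: an instance of a class is also an instance
                # of that class's superclasses.
                if p == "is_a" and q == "subclass_of":
                    new.add((a, "is_a", d))
        if new <= facts:
            return facts
        facts |= new


inferred = infer(triples)

# Facts that were never asserted but follow from the rules.
for fact in sorted(inferred - triples):
    print(fact)
```

Without the two rules, the three original triples are just labeled arrows: a query for every `Animal` would not return Fido. With them, the graph yields that answer even though no one ever stated it.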

With that said, Google has largely forgone semantics in building the Knowledge Graph – the piece of technology that popularized the term in the first place. In its early days, the Knowledge Graph was partially based on Freebase, a famous general-purpose knowledge base that Google acquired in 2010. Today, the Knowledge Graph still uses schema.org, a collaborative effort between multiple tech giants to develop a schema for tagging content online. However, schema.org’s use of inferential semantics is very limited. Many experts would agree that the Knowledge Graph isn’t semantic in any meaningful way.

Besides semantics, there’s a whole other, more fundamental battleground on which the debate is being waged: size. Many would agree that sheer scale is part of what sets an ontology apart from a knowledge graph. Ontologies are generally regarded as smaller collections of assertions that are hand-curated, usually for solving a domain-specific problem. By comparison, knowledge graphs can include literally billions of assertions, just as often domain-specific as they are cross-domain.

While that kind of breakdown is appealing, there’s no denying that it is a fundamentally arbitrary distinction, and one that is becoming less useful by the day. The definition of ‘small’ on the Web has been exploded by an onslaught of data, both machine- and user-generated. ‘Small’ can mean anywhere from 100 to 100,000 rows of data – or, in our case, assertions – depending on who is asked. That discrepancy is perfectly captured by the Gene Ontology, which represented more than 24,500 terms as of 2008. That was ten years ago; GO has grown so much that Springer has released a 300-page handbook specifically dedicated to learning how to use it. If size is the deciding factor, then the Gene Ontology should almost certainly be known as the Gene Knowledge Graph.

Where exactly do ontologies end and knowledge graphs begin?

Even framing the question along one dimension like this will generate pushback among knowledge engineering experts. Many would argue that the divide between ontology and knowledge graph has nothing to do with size or semantics, but rather with the very nature of the data. For example, dividing all class structures and relationship definitions into one group and all instance-level data into another might fulfill their idea of an ontology and a knowledge graph, respectively – one to be used for inference, and the other to be queried for examples.
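As a rough illustration of that class-versus-instance split, the sketch below partitions a small set of statements by predicate. Everything in it is an assumption made for the example: the statements, the `SCHEMA_PREDICATES` set, and the `split` function are hypothetical, and a real vocabulary would draw the line differently.

```python
# Hypothetical predicates treated as schema-level (class and relationship
# definitions). A real vocabulary would define its own set.
SCHEMA_PREDICATES = {"subclass_of", "domain", "range"}

# Toy statements mixing definitions with instance-level data.
statements = [
    ("City", "subclass_of", "Place"),
    ("capital_of", "domain", "City"),
    ("capital_of", "range", "Country"),
    ("Paris", "is_a", "City"),
    ("Paris", "capital_of", "France"),
]


def split(statements):
    """Separate schema-level statements from instance-level statements."""
    schema = [s for s in statements if s[1] in SCHEMA_PREDICATES]
    instances = [s for s in statements if s[1] not in SCHEMA_PREDICATES]
    return schema, instances


ontology, graph = split(statements)

print("ontology:", ontology)
print("knowledge graph:", graph)
```

Under this view, the first list is what some would call the ontology and the second the knowledge graph, regardless of how many statements end up in either one.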

It’s unlikely that a consensus will emerge anytime soon on what a knowledge graph is or how it is different from an ontology. For now, it’s more helpful to remember that the two approaches are fundamentally the same. Most caveats stem from disagreements about size, the role of semantics and the separation of classes from instance data. But when it boils right down to it, they are generally larger or smaller versions of each other, with more or less sophisticated knowledge encoding techniques under the hood.

Spencer Norris, ODSC

Spencer Norris is a data scientist and freelance journalist. He currently works as a contractor and publishes on his blog on Medium: https://medium.com/@spencernorris