Ontology 101

Your manager hands you an Excel sheet with a whole bunch of columns and hundreds of rows and asks you for an analysis by the end of the week. Let’s say it is a spreadsheet like this:

Sample Data – Work Orders (source)

Where would you even begin? You see dates, which should be simple enough, but what is ‘ReqDate’ and what is ‘WorkDate’? What is the difference? You then look at the columns whose labels might provide a clue: ‘District’ must mean a location. North? North of where? The ‘TotalCost’ column is no help since there isn’t any indication of currency. You go back to your manager and ask where the data came from, and they tell you that Bob gathered the data but he just retired last week. Now what?

This is a common problem that every organization has to deal with, especially when handing data from one person to another. What if there was a way for a person or a machine to understand the data and be able to generate insights from it? What if there was a way for work orders to be categorized based on the type of work involved and the labor required so that wait times would be dramatically reduced?

As demand for data analytics grows, more companies are turning to ontologies and the Semantic Web to keep up with their competitors.

WHAT IS AN ONTOLOGY?

Origins of Ontologies

If you look up the word ‘ontology’ in almost any dictionary, you’ll probably get a definition rooted in philosophy, similar to the one shown below (source).

Now you may be wondering what philosophy has to do with computers or data. It turns out that philosophy plays a huge role in understanding the meaning of our data. Simply put, an ontology is a way to describe or model something in a form that is both human and machine readable, and philosophical logic is often applied to make the model as clear as possible.

The digitized form of ontologies dates back to the creation of the World Wide Web by Tim Berners-Lee in 1989 (source). The web’s first two generations, now commonly called Web 1.0 and Web 2.0 (source), centered on static web pages and on the web as a platform (being able to create content and not just read it), respectively. Web 3.0, or the Semantic Web, is Tim Berners-Lee’s vision of a possible future for the World Wide Web (source), not to be confused with Web3 (source), Gavin Wood’s vision of a decentralized web based on blockchain.

In 1999, Tim Berners-Lee envisioned a World Wide Web connected not through static, human-oriented web pages, as it is now, but through linked concepts and data that machines can analyze as well.

At the time, the idea of the Semantic Web generated a great deal of excitement in the computer science community. However, it soon died down due to the complexity of transforming legacy systems into data-centric systems. Only in the last few years has excitement for the Semantic Web gained traction again, due in part to the rise of AI/ML/NLP capabilities and demand, Big Data, and an increasingly complex world.

WHAT IS AN ONTOLOGY

At its most basic, an ontology is a model of a concept and its relationships with other concepts, usually in the form of Classes and Instances connected by Properties. Ideally this model is both machine and human readable; that is to say, the example below could be understood by both a human and a machine.

Simplified example of an ontology of the fictional character ‘Harry Potter’ and his relationships

The example above focuses on a single literary character and his relationships. What is amazing about ontologies and the Semantic Web is that concepts and relationships can be reused, and ontologies can be easily extended, without having to change the code that is built on the model! You might notice that the relationships between the other characters were intentionally left out. For example, Ron and Hermione are also friends with Harry and attended Hogwarts, forming the famous trio who had adventures while learning magic at school. You could also add arrows from Ron, Hermione, Lily and James to the Witch/Wizard bubble with the rdf:type relationship, since all of these characters can be considered a type of witch or wizard. In other words, Harry Potter and the others are Instances of the Witch/Wizard Class, which is a subclass (rdfs:subClassOf) of the Class Magical Being.

The reason for intentionally leaving out all of those other connections is that a visual representation of an ontology gets cluttered very quickly and soon becomes unreadable to humans. This is why it is better for your ontologies to also be machine readable.

Most machines cannot read the visual representation drawn above, so we need a machine-readable way to write down this human-centric version of the Harry Potter Ontology, and there are a few to choose from.

Ontologies were originally written in RDF/XML (source) using OWL (source). RDF stands for Resource Description Framework, and RDF/XML is its XML (eXtensible Markup Language) syntax; OWL stands for Web Ontology Language (yes, you read that right; OWL just made for a better acronym). If that still didn’t make things any clearer, you can think of OWL as the vocabulary for saying what a concept is about and RDF/XML as how you write it down so that the computer understands. RDF organizes statements as triples (subject-predicate-object, or Harry [subject] isFriendsWith [predicate] Ron [object]). In 2014, the W3C standardized Turtle (source), a more compact serialization of RDF that writes those triples out directly.

Here is an example of what the Harry Potter Ontology (hpo) would look like in Turtle syntax:

@prefix hpo: <https://theontologywholived.com/hpo#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Classes: the general concepts
hpo:MagicalBeing
    a owl:Class ;
    rdfs:label "Magical Being"^^xsd:string ;
    skos:definition "A being, or living entity, capable of using magic."^^xsd:string .

hpo:WitchOrWizard
    a owl:Class ;
    rdfs:subClassOf hpo:MagicalBeing ;
    rdfs:label "Witch or Wizard"^^xsd:string ;
    skos:definition "A magical being that is either a human, half-human, or werewolf."^^xsd:string .

# An instance of the Witch or Wizard class and its relationships
hpo:harryPotter
    a hpo:WitchOrWizard ;
    foaf:firstName "Harry"^^xsd:string ;
    foaf:lastName "Potter"^^xsd:string ;
    hpo:alsoKnownAs "The Boy Who Lived"^^xsd:string ;
    hpo:alsoKnownAs "The Chosen One"^^xsd:string ;
    hpo:isFriendsWith hpo:ronWeasley ;
    hpo:isFriendsWith hpo:hermioneGranger ;
    hpo:attended hpo:hogwarts ;
    hpo:hasMother hpo:lilyPotter ;
    hpo:hasFather hpo:jamesPotter .

*Note: In Turtle, ‘a’ is shorthand for rdf:type.

If you squint, you can start to see those connected bubbles, or nodes, from the image start to form triples in the Turtle snippet above. 

For example: hpo:harryPotter → hpo:isFriendsWith → hpo:ronWeasley

(subject) (predicate) (object)
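The same triple structure can be sketched in plain Python, without any RDF library. This is only an illustration and not how a real triple store works: the tuple representation and the `match` helper are assumptions made for this example, while the identifiers are taken from the Turtle snippet.

```python
# Each triple is a (subject, predicate, object) tuple of plain strings.
triples = {
    ("hpo:harryPotter", "hpo:isFriendsWith", "hpo:ronWeasley"),
    ("hpo:harryPotter", "hpo:isFriendsWith", "hpo:hermioneGranger"),
    ("hpo:harryPotter", "hpo:attended", "hpo:hogwarts"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who is Harry friends with?
friends = [o for _, _, o in match(triples, s="hpo:harryPotter", p="hpo:isFriendsWith")]
print(friends)  # ['hpo:hermioneGranger', 'hpo:ronWeasley']

# Extending the model is just adding triples; match() itself does not change.
triples.add(("hpo:ronWeasley", "hpo:attended", "hpo:hogwarts"))
students = [s for s, _, _ in match(triples, p="hpo:attended", o="hpo:hogwarts")]
print(students)  # ['hpo:harryPotter', 'hpo:ronWeasley']
```

Because every fact has the same three-part shape, one generic query function covers friendships, schools and anything added later.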

Without getting too bogged down in the syntax (how it’s written), another important detail lies in the list of prefixes. The first one, ‘hpo’, is the namespace for the Harry Potter Ontology, with a made-up IRI (source) (Internationalized Resource Identifier, a means to identify a thing that is meant to be reused). Each prefix is shorthand for its IRI, so hpo:harryPotter stands for the full identifier https://theontologywholived.com/hpo#harryPotter. The other prefixes, unlike ‘hpo’, point to real, publicly available ontologies! This means that I didn’t have to make up a predicate to define what a ‘first name’ means; I could just reuse a predicate that already exists here (this HTML is rendered from a file similar to the Turtle file, so it is both machine and human readable!). That’s the beauty of the Semantic Web: with ontologies that adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) data principles, Tim Berners-Lee’s vision for the Semantic Web is another step closer to becoming reality.
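To make the role of prefixes concrete, here is a small Python sketch of how a prefixed name expands into a full IRI. The `expand` function and the `PREFIXES` table are hypothetical helpers written for this article; the IRIs themselves are copied from the @prefix lines of the Turtle snippet.

```python
# Prefix table copied from the @prefix lines of the Turtle snippet.
PREFIXES = {
    "hpo": "https://theontologywholived.com/hpo#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name):
    """Turn 'prefix:local' into a full IRI using the table above."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


print(expand("foaf:firstName"))   # http://xmlns.com/foaf/0.1/firstName
print(expand("hpo:harryPotter"))  # https://theontologywholived.com/hpo#harryPotter
```

Two ontologies that both use foaf:firstName end up pointing at the same full IRI, which is what lets a machine recognize that they mean the same thing.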

Two last points to consider briefly: clarifying concepts and inferencing.

Human languages are notorious for being vague and imprecise. Different labels can mean the same concept, and the same label can mean different concepts. Beyond labeling, an ontology can include rules, restrictions and axioms to help clarify your concepts.

With these logical linkages between concepts, a machine can infer new relationships from the existing ontology. For example, let’s say you’re a veterinary clinic that has an ontology about different cat breeds. Each breed of cat could be considered an owl:Class, and you can associate many defining characteristics with those breeds. Suppose a client brings their cat, Eddie, to the clinic and you characterize him as a type of (rdf:type) Nebelung. Your ontology has the class (owl:Class) Nebelung as a specialization of (rdfs:subClassOf) Long Haired Cats, which are characterized as producing hairballs (:producesHairballs) frequently. A machine can therefore infer that, since Eddie is an instance of a Nebelung cat, he is also a Long Haired Cat, and the vet should suggest hairball treatment options before his owner has a chance to complain of frequent hairballs.
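The Eddie example can be sketched in a few lines of Python. This is a deliberately simplified stand-in for a real reasoner: the `ex:` identifiers are hypothetical, only the single rule "an instance of a class is also an instance of its superclasses" is applied, and the hairball characteristic is kept in a plain set rather than modeled as an OWL restriction.

```python
# Facts the clinic has recorded, as (subject, predicate, object) triples.
triples = {
    ("ex:eddie", "rdf:type", "ex:Nebelung"),
    ("ex:Nebelung", "rdfs:subClassOf", "ex:LongHairedCat"),
}

# Simplifying assumption: classes whose members produce hairballs frequently.
FREQUENT_HAIRBALLS = {"ex:LongHairedCat"}


def infer_types(graph):
    """Apply 'x type C' and 'C subClassOf D' => 'x type D' until nothing new appears."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for s, p, c in list(inferred):
            if p != "rdf:type":
                continue
            for c2, p2, d in list(inferred):
                new_fact = (s, "rdf:type", d)
                if p2 == "rdfs:subClassOf" and c2 == c and new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred


def types_of(graph, individual):
    """All classes an individual belongs to, stated or inferred."""
    return {o for s, p, o in infer_types(graph) if s == individual and p == "rdf:type"}


eddie_types = types_of(triples, "ex:eddie")
print(sorted(eddie_types))  # ['ex:LongHairedCat', 'ex:Nebelung']

suggest_hairball_treatment = bool(eddie_types & FREQUENT_HAIRBALLS)
print(suggest_hairball_treatment)  # True
```

Nobody ever recorded that Eddie is a long-haired cat; that fact, and the treatment suggestion that follows from it, come from the class hierarchy.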

Not to be confused with… 

KNOWLEDGE GRAPHS

The term ‘knowledge graph’ is often used synonymously with ‘ontology’, but the two are not the same. What is a graph? At its simplest, a graph is a set of nodes connected by edges, and those nodes and edges can represent almost anything you want. Similarly, you could see knowledge graphs as “bite-sized” versions of ontologies, used in the context of certain datasets so that you can have all the benefits of an ontology without it being too massive. The line between ontologies and knowledge graphs can be blurry, but in general ontologies are seen as the “backbone” of knowledge graphs.

DATABASE SCHEMAS

A database schema is different from an ontology primarily in the sense that it pertains specifically to how a database is constructed and the data itself, whereas an ontology focuses on the organization of the knowledge and concepts that the data is based on.

A database schema is usually described at one of three levels: conceptual, logical and physical. Again, each of these strictly describes the data and not the meaning behind the data.

TAXONOMY

A taxonomy is mainly hierarchical in nature and is used mostly for classification. Think of the Linnaean taxonomy, the hierarchy used in biology for the classification of all living things: the ‘Tree of Life’.

An ontology can be interpreted as the ‘Tree of Life’ that also includes the food web. So you can have those hierarchical relationships like in a taxonomy, but you can also include the complex relationships between the animals, plants, fungi, etc. In short, an ontology is much more semantically rich than a taxonomy.