Thesauri, Taxonomies, and Ontologies

Or, how we can learn from these formal methods to improve the organization of our own PIM/PKM systems.

Cameron Flint
Jan 29, 2023

In this article we’ll examine the differences between three general-purpose systems used to organize knowledge: thesauri, taxonomies, and ontologies. Then we’ll consider what we can learn from these generalized systems and apply it to our own individual practices of note-taking, information curation, and knowledge management.

Thesauri

A thesaurus in the general sense is used to map ambiguous words (terms) to discrete meanings (concepts). Within and across languages, there are usually several words that convey the same or similar meaning. For example, if I’m building a page index in the back of my diary and want to group words like “happy,” “sad,” and “excited” under a single heading, I could use any of “moods,” “emotions,” or “feelings” to refer to that group of internal states.

While linguists will split hairs on the precise meaning of words, in practice I often prefer to pick and use just one term to refer to the approximate concept. Whichever word I choose is my preferred term (say, “mood”), while the rest are alternate terms.
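
As a rough sketch of the idea, a personal thesaurus can be modeled as a lookup from any term to its preferred term. The term list and the `preferred_term` helper below are invented for illustration and aren’t tied to any particular app.

```python
# Hypothetical personal thesaurus: preferred term -> alternate terms.
THESAURUS = {
    "mood": ["moods", "emotion", "emotions", "feeling", "feelings"],
}

# Invert it so any term (preferred or alternate) resolves to the preferred one.
LOOKUP = {}
for preferred, alternates in THESAURUS.items():
    LOOKUP[preferred] = preferred
    for alt in alternates:
        LOOKUP[alt] = preferred


def preferred_term(term):
    """Return the preferred term for `term`, or None if it is not in the vocabulary."""
    return LOOKUP.get(term.strip().lower())


print(preferred_term("Feelings"))  # mood
print(preferred_term("weather"))   # None
```

Returning None for unknown terms is a design choice here: it makes it obvious when a new word needs to be added to the vocabulary instead of silently becoming a duplicate.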

A thesaurus is an example of a controlled vocabulary, which is a bit of jargon you might hear used in this context as well. (Both of the systems we’ll explore next, taxonomies and ontologies, qualify as controlled vocabularies too.)

Taxonomies

A taxonomy is designed to organize content into hierarchies in order to enable easier filing and browsing. A news site might have a taxonomy of categories like “sports,” “politics,” “technology,” etc., where the top categories can have sub-categories (e.g. “sports > football”) and so on down to an arbitrary number of levels. In Computer Science jargon, a taxonomy most resembles the tree data structure.

Taxonomies are useful for visually exploring several common groups of things. For example, I organize my bookmarks into a taxonomy of topics like “Programming,” “Gaming,” and so on, whereas another part of my digital library is organized by object type (“Books,” “Music,” “Apps,” etc.).

The most common scheme in taxonomy-building is that each level deeper in the hierarchy is more specific than its parent level. These relationships between child and parent are called narrower/broader relationships, although there are others as well — for example part/whole. A simple rule of thumb to remember is that taxonomies help you “narrow down” what you’re looking for.
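
To make the tree comparison concrete, here is a minimal sketch of a taxonomy as a nested dictionary. The sub-categories other than “sports > football” and the helper names `find_path` and `narrower` are assumptions made for the example.

```python
# Hypothetical taxonomy as a nested dict: each key is a category,
# each value holds its narrower (child) categories.
TAXONOMY = {
    "sports": {"football": {}, "tennis": {}},
    "politics": {},
    "technology": {"programming": {}, "gaming": {}},
}


def find_path(tree, target, path=()):
    """Return the broader-to-narrower path to `target`, or None if absent."""
    for name, children in tree.items():
        current = path + (name,)
        if name == target:
            return current
        found = find_path(children, target, current)
        if found is not None:
            return found
    return None


def narrower(tree, path):
    """List the categories one level below the category at `path`."""
    node = tree
    for name in path:
        node = node[name]
    return sorted(node)


print(" > ".join(find_path(TAXONOMY, "football")))  # sports > football
print(narrower(TAXONOMY, ["technology"]))           # ['gaming', 'programming']
```

“Narrowing down” is then just walking one level deeper at a time, which is what `narrower` does for a given path.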

Ontologies

An ontology is an attempt to thoroughly describe a set of interrelated concepts within a bounded world (domain). For example, within the domain of car sales, I might create an ontology that describes the various relationships between Cars, Companies, Make/Models, and Years. The aim in building such an ontology is to come up with a blueprint (schema) for organizing data such that no matter which actual cars I have in the inventory, I know the various ways of relating each car to the other cars. In computer science terms, an ontology is usually represented as a graph.

Ontologies are very useful for recommendation systems and semantic search engines*. If I’m looking at the product page for a 2023 Toyota 4Runner, it’s natural that I might want to jump from that car’s listing to either the manufacturer (Toyota) or the year (2023) in order to browse for related cars. An important aspect of relationships in an ontology is that they aren’t required to be hierarchical, unlike in taxonomies — links between concepts in an ontology are more likely to be associative than hierarchical, though often ontologies use both.
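
The “jump to related cars” scenario can be sketched with a tiny graph stored as subject/relation/object triples. The inventory, the relation names `made_by` and `model_year`, and the `related` helper are all hypothetical; a real system would likely use a graph database or a learned model.

```python
# Hypothetical inventory stored as (subject, relation, object) triples.
TRIPLES = [
    ("2023 Toyota 4Runner", "made_by", "Toyota"),
    ("2023 Toyota 4Runner", "model_year", "2023"),
    ("2022 Toyota Tacoma", "made_by", "Toyota"),
    ("2022 Toyota Tacoma", "model_year", "2022"),
    ("2023 Honda CR-V", "made_by", "Honda"),
    ("2023 Honda CR-V", "model_year", "2023"),
]


def related(item, relation):
    """Find other subjects that share `item`'s value for `relation`."""
    values = {o for s, r, o in TRIPLES if s == item and r == relation}
    return sorted(
        s for s, r, o in TRIPLES
        if r == relation and o in values and s != item
    )


print(related("2023 Toyota 4Runner", "made_by"))     # ['2022 Toyota Tacoma']
print(related("2023 Toyota 4Runner", "model_year"))  # ['2023 Honda CR-V']
```

Neither link is a parent or child of the other: the manufacturer and the year are two independent, associative ways of moving sideways from the same listing.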

I like to think of an ontology as a layer above my data that’s like a horizontal lattice. My notes themselves “reach up” and “plug in” to the ontology in order to take advantage of the semantic network that connects them to other notes.

(*Note: technically these search scenarios don’t have to be implemented using predefined ontologies — for example at scale they are more likely to be built using Machine Learning — but the use case is helpful to illustrate the purpose of ontologies.)

One final analogy to explain ontologies is that they are like a superstructure for your data. The relationships are defined between abstract concepts like Car, Year, and Manufacturer rather than between concrete actuals like “Toyota” or “2023.” But by linking an actual (“Toyota”) to the concept (Manufacturer), you indirectly connect “Toyota” to all the other objects in your graph by means of the superstructure.
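
Here is one way the superstructure idea might look in code, assuming a small set of concept-level links and a table that assigns each concrete item to a concept. The names `SCHEMA`, `INSTANCE_OF`, `related_concepts`, and `candidates` are illustrative only.

```python
# Concept-level relationships (the "superstructure"). Names are illustrative.
SCHEMA = {
    ("Car", "Manufacturer"),
    ("Car", "Year"),
}

# Each concrete item is linked to exactly one concept.
INSTANCE_OF = {
    "4Runner": "Car",
    "Tacoma": "Car",
    "Toyota": "Manufacturer",
    "2023": "Year",
}


def related_concepts(item):
    """Concepts linked to the item's own concept in the schema, in either direction."""
    concept = INSTANCE_OF[item]
    linked = set()
    for a, b in SCHEMA:
        if a == concept:
            linked.add(b)
        elif b == concept:
            linked.add(a)
    return linked


def candidates(item):
    """Items that could be related to `item` because their concepts are linked."""
    wanted = related_concepts(item)
    return sorted(other for other, c in INSTANCE_OF.items() if c in wanted)


print(candidates("Toyota"))   # ['4Runner', 'Tacoma']
print(candidates("4Runner"))  # ['2023', 'Toyota']
```

Nothing links “Toyota” to “4Runner” directly in this sketch. The connection exists only because Manufacturer and Car are linked at the concept level.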

Application

Moving now from theory to practice, let’s review the intended purpose of each organization system:

  • Thesauri are meant to map terms to concepts in a more-or-less controlled vocabulary, so that I consistently use the same terms to refer to the same concepts, and so that I can find alternate terms when desired.
  • Taxonomies are meant to organize content into broader/narrower hierarchies, so that I can quickly file new content into buckets or browse existing content by “drilling down” through the tree.
  • Ontologies are meant to describe associative relationships between objects or concepts in a particular domain, so that I can follow links between ideas in any direction, in multiple dimensions.

Armed with this summary, here are a few suggestions for applying thesauri, taxonomies, and ontologies within your personal organization system.

  • Use a taxonomy of dates to organize your journals and (at the top level) your administrative records. This can be at any level of granularity that makes sense for you, such as Year > Month > Day, Year > Quarter > Month > Week > Day, or simply by Year.
  • Use a taxonomy of subjects and topics to organize your personal reading library, such as your bookmarks, podcasts, papers, and PDFs. If you have a broad cross-section of interests, consider grouping by domain at the top level (e.g. Technology, Sports, Cooking, etc.). Use hierarchical tags (in Bear, Evernote) instead of folders if you want an item to live under multiple topics. Some software also supports the concept of “Bundles” for this purpose (like DEVONthink, KeepIt).
  • Use a taxonomy of classes or types to organize your personal resource library, such as saved books, articles, music, and web pages. Each object type can have a set of standard fields, like “Author” (for a book or article), “Artist” (for music), and so forth. Notion and Capacities are great apps for object-style organization.
  • Use a thesaurus to build yourself a personal index or metadata “menu,” made up of the core concepts and terms that you use throughout your organization system: topics, types, events, moods, and statuses, to name a few. If your software supports it (like RemNote does), list the alternate terms as aliases under each main note so that you can avoid creating duplicates.
  • Use an ontology to build detailed relationships between ideas in your system (requires tools like Roam Research, Obsidian, Tana, etc.). It doesn’t have to be formal by any means, but consistency is key. For example, the concept “Car” is related to “Manufacturer” — how can you ensure that notes related to either of those two concepts are linked up and surface together? (Perhaps by co-locating the list of manufacturer-notes and car-notes.) Or how can you go from browsing a list of “Meeting” notes to a list of all “Event” notes that would include all of “Meetings,” “1on1’s,” “Trips,” “Conferences,” etc.? (Perhaps by making “Meeting,” “1on1”, …, “Event,” type-notes themselves and linking “Meeting” to “Event.”)
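
The “Meeting” to “Event” suggestion in the last bullet can be sketched as type-notes that each point at a broader type. The note titles and the helper names `is_a` and `notes_of_type` are made up for the example; in a real tool the links would be page links or tags, not Python dictionaries.

```python
# Hypothetical type-notes: each type links to its broader type (or None).
BROADER = {
    "Event": None,
    "Meeting": "Event",
    "1on1": "Event",
    "Trip": "Event",
    "Conference": "Event",
}

# Hypothetical notes, each tagged with at most one type-note.
NOTES = {
    "Q1 planning": "Meeting",
    "Weekly sync with Sam": "1on1",
    "Lisbon": "Trip",
    "Reading list": None,
}


def is_a(note_type, target):
    """True if `note_type` equals `target` or reaches it via broader links."""
    while note_type is not None:
        if note_type == target:
            return True
        note_type = BROADER.get(note_type)
    return False


def notes_of_type(target):
    """Titles of all notes whose type is `target` or one of its narrower types."""
    return sorted(title for title, t in NOTES.items() if is_a(t, target))


print(notes_of_type("Meeting"))  # ['Q1 planning']
print(notes_of_type("Event"))    # ['Lisbon', 'Q1 planning', 'Weekly sync with Sam']
```

Because each note carries only its most specific type, adding a new kind of event later means adding one type-note and one link, with no need to retag existing notes.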

Wrap-up

If you are successfully using your own thesauri, taxonomies, and/or ontologies in your personal organization system, or if you have hard-won lessons to share on how or how not to use them, I’d love to hear from you. Leave a comment or drop me an email at cameron_sea {at} fastmail {dot} com.
