Creating a Business Glossary and Putting It to Use in DataHub
In a previous post, we covered the high-level differences between Tags and Glossary Terms, two powerful labeling methods in DataHub.
TL;DR: Tags are a quick way to label an asset (a table or a column) for fast search and discovery. Glossary Terms also label assets, but they should come from a broader, company-adopted business glossary so that the whole organization uses the same vocabulary. In this post, we deep-dive into business glossaries and show how to create and maintain one in the DataHub ecosystem.
Business Glossaries
To become well-oiled machines, organizations must begin creating their own languages. No, I do not mean a fantasy language like the ones created for The Lord of the Rings or Harry Potter, but a business glossary! Business glossaries are a key pillar of Data Governance. An organization that adopts one reduces the chance that the same term gets used for two different purposes. For example, “win-rate” could mean one thing to Finance and a different thing to Sales, causing metric misalignment in an executive presentation. (No, this hasn’t happened to me, I swear).

Business glossaries can house many different types of data, including:
- KPIs (Key Performance Indicators)
- OKRs (Objectives & Key Results)
- Classifications (Data sensitivity levels)
- Relevant company-specific jargon or acronyms
Compiling the above into a business glossary and ingesting it into DataHub provides multiple benefits. First, you now have a single source of truth for all language used in daily business activities. Second, your glossary terms now live in the same location as your company’s data assets, allowing you to associate terms with datasets, classify datasets based on these terms, and more. Let’s walk through how to do this:
Step 1: Create an Initial Set of Glossary Terms and Establish Owners for Each Term
While it may seem easy to start with a spreadsheet or document, we highly recommend treating your business glossary like a checked-in artifact under version control, so that you can easily track historical changes. Here is a simple YAML file that could be used as a starting point for your business glossary.

Example of a YAML file ingested into DataHub
The above file shows how you can easily assign owners to each term you create, which is key for end-users to know who to reach out to if they have questions about the meaning of a specific term, proposals for improvements to the glossary, or anything else.
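As a rough illustration, a glossary file along these lines might look like the sketch below. It assumes the layout used by DataHub's business glossary ingestion source (`version`, `source`, `owners`, `nodes`, `terms`). The node, term, and owner names (`SalesMetrics`, `WinRate`, `jdoe`, `governance-team`) are hypothetical, so check the field names against the documentation for your DataHub version before relying on it.

```yaml
# Hypothetical business glossary sketch; all names and owners are illustrative.
version: "1"
source: DataHub
# Glossary-level default owners (assumed to apply where a node or term sets none).
owners:
  users:
    - jdoe
nodes:
  # A node groups related terms, here a set of sales KPIs.
  - name: SalesMetrics
    description: KPIs used by the Sales organization
    terms:
      - name: WinRate
        description: Closed-won opportunities divided by all closed opportunities in a period
        # Term-level owner: the person to contact about this definition.
        owners:
          users:
            - jdoe
  # A second node holding data sensitivity classifications.
  - name: Classification
    description: Data sensitivity levels
    owners:
      groups:
        - governance-team
    terms:
      - name: Sensitive
        description: Data that must not leave the company
      - name: Public
        description: Data that is safe to share externally
```

Keeping one agreed definition of `WinRate` with a named owner is what prevents the Finance-versus-Sales mismatch described earlier.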
Creating a glossary from scratch can be tough, so consider drawing inspiration from industry standards such as FIBO, Microsoft Data Models, ISO 27001, and ISO 11179.
Step 2: Ingest Business Glossary Into DataHub
Now that you have created the beginnings of your business glossary (don’t worry, you can update it later, either in your original file or in our UI), it’s time to ingest it into DataHub. If you have not yet installed the necessary plugins, please refer to the DataHub Quick Start Guide.
Once you have completed the Quick Start Guide, create a simple recipe file (example here) and run the following command to ingest your glossary into DataHub:
datahub ingest -c <your_recipe_name.yml>
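For reference, a recipe for this step might look like the sketch below. It assumes the `datahub-business-glossary` source type and a `datahub-rest` sink pointing at a local quickstart instance. The file path and server URL are placeholders to replace with your own values.

```yaml
# Hypothetical recipe sketch; the path and server URL are placeholders.
source:
  type: datahub-business-glossary
  config:
    # Path to the glossary YAML file created in Step 1.
    file: ./business_glossary.yml
sink:
  type: datahub-rest
  config:
    # Assumed address of a local DataHub quickstart deployment.
    server: "http://localhost:8080"
```

Checking the recipe in next to the glossary file keeps both under the same change history.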
Once finished, navigate to your DataHub instance to confirm all glossary terms have been uploaded. For easy navigation to the terms, click the “Glossary Terms” card shown below.

Step 3: Associate Glossary Terms with Data Assets
Now that you have uploaded your business glossary into DataHub, pat yourself on the back! You are one step closer to improving your organization’s data governance, which will pay dividends down the line. However, the job is not done yet. To get the most out of your uploaded glossary, we recommend associating its terms with data assets in DataHub.
DataHub provides a user-friendly interface for associating terms with data assets right away.
Tag a data asset with a Glossary Term in the DataHub UI
For some companies, however, doing this through the UI will prove cumbersome. Thankfully, we provide methods to programmatically associate glossary terms with data assets using transformers. More on that here. You can also associate terms with datasets through DataHub’s GraphQL API.
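As a sketch of the transformer approach, the recipe fragment below attaches one glossary term to every dataset emitted by an ingestion run. It assumes the `simple_add_dataset_terms` transformer and a term URN of the form `urn:li:glossaryTerm:<Node>.<Term>`. The `SalesMetrics.WinRate` URN is hypothetical, so copy the real URN from the term's page in your own instance.

```yaml
# Fragment of an ingestion recipe; the source and sink sections are omitted.
transformers:
  - type: "simple_add_dataset_terms"
    config:
      # Hypothetical URN; replace it with a term that exists in your glossary.
      term_urns:
        - "urn:li:glossaryTerm:SalesMetrics.WinRate"
```

Because the transformer applies to every dataset in the run, it fits best in recipes scoped to a single schema or domain.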
Conclusion
Creating and implementing a business glossary can have many benefits for a company, and DataHub provides a simple platform for organizations to go about this. You now have a single source of truth for all terms that are used in daily business activities, while your Glossary terms now live in the same place as your company’s data assets. We cannot wait to see how you, the community, use these tools to your advantage.
Have a success story using DataHub for a similar use case, or other business glossary best practices? Write to me at feedback@acryl.io
Acryl Data is hiring. Click here for more information.
Want to get involved? Join our Slack channel here.