
Metadata in Action: Tips and Tricks from the Field
Over the past decade, metadata systems have been criticized for their lengthy setup processes. This has led to impatience and a lack of funding, resulting in projects gathering dust. While it is known that metadata is extremely important for tracking business outcomes, many organizations struggle to utilize it to their benefit.
Last month, I led a panel discussion at the 2024 Metadata & AI Summit alongside metadata experts from Grab, Slack, and Checkout.com to explore the benefits of utilizing metadata to the fullest extent and provide some tips for overcoming common challenges.
In this recap, we will explain why metadata is important, share a few tips and tricks from the field experts, and review personal experiences from Slack, Checkout.com, and Grab.

Why build a metadata strategy?
As companies grow and their data landscape rapidly expands, it’s critical to set a data strategy early and to regularly reassess its efficacy to tackle issues related to data discovery, data governance, and data observability before they become insurmountable problems.
Harvey Li, Engineering Manager II at Grab, shared that the Grab Team decided to start early on with their metadata strategy. Grab introduced an enterprise data catalog years ago to reap the benefits of data discovery, such as breaking data silos and making data easily discoverable for anyone that needs data.
As the years went on, Grab’s data ecosystem became more complex as more data-centric use cases surfaced. In reevaluating their metadata strategy, the Grab Team realized that enterprise catalog solutions didn’t provide enough flexibility to address their evolving pain points beyond data discovery. With this in mind, they evaluated several open-source solutions in 2021 and decided on DataHub.
DataHub’s architecture ended up being suited to Grab’s needs, and the collaboration began, starting with tidying up its data lineage — one of the first metadata challenges an organization can face, alongside data ownership.
Harvey explained that, over time, Grab expanded its scope of metadata to capture and categorize three types: technical metadata, behavioral metadata, and business metadata. Using generative AI, the company elevated its approach to capturing business metadata, for example by having Gen AI help generate data asset documentation.
There were several similarities between Grab’s need for a metadata strategy and Slack’s requirements. Nedra Albrecht, Senior Data Engineer at Slack, shared that the company started its metadata journey by attempting to generate a robust lineage graph between Hive and Airflow. While this was useful during data discovery, it didn’t yield a material impact on company operations.
Since then, Slack has doubled down on producing and maintaining robust metadata across its data ecosystem, moving well beyond mapping lineage within the data warehouse. Generating high-quality metadata is becoming the norm at every stage, from when an event is defined in code to how that data is transformed within the warehouse and how it pops up as a metric.
This yields powerful results, both in providing visibility into the full lifecycle of metric generation and definition and in driving organizational-level metrics to measure teams’ performance, resource distribution, and cost allocation.
Michael Couder, Data Engineer II at Checkout.com, shared that the company faced a significant challenge that led to the adoption of a new metadata strategy. Although the company had a wealth of data, including valuable insights for product development and auditor reporting, the employees who understood this data best lacked the necessary tools to share it effectively.
Implementing a metadata strategy empowered the data owners with the tools they needed, leading to improved efficiency and fewer incidents.
Benefits of early-stage metadata strategy implementation
While creating a metadata strategy can take some time, there are plenty of quick wins that can provide immediate value while building cross-functional buy-in and enthusiasm.
After adopting DataHub, the Slack team quickly gained a full picture of cross-platform lineage. Nedra explained that this was foundational in helping her and her team evangelize the power of metadata by providing stakeholders with a clear visual of the interconnectedness of their data assets and gaining buy-in for broader investment in setting metadata standards.
Similarly, the Grab Team had quick wins by bringing lineage into DataHub, which exposed a spaghetti-like mess of interdependencies that had evolved as the team managed tens of thousands of data sets in its data lake. Additionally, Harvey shared how they drastically improved the coverage and accuracy of data ownership. For example, if someone left the organization, they could automatically identify the next best owner.
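To make the "next best owner" idea concrete, here is a minimal sketch in plain Python. The panel did not describe how Grab actually picks a replacement owner, so everything here is an assumption for illustration: the Asset class, the activity counts used as a ranking signal, and the next_best_owner helper are all hypothetical and are not part of DataHub.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Asset:
    name: str
    owner: str
    # Hypothetical signal: how often each person recently queried or edited the asset.
    activity: Dict[str, int] = field(default_factory=dict)


def next_best_owner(asset: Asset, active_employees: Set[str]) -> Optional[str]:
    """Keep the current owner if still active, else suggest the most active remaining user."""
    if asset.owner in active_employees:
        return asset.owner
    candidates = {p: n for p, n in asset.activity.items() if p in active_employees}
    if not candidates:
        return None  # nobody left to suggest; flag the asset for manual review
    # Highest activity wins; ties are broken alphabetically so the result is deterministic.
    return min(candidates, key=lambda p: (-candidates[p], p))


orders = Asset("orders_daily", owner="alice", activity={"alice": 40, "bob": 12, "carol": 30})
# alice has left the organization, so the most active remaining user is suggested.
print(next_best_owner(orders, active_employees={"bob", "carol"}))  # prints "carol"
```

In practice the ranking signal could be anything your catalog already records, such as query logs or commit history. The point is only that ownership gaps can be filled from metadata rather than by hand.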
Checkout.com’s first major win was its ability to pair lineage with accurate ownership documentation. This made dealing with late-night incidents much easier since they knew who managed the source data. Additionally, Michael shared that they were able to document over 1,000 sources in a matter of months!
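As a rough illustration of why pairing lineage with ownership helps during an incident, the sketch below walks upstream from a broken asset and lists who owns each source. The dataset names, team names, and the upstream_owners helper are made up for this example. It uses plain dictionaries rather than any real catalog API.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical lineage: each dataset maps to the datasets it reads from.
UPSTREAM: Dict[str, List[str]] = {
    "revenue_dashboard": ["payments_agg"],
    "payments_agg": ["payments_raw", "fx_rates"],
    "payments_raw": [],
    "fx_rates": [],
}

# Hypothetical ownership records.
OWNERS: Dict[str, str] = {
    "payments_agg": "team-payments",
    "payments_raw": "team-ingest",
    "fx_rates": "team-treasury",
}


def upstream_owners(
    dataset: str, upstream: Dict[str, List[str]], owners: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Walk lineage breadth-first and return (dataset, owner) for every upstream source."""
    seen = set()
    queue = deque(upstream.get(dataset, []))
    result = []
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # a dataset can be reachable through several paths
        seen.add(node)
        result.append((node, owners.get(node, "unowned")))
        queue.extend(upstream.get(node, []))
    return result


for name, owner in upstream_owners("revenue_dashboard", UPSTREAM, OWNERS):
    print(f"{name}: {owner}")
```

Nearest sources come first, which roughly matches the order you would want to contact people at 2 a.m. Anything reported as "unowned" is a documentation gap worth closing before the next incident.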
Data management growing pains
Any new strategy comes with early growing pains, and the important thing is to identify and address them as quickly as possible.
After Checkout.com addressed some initial issues, the next step was to clarify responsibilities and determine future actions. Couder noted that the organization lacked clear guidelines and tools for critical processes, such as deprecating assets, granting access, publishing data, and evolving schemas.
While establishing these practices can take time, Michael pointed out that partnering with a team and/or set of stakeholders who are particularly invested in the outcomes helps things move much more quickly. Once you have established easy-to-follow protocols with your immediate users, you can transition to systematically enforcing the standards instead of having them opt in.
Advice for your early metadata days
We know that developing and rolling out a metadata strategy can be daunting, but here’s some parting advice from our panelists.
Harvey Li recommends viewing metadata management as an evergreen problem, not an overwhelming challenge. Start small and take time to celebrate the small wins.
Nedra Albrecht emphasizes the importance of capturing metadata at its source, also known as “shifting left”, to expedite your ability to surface valuable information for a variety of stakeholders.
Michael Couder agrees with the importance of shifting left, calling out that it streamlined their ability to prevent breaking changes within their rapidly evolving data ecosystem.
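One way to picture shifting left is a schema check that runs where the data is produced, for instance in CI, before a change ever reaches the warehouse. The sketch below is a simplified assumption of what such a check might look like, not a description of any panelist's tooling. The breaking_changes helper and the field names are hypothetical, and it only treats removed fields and changed types as breaking.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Compare two {field: type} schemas and list changes that could break consumers."""
    problems = []
    for name, old_type in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name] != old_type:
            problems.append(f"type changed: {name} ({old_type} -> {new[name]})")
    # Fields that exist only in `new` are additions and are treated as safe here.
    return problems


current = {"user_id": "string", "amount": "double", "currency": "string"}
proposed = {"user_id": "string", "amount": "string", "region": "string"}

for problem in breaking_changes(current, proposed):
    print(problem)
```

A check like this could fail the producer's build whenever the list is non-empty, so the conversation about the change happens with the team making it rather than with the downstream team that discovers it.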
Keep in touch
Whether you’re at the beginning of your metadata journey or in the midst of it, rest assured that you’re not alone! Join 12k+ data practitioners in DataHub Slack if you’re looking for more tips and tricks in metadata strategies.
You can watch the full panel discussion on demand now, and check out my recap from the 2024 Metadata & AI Summit here.