Metadata Day Round-Up: 5 ways to empower data producers (Warning: Data Contracts ahead!)

By: Shirshanka Das

11.28.22

    Photo by Nick Fewings on Unsplash

    How time flies! In the previous edition of the Metadata Day Round-Up series, I briefly touched upon the implementation side of data governance with a smattering of insights and ideas from the Metadata Day expert panel that was held in the summer of 2022. Today, let’s get down to brass tacks and answer this important question: How do we help data producers make data governance happen?

    The data community has been really enamored with the ideas of data mesh, as proposed by Zhamak Dehghani, and more recently data contracts, as proposed by Chad Sanderson. The principles behind these ideas were part of what we discussed as well. Here are 5 practical tips for giving your data producers the superpowers they need to make modern data governance a reality for your organization.

    1. Shift Left, but prepare right

    The essence of Shift Left is simple: move the source of truth for metadata as close as possible to the source of the data definitions themselves.

    We need to go beyond schema at a physical level (columns and types) to a schema-plus-semantics approach to know what the data means, not just what its structure is — and the best place to add that information is on the “left”, where the data is produced.
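    As a rough illustration of schema-plus-semantics, the sketch below attaches semantic labels to fields right where the producer defines the schema. The `SignupEvent` class, the `semantic_type` and `owner` keys, and the label names are all hypothetical; this is not DataHub's annotation format, only a minimal example of keeping meaning next to structure.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SignupEvent:
    # Physical schema: field names and types.
    # Semantics: carried in `metadata`, declared by the producer (hypothetical keys).
    user_id: str = field(metadata={"semantic_type": "member_id", "owner": "growth-team"})
    email: str = field(metadata={"semantic_type": "user_provided_email", "owner": "growth-team"})
    signup_ts: int = field(metadata={"semantic_type": "event_time_epoch_ms", "owner": "growth-team"})


def extract_semantics(schema_cls):
    """Return {field_name: semantic_type} for every annotated field of a dataclass."""
    return {
        f.name: f.metadata["semantic_type"]
        for f in fields(schema_cls)
        if "semantic_type" in f.metadata
    }


semantics = extract_semantics(SignupEvent)
print(semantics)
```

    Because the labels live in the schema definition itself, a downstream catalog could read them at build or deploy time instead of asking producers to re-enter them elsewhere.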

    This idea is something Josh and I have worked very closely on, and are deeply passionate about. Check out our detailed post: Shifting left on governance: DataHub and schema annotations.

    While on the topic, something Nishant said really resonated: shift left, but prepare right.

    This means that while you go far left to identify the data, ownership, etc., design your downstream systems and tooling to give the data producer some value for all their work upstream.

    As an example, this could mean offering tools that give certain guarantees or benefits when data producers provide metadata, say by handling extraction, preservation, encryption, anonymization, dissemination, etc. This way, engineers will see that their accountability and involvement end with correctly identifying data and its risk, so they have more time to do the stuff they’re supposed to do.

    2. Separate semantics from policy

    So what should these data contracts contain? What information can we rely on data producers to provide reliably? Schemas and “schema-attached” business metadata like classification tags are not enough. You should also be able to refer to quality-related metadata like SLAs and data distributions, data-management metadata like retention and entitlements, as well as higher-level business context in your contract specification language.

    While determining “what” the specification should contain, the panel felt very strongly that we need to separate the semantics of the annotations from the policies that they activate. This means that we need to focus on:

    • Standardized vocabularies
    • Mappings
    • Ownership, etc., in the schema

    If you can cleanly define the semantics of data and separate them from policies, engineers can annotate their data with policy-relevant information without being experts in the policy. Engineers only need to know enough of the vocabulary and its meaning to annotate at the right place and at the right time.

    A very simple example of this would be: ask the producer to label if a particular field contains user-provided email addresses, but spare them from making the determination of whether this constitutes highly-confidential information or personally identifiable information. Let the centrally defined mapping layer handle that.
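    The email example can be sketched as a small lookup. In this minimal, assumption-laden version the producer supplies only a semantic label per field, and a centrally owned table decides which classifications follow from it. The label names, the classification names, and the `classify` helper are hypothetical.

```python
# Central mapping, owned by the governance team (hypothetical vocabulary).
SEMANTIC_TO_CLASSIFICATION = {
    "user_provided_email": {"pii", "highly_confidential"},
    "member_id": {"pii"},
    "page_view_count": set(),
}


def classify(field_labels):
    """Resolve producer labels to policy classifications.

    Returns (resolved, unknown): `resolved` maps field name to a set of
    classifications, `unknown` lists fields whose label is not in the
    vocabulary and therefore needs review by the central team.
    """
    resolved, unknown = {}, []
    for field_name, label in field_labels.items():
        if label in SEMANTIC_TO_CLASSIFICATION:
            resolved[field_name] = SEMANTIC_TO_CLASSIFICATION[label]
        else:
            unknown.append(field_name)
    return resolved, unknown


# The producer only states what each field contains.
producer_labels = {
    "contact": "user_provided_email",
    "views": "page_view_count",
    "misc": "free_text",
}
resolved, unknown = classify(producer_labels)
print(resolved)
print(unknown)
```

    If a regulation changes what counts as highly confidential, only the central table changes; the producer's annotations stay as they are.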

    An important caveat that Josh called out here was that “as important as it is to make data governance practices easier for engineers, you also need a specialized team that’s thinking deeply about relating data policies to regulations — whether it is ontologists, schema design experts, etc.”

    3. Automate, Automate, Automate

    While providing data producers the tools to provide metadata at source is great for sustainable governance, there is often a bootstrapping problem. How do you get started on an initiative that requires you to change business processes? What should the contract even look like?

    Automation is essential here, not only for auto-creating or suggesting the first version of the “data contract specification” for the data producer, but also for continuously validating that the data in production actually matches what the producer has declared. In other words, contract monitoring is essential both in creating the first specification and in continuing to assert that the contract remains a valid specification.
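    Both halves of that loop can be sketched in a few lines: suggest a first-draft contract from a sample of observed rows, then keep checking new rows against whatever the producer has declared. The `FieldContract` shape, the null-fraction check, and the function names are illustrative assumptions, not a real contract specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldContract:
    name: str
    py_type: type
    max_null_fraction: float = 0.0


def suggest_contract(rows):
    """Propose a first-draft contract from a sample of rows (a naive heuristic)."""
    names = sorted({key for row in rows for key in row})
    draft = []
    for name in names:
        values = [row.get(name) for row in rows]
        non_null = [v for v in values if v is not None]
        # Take the type of the first non-null value; a real tool would be more careful.
        py_type = type(non_null[0]) if non_null else type(None)
        draft.append(FieldContract(name, py_type, 1 - len(non_null) / len(rows)))
    return draft


def check_contract(contract, rows):
    """Return human-readable violations of the declared contract."""
    violations = []
    for fc in contract:
        values = [row.get(fc.name) for row in rows]
        nulls = sum(v is None for v in values)
        if rows and nulls / len(rows) > fc.max_null_fraction:
            violations.append(
                f"{fc.name}: null fraction {nulls / len(rows):.2f} "
                f"exceeds {fc.max_null_fraction:.2f}"
            )
        if any(v is not None and not isinstance(v, fc.py_type) for v in values):
            violations.append(f"{fc.name}: found values that are not {fc.py_type.__name__}")
    return violations


contract = [FieldContract("user_id", str), FieldContract("email", str, max_null_fraction=0.5)]
rows = [
    {"user_id": "u1", "email": "a@example.com"},
    {"user_id": "u2", "email": None},
    {"user_id": 3, "email": None},
]
violations = check_contract(contract, rows)
print(violations)
```

    In practice the producer would review and edit the suggested draft before it becomes the declared contract, and the check would run continuously against production data.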

    This interaction between automation and humans will typically involve a “process automation” tool like JIRA to bridge the two worlds. Amol made a very strong case for process automation as a powerful but as-yet underexplored aspect of automation in data governance.

    Use Process Automation Tools to bridge the gap

    I have personal experience with this approach at LinkedIn, where we extracted metadata from DataHub and processed it in bulk to determine which data assets needed human intervention to get back to a healthy state (e.g. ownership is too low, classification tags are missing, retention specification doesn’t seem to correlate with retention observed, etc.). Assets failing these tests resulted in JIRA tickets being created to drive the remediation process. Finally, a global monitoring application tracked the progress of these JIRA tickets and raised alerts along the management chain if the issues persisted beyond their SLA.
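    The batch flow described above can be sketched roughly as follows. The `Asset` fields, the thresholds, and the ticket dictionaries are hypothetical stand-ins; this does not reproduce the LinkedIn pipeline, DataHub's data model, or the JIRA API. It only shows the pattern of running health checks over assets and turning failures into remediation work items.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    urn: str
    owners: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    declared_retention_days: Optional[int] = None
    observed_retention_days: Optional[int] = None


def failing_checks(asset, min_owners=2):
    """Return the names of the health checks this asset fails."""
    failures = []
    if len(asset.owners) < min_owners:
        failures.append("too few owners")
    if not any(tag.startswith("classification:") for tag in asset.tags):
        failures.append("missing classification tag")
    if (
        asset.declared_retention_days is not None
        and asset.observed_retention_days is not None
        and asset.observed_retention_days > asset.declared_retention_days
    ):
        failures.append("observed retention exceeds declared retention")
    return failures


def plan_tickets(assets):
    """Build one remediation ticket (a plain dict here) per unhealthy asset."""
    tickets = []
    for asset in assets:
        failures = failing_checks(asset)
        if failures:
            tickets.append({"asset": asset.urn, "summary": "; ".join(failures)})
    return tickets


assets = [
    Asset("dataset:healthy", owners=["a", "b"], tags=["classification:internal"],
          declared_retention_days=90, observed_retention_days=60),
    Asset("dataset:stale", owners=["a"], tags=[],
          declared_retention_days=30, observed_retention_days=400),
]
tickets = plan_tickets(assets)
print(tickets)
```

    A separate monitor would then watch how long each ticket stays open and escalate when it passes its SLA.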

    We’re applying a similar approach, not just in batch but in real time, through the “Governance Tests” feature as part of the Acryl DataHub offering, and I’m really energized to see the large diversity of use-cases where our customers are applying this feature.

    Automated Governance Tests: Acryl DataHub

    4. Getting funding for automation: Link governance automation to business improvements

    Even if you are convinced that automation is the way to unlock scalable governance at your organization, you will still need to convince the people who control the purse strings (a.k.a. budget holders) that this is an area worth investing in. For far too long, the ask has been, as Nishant says, to ‘make room for data governance’. But using automation at scale and getting governance done right have tangible business benefits. We discussed a few anecdotes that can serve as strong business cases for data governance automation.

    Nishant shares an example of data governance leading to business improvements

    In this section of the metadata panel above, Nishant shared his experience at Uber, where the Privacy team added an element of automation to infer schemas within unstructured data sections of their event streams (e.g. being able to infer email address holder fields within JSON blobs). Engineers could decide whether they wanted to continue accessing these blobs as unstructured data or use the inferred schema available to them. Over time, besides helping correlate the data to future downstream policies using the AI model, this approach also provided business benefits like:

    • Lower data encryption and storage costs since more data was deleted
    • Fewer instances of queries timing out due to large data sizes

    So it ended up being a win-win situation: automation made data management easier and more accurate and gave engineers time back to be more productive, resulting in a net business benefit!

    5. An Unconventional Idea: Make business users part of data production

    Traditionally, business users have always been looked at as consumers of data. However, one of the tenets of the data mesh proposal is that the business takes bigger ownership of data products.

    So what role can business users play in the creation of a data product?

    Teresa has had some great success in involving business users early on in the creation of data products. Watch the section of the video below to see what she had to share about this idea and domains where that has worked well.

    Involving Business Users in creation of data products

    In a nutshell, since business users are best placed to describe the value of the data and create the definitions that describe the data, they can provide the definitions for quality as it relates to those topics.

    So interestingly, production of data itself should include both the “technical” persona and the “business” persona to create a complete specification or requirements spec for the data product.

    Here’s what this could look like in action:

    • Understand each cohort’s needs — who needs the data and why?
    • Catalog these identified use cases and business glossary terms into a business or logical layer.
    • Connect the business terms, the people, organizations, and the use cases to the technical layer that holds the physical data asset(s) which serve the business use-case.
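    The three steps above can be sketched as a tiny data model that links a business layer to the technical layer. The class names, fields, and dataset identifiers are made up for illustration and do not reflect any particular catalog's schema.

```python
from dataclasses import dataclass


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    business_owner: str


@dataclass
class UseCase:
    name: str
    cohort: str   # who needs the data
    terms: list   # glossary term names the use case relies on
    assets: list  # identifiers of the physical datasets serving it


def assets_for_term(term_name, use_cases):
    """List the physical assets reachable from a business term via its use cases."""
    return sorted({asset for uc in use_cases if term_name in uc.terms for asset in uc.assets})


glossary = [
    GlossaryTerm("Active Customer", "Customer with a purchase in the last 90 days", "sales-ops"),
]
use_cases = [
    UseCase("Churn report", "sales leadership", ["Active Customer"], ["warehouse.customers", "warehouse.orders"]),
    UseCase("Inventory forecast", "supply chain", ["Stock Level"], ["warehouse.inventory"]),
]
linked_assets = assets_for_term("Active Customer", use_cases)
print(linked_assets)
```

    With links like these in place, a business user who owns a term can see which datasets their definition governs, and an engineer can see which business definitions a dataset must satisfy.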

    If you’ve gotten this far, you now know about 5 things that you could be doing to enable your data producers to be an integral part of your data governance strategy.

    In the next few posts in the series, we’ll explore decentralization, knowledge graphs, and the specifics of looking at governance as code.

    Stay tuned!
