Data Products: From Concept to Implementation
The argument for treating data as a product has already been made and won: The industry agrees. Analysts have written the frameworks, conference talks have made the case, and most data leaders will tell you they’re bought in. And yet, most organizations still can’t point to a functioning data product in their stack.
Basically: The concept landed. The execution didn’t.
What happened is something we’ve watched play out across the DataHub community for years now: Data products got caught in the crossfire between data mesh theology, competing vendor definitions, and a gap between the people who design data products on whiteboards and the people who have to make them work in production. The result is a concept that everyone endorses, and almost nobody operationalizes.
What’s been missing isn’t another framework. It’s a model that connects to how people already work.
What do we mean when we say “data product”?
Let’s get the fundamentals out of the way so we know we’re sitting at the same table:
Definition: Data product
A data product applies product thinking to data assets. It has ownership. It’s discoverable. It’s documented. It’s governed. It has clear boundaries and serves a defined set of consumers. These are broadly accepted characteristics, and most modern definitions, from Gartner’s to Zhamak Dehghani’s original formulation within the data mesh framework, converge on these fundamentals.
Where it gets interesting is what happens next. Because the real question was never what a data product is. It’s what one looks like inside your actual infrastructure. That’s where most organizations get stuck.
At DataHub, we’ve arrived at a deliberately simple framing: A data product is a boundary drawn around existing assets (tables, pipelines, dashboards, topics, views) that makes their relationship, ownership, and purpose explicit. The assets already exist in your stack. The data product doesn’t create new infrastructure. It makes what’s already there legible and governable.
This distinction matters more than it might seem. It’s the difference between asking teams to architect something new from scratch and asking them to formalize what they’ve already built. One of those is a multi-quarter initiative. The other is something a team can do this sprint.
Where data product initiatives repeatedly go wrong
We’ve spent years building alongside the DataHub open-source community—thousands of practitioners across industries working with real data stacks, real organizational constraints, and real pressure to show results. Across those conversations, the same question keeps surfacing: Why do data product initiatives stall between strategy and implementation?
The patterns of failure we see aren’t theoretical. They come up in community conversations, in Slack threads, and in the design feedback that shaped how we built data products in DataHub.
Three failure modes show up over and over:
1. Data products stay conceptual
Teams define data products in architecture decks and design documents, but they never connect to real assets in a real catalog. The definition exists in one world; the infrastructure exists in another. Without a connection between the two, data products remain aspirational—something the organization says it has, not something anyone can actually discover, consume, or depend on.
2. Governance gets bolted on after the fact
A team builds a data product: Defines the scope, assigns ownership on paper, ships some documentation. Six months later, when downstream teams start depending on the output, questions about quality, compliance, and stewardship arise. But the scaffolding isn’t there to answer them. Governance was treated as a follow-on initiative rather than something embedded in the data product from day one.
3. The business-technical divide never closes
Business stakeholders define what data products should represent. Engineers build the underlying infrastructure. But neither side participates meaningfully in the other’s process. Over time, the business definition drifts from the technical implementation. The data product that exists in the catalog doesn’t match what the business thinks it governs, and nobody has a single source of truth to reconcile the gap.
If any of these sound familiar, you’re not alone. They came up repeatedly when we designed the data product entity in DataHub, and they directly shaped what we built.
The DataHub approach: Drawing boundaries, not building new infrastructure
The failure modes above share a common root: They all treat data products as something you build on top of your existing stack rather than something you define within it. That’s the gap DataHub’s approach is designed to close.
Start with what already exists
Rather than asking teams to architect a new entity from scratch, DataHub’s model lets you draw a boundary around assets that already live in your stack. A revenue data product might include the pipelines your team runs, the tables they produce, and the dashboard that gets exported. The data product makes that grouping explicit, along with its ownership, documentation, and governance.
Think of it as the difference between building a new house and drawing a property line around one that’s already standing. The structure is there. The data product gives it an address, an owner, and a set of rules.
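As a concrete sketch, the revenue example above might be expressed in DataHub’s YAML spec roughly as follows. The URNs, IDs, and owner handles here are illustrative placeholders, not a working definition; check the Data Products documentation for the current field set.

```yaml
# Illustrative data product definition in DataHub's YAML spec.
# All URNs and names below are placeholders.
id: revenue
display_name: Revenue
description: Recognized revenue, owned by the finance data team.
domain: urn:li:domain:finance
owners:
  - id: urn:li:corpuser:finance-data-team
    type: TECHNICAL_OWNER
assets:
  # Existing assets the boundary is drawn around -- nothing new gets built.
  - urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.revenue_daily,PROD)
  - urn:li:dataJob:(urn:li:dataFlow:(airflow,revenue_pipeline,prod),aggregate_revenue)
  - urn:li:dashboard:(looker,dashboards.revenue_overview)
```

Note that every entry under `assets` references something that already exists in the catalog; the definition is the property line, not the house.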
Shaped by practitioners, not a product roadmap
When we set out to model data products in DataHub, we didn’t start with a spec. We started with a community design process: An open channel where practitioners shared how they thought about data products, what they needed, and how they expected to interact with them. People brought use cases, visualizations, and academic references. The input was diverse and sometimes contradictory, which was exactly the point.
What emerged was a model grounded in how people actually work, not how a vendor thinks they should. That matters because the practitioners who use DataHub are the same people who’ve been burned by over-abstracted frameworks that look clean on a slide and fall apart in production.
Managed as code, accessible to everyone
A defining feature of DataHub’s approach is the YAML-based spec that lets teams define and manage data products as code. Developers can define data products in YAML, check them into Git, and sync definitions with DataHub. Business users can collaborate on and refine those definitions without needing to live inside a developer toolchain.
This isn’t shift-left as buzzword. It’s a practical mechanism for closing the gap between the people who define data products and the people who build them, ensuring the definition and the implementation stay in sync rather than drifting apart in separate workflows.
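One practical consequence of keeping definitions in Git is that you can lint them in CI before they ever sync to DataHub. The sketch below assumes a spec shape loosely like DataHub’s published one; the field names and the URN check are assumptions to adapt to the actual schema.

```python
# Minimal CI-style lint for data product definitions kept in Git.
# Field names mirror DataHub's YAML spec loosely; treat them as assumptions.

REQUIRED_FIELDS = {"id", "display_name", "assets", "owners"}

def lint_data_product(definition: dict) -> list[str]:
    """Return a list of problems; an empty list means the definition passes."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - definition.keys())]
    for asset in definition.get("assets", []):
        # Every asset must reference an existing entity by URN --
        # the product draws a boundary, it doesn't create infrastructure.
        if not str(asset).startswith("urn:li:"):
            problems.append(f"asset is not a DataHub URN: {asset}")
    if not definition.get("owners"):
        problems.append("data product has no owner")
    return problems

definition = {
    "id": "revenue",
    "display_name": "Revenue",
    "owners": [{"id": "urn:li:corpuser:finance-data-team"}],
    "assets": [
        "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.revenue_daily,PROD)",
    ],
}
print(lint_data_product(definition))  # -> []
```

Running a check like this on every pull request is one way the definition and the implementation stay in the same review loop instead of drifting apart.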
From boundary to governance
Here’s what happens once you’ve drawn a boundary around a set of assets, assigned ownership, and documented what a data product contains: You’ve built the scaffolding for governance without launching a separate governance initiative.
Data products give you a unit at which governance can actually operate. Quality standards, compliance requirements, access controls, and stewardship attach to the data product rather than to individual tables scattered across your warehouse. When a downstream team depends on your output, the governance framework is already in place because it was embedded in the data product definition from the start.
This is where data products stop being an organizational nicety and start being infrastructure. They’re the enabling layer for everything that needs to happen at enterprise scale—data quality, access control, compliance, discoverability. The governance doesn’t come later. It’s there from the moment the boundary is drawn.
As data products scale across an organization and enterprise data flows between teams that depend on each other’s outputs, the questions that arise (Who owns this? What are the quality expectations? Who’s accountable when something breaks?) already have answers. That’s not a small thing. In most organizations, those questions are what stall data product adoption in the first place.
Miro: What this looks like in practice
Miro’s data engineering team faced their own version of the problems described above, and their experience shows what changes when data products are implemented rather than just discussed.
Miro had adopted Airflow as their central metadata hub for SLA validation, but the approach created significant friction. Data contracts lived in engineering-owned repositories and referenced internal task names that analytics users couldn’t interpret. Airflow alerts focused on pipeline statuses without providing business context. And because Airflow couldn’t see into downstream tools like Looker, the team had incomplete visibility into data product health.
The gap between technical infrastructure and business understanding was wide, and it was growing.
When Miro implemented DataHub Cloud as their metadata management platform, the structural shift was concrete: They moved data product and contract definitions into their dbt repository, which meant analysts already familiar with the repo could contribute directly to product creation and quality standards by authoring YAML files aligned with the DataHub definition. Data products became fully discoverable in the UI, complete with contract details and readable SLAs.
“These initiatives not only build trust in our data but also empower stakeholders to make data-driven decisions with confidence, driving long-term business success in the dynamic data landscape.”
– Ronald Angel, Data Products Manager, Miro
But what makes Miro’s story instructive isn’t just the outcome; it’s the mechanism. They didn’t build something new from scratch. They drew boundaries around existing assets, made ownership and quality expectations explicit, and gave both technical and business users a shared surface to collaborate on. That’s the model working as intended.
Data products aren’t the destination
Data products are not the end state. They’re what make the end state possible.
When data is bounded, owned, documented, and governed at the product level, everything downstream moves faster, whether that’s self-service analytics, cross-team collaboration, or regulatory compliance. This is especially true for AI.
Organizations investing in AI initiatives are quickly discovering that models are only as reliable as the data they consume. Without data products (and without clear ownership, documented lineage, and embedded quality standards), AI systems are building on ungoverned, undocumented foundations. Data products help make data more AI-ready, not through a separate “AI readiness” initiative, but as a natural consequence of managing data the way it should have been managed all along.
We built DataHub’s data product model because practitioners told us they needed it and because we believe the gap between “data as a product” as a philosophy and data products as a functioning part of your stack shouldn’t take years to close. The tools exist. The model is proven. What’s left is implementation.
Get started with Data Products in DataHub Cloud →
Explore the Data Products documentation →
Join the DataHub Community on Slack →