Announcing the DataHub Context Platform

The Foundation Analytics Agents Have Been Missing

By: Shirshanka Das, Swaroop Jagadish

05.28.26

Last year at CONTEXT 2025, our annual virtual summit, we put a stake in the ground.

We stood up in front of thousands of DataHub practitioners and said: the agentic era isn’t coming, it’s here. The enterprises that win will be the ones with infrastructure strong enough to support AI agents operating at scale. And we named the discipline that makes that possible: context management.

We also made a promise. We said DataHub would evolve to become a context platform purpose-built to give AI agents the relevant, reliable, and trustworthy context they need to act on your data with confidence.

Today we’re making good on that promise. We’re launching the Context Platform for analytics agents.

Quick definition: What is context management?

Context management is the organization-wide capability to reliably deliver the most relevant data to AI, enabling governed, enterprise-scale deployment of agents. It spans context creation, validation, and delivery.

Why context became the defining problem

We’ve been building DataHub for nearly a decade, since it began as an open-source project at LinkedIn. Over that time it evolved through many phases to solve data discovery, lineage, observability, and governance (including GDPR), driving these outcomes for humans and machines at scale.

Thousands of organizations, from startups to the world’s largest enterprises, adopted it. Our open-source community built connectors, extended the model, and contributed features that pushed the product further than any single team could.

But the consumers of metadata have changed. For most of the past decade, the people using DataHub were humans. A data engineer checking lineage before a pipeline change. An analyst looking up a column definition. A governance lead auditing access. When the consumer of your metadata was a human, a poorly documented table was an inconvenience. A Slack message or a conversation with the right engineer filled the gap. In some advanced companies, machines were connected to DataHub to drive automated outcomes such as data deletion, replication, and anonymization, but many deployments stayed human-powered.

GenAI changed all that. The consumer of DataHub is shifting rapidly from humans to agents acting on their behalf. Exciting and a little scary at the same time. When the data consumer is an analytics agent operating at machine speed, it’s a different category of risk than a new employee asked to figure out why customers are expanding usage after six months on the product. Agents don’t have intuition. They don’t ask a colleague. They don’t slow down when something looks off.

When a sales leader asks an analytics agent “What is our revenue growth in the West Region year over year?”, the agent needs to know that revenue is defined as Contracted Annual Recurring Revenue (CARR), which tables hold CARR by Region data, which table is the most current and can be trusted, which version of the formula the company uses, and how churn is handled. It needs all of that before it writes a single line of SQL. Without it, the agent fails silently. It answers confidently and incorrectly. The formula is deprecated. The join is wrong. The data is three weeks stale. By the time anyone notices, the business decision has already been made on it.

Everyone has their own story of an analytics agent being confidently wrong. It’s the defining failure mode of analytics agents in production. In a 2026 survey that DataHub commissioned from independent research firm TrendCandy, 88% of leaders say they have a fully operational context platform. But 66% admit their AI models frequently generate biased or misleading insights due to low maturity of data infrastructure in providing sufficient context.

This is the aspiration-reality gap. And it shows up in production the moment an analytics agent goes live.

The four infrastructure gaps behind every failed agent

Over the past year, we talked to enterprise data teams across industries about what’s blocking their analytics agents in production. What we heard was consistent, and it maps to four compounding structural problems.

Context is fragmented

Operational metadata is scattered across the data warehouse, catalog, and BI tool. Metric definitions are in dbt. Business glossaries are in Confluence or Notion. Join logic is buried in Looker dashboards. Institutional knowledge lives in Slack threads and in people’s heads. An agent trying to answer a real business question has to reconcile all of this fragmentation. Today, it can’t, because there’s no single source of truth to reconcile against.

Manual context creation doesn’t scale

Even when teams try to fix the context problem manually, the math fails them. We’ve heard from customers that documenting context through workshops takes roughly 16 hours per table. If you have 500 business-critical tables, that’s 8,000 hours of workshop time: years of effort before your agent can operate reliably. Context management has a cold start problem: enterprises need rich, trusted context before agents can be useful, but manually curating that context from across the organization takes years. And data ecosystems change constantly, so whatever you documented six months ago is already partially wrong.

Expert validation doesn’t happen systematically

Even if you use AI to generate context, you still need domain experts to validate it before agents act on it. But getting a finance analyst to review 200 auto-generated metric definitions is a coordination problem that most teams haven’t solved. Unvalidated context flows into agents that use it with full confidence. And when those agents get things wrong, the fix is manual, local, and invisible to the next query that hits the same ambiguous or outdated definition.

Agents can’t access context reliably

Even if you solve the first three problems, context is only valuable if agents can actually reach it. Today, context is locked in systems that weren’t designed for machine consumption. Teams end up building fragile custom pipelines for each agent framework, scripts that break every time something upstream changes.

These four problems compound on each other. You can’t patch your way out with point solutions. You need a context platform designed to solve all four together.

The market knows this. In the 2026 State of Context Management Report, 82% of data and IT leaders agreed that agentic AI cannot reach production value without a context platform. And 91% are treating context management as an executive or C-level priority over the next one to three years.

What we built

The DataHub Cloud Context Platform addresses each structural problem directly. Here’s what we’re shipping today.

Context Ingestion

DataHub automatically ingests metadata from more than 100 sources, including Snowflake, dbt, Power BI, Confluence, and Notion, to build a unified context layer. All context is chunked, embedded, and retrievable in real time.
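To make “chunked, embedded, and retrievable” concrete, here is a minimal, self-contained Python sketch of that general pattern. It is not DataHub’s implementation: the word-window chunker, the bag-of-words `embed` function (a toy stand-in for a learned embedding model), the `ContextIndex` class, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sketch of a chunk -> embed -> retrieve pipeline (not DataHub's code).

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ContextIndex:
    """In-memory index of (source, chunk, vector) entries."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []

    def add(self, source: str, text: str) -> None:
        # Chunk and embed at ingestion time so lookups only need one embedding.
        for piece in chunk(text):
            self.entries.append((source, piece, embed(piece)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(source, piece) for source, piece, _ in ranked[:k]]

# Hypothetical sample context from three sources.
index = ContextIndex()
index.add("confluence", "Revenue means Contracted Annual Recurring Revenue (CARR), reported by region.")
index.add("dbt", "Model fct_carr_by_region aggregates CARR by region and month.")
index.add("notion", "The marketing calendar lists upcoming webinars.")
print(index.search("CARR by region", k=2))
```

Here the dbt model and the Confluence definition rank above the unrelated Notion page. A production system would swap in a real embedding model and a vector store, but the shape (chunk once at ingestion, score at query time) stays the same.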

Context Intelligence

Instead of starting from scratch, DataHub’s Context Intelligence converts your existing enterprise query history and your expert analyst graph into a structured semantic index. When an agent receives a question, it retrieves not just schema, but validated query patterns that have answered similar questions before. This includes proven joins, filters, and aggregation logic extracted continuously from query logs, BI dashboards, dbt projects, and unstructured documents.
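The retrieval step described above can be sketched in a few lines, assuming a hypothetical `QueryPattern` record mined from query logs, dashboards, or dbt projects. Token-overlap (Jaccard) similarity stands in for the semantic matching a real index would use, and only expert-validated patterns are eligible. The record fields, table names, and SQL are illustrative, not DataHub’s API.

```python
import re
from dataclasses import dataclass

# Hypothetical record for a query pattern mined from logs, dashboards, or dbt.
@dataclass
class QueryPattern:
    question: str    # natural-language question the SQL answered
    sql: str         # proven join, filter, and aggregation logic
    source: str      # where it was mined, e.g. "query_log", "dashboard", "dbt"
    validated: bool  # has a domain expert approved it?

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_patterns(question: str, patterns: list[QueryPattern],
                      k: int = 2, min_score: float = 0.2) -> list[QueryPattern]:
    """Return up to k validated patterns whose questions resemble `question`."""
    q = tokens(question)
    scored = [(jaccard(q, tokens(p.question)), p) for p in patterns if p.validated]
    scored = [(s, p) for s, p in scored if s >= min_score]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]

patterns = [
    QueryPattern("What is CARR growth by region year over year?",
                 "SELECT region, fiscal_year, SUM(carr) AS carr "
                 "FROM fct_carr_by_region GROUP BY region, fiscal_year",
                 "query_log", True),
    QueryPattern("What is CARR by region?",
                 "SELECT region, SUM(amount) FROM legacy_revenue GROUP BY region",
                 "dashboard", False),  # unvalidated: never handed to the agent
    QueryPattern("How many support tickets were opened last week?",
                 "SELECT COUNT(*) FROM tickets WHERE opened_at >= CURRENT_DATE - 7",
                 "dbt", True),
]

question = "What is our revenue growth in the West Region year over year?"
for p in retrieve_patterns(question, patterns):
    print(p.source, "->", p.sql)
```

Only the validated query-log pattern comes back; the unvalidated dashboard query against the legacy table is filtered out before the agent ever sees it, which is the point of pairing retrieval with validation.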

What used to take 16 hours per table in a documentation workshop becomes a review task that takes minutes. And because it runs continuously, context stays current as your data evolves. An update to a Notion page is reflected in real time in the context layer your agents query.

Context Hub

A dedicated workspace where domain experts review AI-synthesized context so they can approve, enrich, or refine it, and simulate the impact of context changes on text-to-SQL results before publishing. Built-in evals give you confidence in each change before it ships. Every expert interaction feeds back into the system. Instead of chasing people down in Slack, you have a structured review queue where experts confirm rather than create.
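One way to picture “simulate the impact of context changes before publishing” is an eval gate: run the same benchmark questions against the current and the proposed context and compare accuracy. The sketch below is hypothetical throughout: `toy_agent` is a keyword-lookup stand-in for a real text-to-SQL agent, and the benchmark, table names, and the “publish only if accuracy does not drop” rule are assumptions for illustration, not how Context Hub is implemented.

```python
from typing import Callable

# Hypothetical types: context maps a business term to the table an agent should use.
Context = dict[str, str]
Agent = Callable[[str, Context], str]

def toy_agent(question: str, context: Context) -> str:
    """Stand-in for a text-to-SQL agent: pick the table for the first known term."""
    for term, table in context.items():
        if term in question.lower():
            return table
    return "unknown"

def accuracy(agent: Agent, context: Context, benchmark: list[tuple[str, str]]) -> float:
    correct = sum(agent(q, context) == expected for q, expected in benchmark)
    return correct / len(benchmark)

def simulate_change(agent: Agent, current: Context, proposed: Context,
                    benchmark: list[tuple[str, str]]) -> tuple[float, float, bool]:
    """Score both contexts on one benchmark; allow publishing only if accuracy doesn't drop."""
    before = accuracy(agent, current, benchmark)
    after = accuracy(agent, proposed, benchmark)
    return before, after, after >= before

benchmark = [
    ("What is revenue in the West region?", "fct_carr_by_region"),
    ("How many active customers do we have?", "dim_customers"),
]
current = {"revenue": "legacy_revenue", "customers": "dim_customers"}
proposed = {"revenue": "fct_carr_by_region", "customers": "dim_customers"}

before, after, ok_to_publish = simulate_change(toy_agent, current, proposed, benchmark)
print(f"accuracy {before:.0%} -> {after:.0%}, publish: {ok_to_publish}")
```

Repointing “revenue” from the legacy table to the CARR model fixes the first benchmark question without breaking the second, so the gate lets the change through. A change that lowered accuracy would be held back for another expert pass.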

Context Activation

DataHub’s MCP server, pre-built skills, Agent Context Kit, APIs, and personalized UX make validated context available to every agent in your ecosystem: Snowflake Intelligence, Databricks Genie, Claude, Cursor, custom-built, or DataHub’s own open-source analytics agent. One governed source for every agent. Build once, activate everywhere.

And because DataHub delivers precise, pre-validated context rather than raw schema, agents require significantly fewer tokens to reach the right answer. Better accuracy at lower inference cost, even as you scale.


The results speak for themselves. Our design partner Miro’s analytics agent, running on Snowflake metadata alone, answered roughly half of benchmark questions correctly. After layering in DataHub as the context platform, accuracy nearly doubled.

After layering in DataHub as our context platform, including data product documentation, cross-source context, and business meaning derived from our query history, we nearly doubled accuracy from close to 50% to around 90%.
Ronald Angel, Product Manager, Data Platform at Miro

Key principles guiding our development

DataHub Cloud’s context platform was designed around key principles that reflect how enterprises adopt and trust new infrastructure. Our context platform is:

Trust-oriented: Built to measurably improve agent accuracy and reliability. Every capability traces back to that outcome. Context is not a feature. It is the variable that determines whether an agent produces a result an analyst would stand behind.
Human-centric: Preserving the catalog experience data teams already rely on and adding humans-in-the-loop through Context Hub. As the role of the context engineer emerges across enterprise data organizations, DataHub Cloud is designed to be their primary tool.
Built for speed: Leveraging existing data investments rather than replacing them, and solving the cold start problem that has stalled previous semantic layer and context initiatives by turning existing query history and documentation into an immediately useful semantic index.
Open by design: The platform is built on open standards and an open-source foundation, supports multiple platforms without lock-in, and is architected so enterprises can bring their own agents and harness context across every tool in their stack.
Value-focused: The measurable outcomes DataHub Cloud targets are increased analyst and engineering productivity, lower operational costs, lower compliance risk, and faster time to production for AI initiatives.

Context doesn’t just improve accuracy. It drives adoption

We often talk about context as an accuracy problem, and it is. But the more important story is what accurate context does to adoption.

Agents that produce untrustworthy answers get abandoned. Agents that get context right become the tool people reach for without thinking about it, and that’s when the compounding value starts.

Pinterest achieved a 60x improvement in the speed of answering analytical questions with DataHub Context Intelligence. Their analytics agent saw 10x the usage of the next most-used agent at the company within the first two months of deployment. (For a deeper dive into how they did it, check out the case study.)

This is what we believe context management ultimately delivers: not just agents that answer correctly, but agents that earn the kind of trust that changes how people work.

Ready when you are

DataHub Cloud Context Platform is available now in private beta. We’ll be demoing it at Snowflake Summit and Databricks Data + AI Summit in the coming weeks. Stop by the booth and we’ll walk you through it in person.

If you’re running analytics agents in production, or getting ready to, the context infrastructure is what determines whether they succeed. We’d like to show you what that looks like. Book a personalized demo and we’ll build a session around what matters most to you.

Your agents are ready. Give them something to work with.

—Shirshanka and Swaroop