Highlights from DataHub’s June 2025 Town Hall
At our June 2025 Town Hall, we shared the next phase of DataHub’s evolution and what it means for teams building AI-native data platforms.
The session centered on a major shift in how data gets used. It’s no longer just humans trying to find and trust the right datasets. AI agents, like the ones powering tools such as Snowflake Cortex and Databricks Genie, are entering the mix. However, they’re hitting the same roadblocks humans have always faced: incomplete context, unclear ownership, and unreliable signals.
This recap walks through what we announced, what we demoed, and how DataHub is becoming the foundation for AI-native metadata operations.
AI agents are a data discovery powerhouse—if they have the right context
AI systems can parse natural language and return results fast, but they often fail silently when context is missing. In one demo, a large language model returned an impossible mobile app conversion rate (over 100%) because it misunderstood the meaning of the metric. These kinds of errors don’t just confuse; they break trust.
The fix? Give agents the same metadata context that humans rely on to evaluate data.
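As a minimal sketch of what that context could look like in practice, the example below attaches a valid range to a metric definition and checks an agent's answer against it. The `MetricDefinition` class, its fields, and the sample numbers are hypothetical illustrations, not DataHub's API or figures from the demo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical metric metadata an agent could consult before answering."""
    name: str
    description: str
    min_value: float
    max_value: float

    def validate(self, value: float) -> float:
        """Return the value if it is in range; raise instead of failing silently."""
        if not self.min_value <= value <= self.max_value:
            raise ValueError(
                f"{self.name}={value:.2f} is outside "
                f"[{self.min_value}, {self.max_value}]: {self.description}"
            )
        return value


conversion_rate = MetricDefinition(
    name="mobile_app_conversion_rate",
    description="converted sessions divided by total sessions, as a fraction",
    min_value=0.0,
    max_value=1.0,
)

# Illustrative numbers only: 30 converted sessions out of 120 total.
valid = conversion_rate.validate(30 / 120)

# A numerator drawn from the wrong table gives a rate above 100% and is rejected.
try:
    conversion_rate.validate(150 / 120)
    rejected = False
except ValueError:
    rejected = True
```

With the definition available, an impossible answer surfaces as an explicit error rather than as a confident-looking number.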
Watch Founding Engineer Gabe Lyons explain the need for AI-native context.
AI gets a direct line to your data with the DataHub MCP Server
The DataHub Model Context Protocol (MCP) Server, now available as a Managed MCP Server on DataHub Cloud v0.3.12+ and as a Self-Hosted MCP Server on DataHub Core, gives AI agents a direct line to your metadata graph.
It enables programmatic access to:
- Metadata relationships (lineage, ownership, glossary, etc.)
- Relevance and trust signals (usage frequency, documentation, endorsements)
- The same real-time context graph that powers DataHub’s human UI
With this context, complex data discovery becomes intuitive for agents. They can select the right assets, explain why they made that choice, and flag risks or ambiguity upfront.
Founding Engineer Harshal Sheth describes the DataHub MCP Server as “a tool registry and a tool executor.” Watch him explain how DataHub’s MCP Server works.
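The registry and executor roles map onto two request types defined by the Model Context Protocol, which exchanges JSON-RPC 2.0 messages: `tools/list` to discover the available tools and `tools/call` to run one. The sketch below only builds those messages and does not contact a server. The tool name `search` and its `query` argument are assumptions for illustration, so check the MCP Server docs for the tools DataHub actually exposes.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request message as a plain dict."""
    message: Dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
    }
    if params is not None:
        message["params"] = params
    return message


# "Tool registry": ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# "Tool executor": invoke one tool by name with arguments.
# The tool name and argument below are hypothetical.
call_search = jsonrpc_request(
    "tools/call",
    {"name": "search", "arguments": {"query": "customer lifetime value"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_search, indent=2))
```

In practice an MCP client such as Claude Desktop sends these messages for you; the sketch is only meant to show the shape of the exchange.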
Recommended reading: Learn how Block uses the DataHub MCP Server to power agentic AI data discovery.
What we’ve learned from humans still applies
As we built the MCP Server, we revisited what makes metadata usable for humans. We found that the same lessons apply to machines:
1. Relevance matters.
A search for “customer” might return thousands of results. Without clear relevance signals, agents (and humans) waste time sorting through noise.
2. Context must be centralized.
Context scattered across warehouses, pipelines, tools, and dashboards slows everyone down. For agents to reason effectively, they need a unified view.
3. Data is everywhere—context must be, too.
Data doesn’t live in one place. It spans warehouses, pipelines, BI tools, ML features, dashboards, APIs, and more. If agents are going to operate effectively across the modern data stack, they need end-to-end visibility into that entire ecosystem.
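To make the first lesson concrete, here is a toy ranking over search hits for “customer” that combines the kinds of signals mentioned earlier: usage frequency, documentation, and endorsements. The `Asset` fields, the asset names, and the weights are all assumptions chosen for illustration. This is not how DataHub computes relevance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    """Hypothetical search hit with a few trust and relevance signals."""
    name: str
    queries_last_30d: int
    has_documentation: bool
    endorsed: bool


def relevance_score(asset: Asset) -> float:
    """Combine signals into one score; the weights are arbitrary placeholders."""
    usage = min(asset.queries_last_30d, 1000) / 1000  # cap so usage alone cannot dominate
    documentation = 0.5 if asset.has_documentation else 0.0
    endorsement = 1.0 if asset.endorsed else 0.0
    return usage + documentation + endorsement


def rank(assets: List[Asset]) -> List[str]:
    """Return asset names ordered from most to least relevant."""
    return [a.name for a in sorted(assets, key=relevance_score, reverse=True)]


hits = [
    Asset("customer_tmp_backup", queries_last_30d=5, has_documentation=False, endorsed=False),
    Asset("customer_events_raw", queries_last_30d=1200, has_documentation=True, endorsed=False),
    Asset("customer_orders", queries_last_30d=800, has_documentation=True, endorsed=True),
]

ranked = rank(hits)
print(ranked)
```

Even this crude score pushes the undocumented, rarely used backup table to the bottom, which is the kind of noise reduction that relevance signals are meant to provide.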
Live demo: AI-powered data discovery in action
To show what this looks like in practice, we demoed Claude Desktop running against DataHub via the MCP server.
The use case: “What are the best tables to analyze customer lifetime value?”
Here’s what happened:
- Claude hit the MCP server to search for relevant tables.
- It reviewed results, evaluating trust signals, documentation, and other important context that DataHub gathered from multiple sources.
- It returned a detailed summary report.
No dashboards. No guesswork. Just metadata-driven decision making at machine speed.
See it in action.
What’s next for DataHub: 2025 roadmap update
Our product roadmap is grounded in four core areas: discovery, governance, observability, and our metadata graph. Each plays a critical role in enabling reliable, context-aware AI systems and data operations at scale.
Discovery: Building context-aware data exploration
Our 2025 discovery initiatives focus on three core areas: capturing and leveraging human context around data for richer insights, enabling intelligent, context-driven exploration and navigation, and continuously expanding the completeness and reach of our lineage graph.
Watch Founding Product Manager Maggie Hays break down our key focuses for data discovery in 2025.
Planned advancements include:
- New data sources: Integrations with Snowplow, Rudderstack, and Azure Data Lake to broaden our out-of-the-box support for data tooling.
- Iceberg REST Catalog integration (shipped!): Supports federated data creation.
- Hierarchical lineage: Will enable users to traverse complex lineage graphs with a high-level summary view of lineage relationships.
- Metrics catalog: Will act as a central registry of metric definitions for users to register, associate, document, and discover.
Governance: Building a central compliance command center
Our current focus areas in data governance aim to establish DataHub as a universal data registry with visibility into every asset across your stack, implement centralized compliance initiatives, and enable policy enforcement through integrations with partner tools for cross-platform compliance.
Watch Maggie explain our key focuses for data governance in 2025.
On the roadmap:
- Tag and glossary term sync actions: Bi-directional actions to propagate tags and glossary terms from DataHub back to source data platforms and vice versa.
- Parent-child assets (logical datasets): Will enable users to manage metadata for multiple physical assets in one place through parent-child relationships.
Observability: Building contextual quality insights for every user
Our 2025 observability initiatives focus on democratizing data quality for all users, enabling collaborative incident management through centralized tracking and resolution, and ensuring quality insights are presented alongside relevant business context.
Watch Maggie walk through our key focuses for data observability in 2025.
We’ve already shipped two key observability initiatives:
- Assertions usability (shipped!): Allows users to easily search, filter, and group data quality checks and view their historical context at scale.
- Enriched incidents (shipped!): Gives users a detailed view for managing incident priority, stage, and assignees, and for reviewing incident activity.
Metadata graph: Building automation-ready infrastructure with robust monitoring
Our 2025 platform improvements focus on enhancing APIs and SDKs for automated metadata operations, optimizing performance across ingestion and user experiences, and implementing improvements to audit logging and tracing.
Watch Maggie walk through our key focuses for our platform in 2025.
Recent releases and planned improvements include:
- Python SDK v2 (shipped!): High-level APIs for registering, enriching, and retrieving data assets in DataHub. Provides a simplified experience for programmatically defining relationships between entities.
- Search & Dataset SDK and Lineage SDK are now live in production.
- AI SDK (up next): Will include functions tailored towards AI-driven use cases and managing AI assets.
- Quickstart uplift: Updated quickstart package will provide improved stability and performance, reducing build time and system resource usage. This will be a breaking change.
- Service accounts: Will provide the ability to create and manage service users for programmatic workflows and custom automations.
The bottom line: Human and machine users alike can’t act without context
The metadata catalog era was about helping humans find the right data. The context engine era is about enabling both humans and machines to do it safely, intelligently, and at scale.
With the MCP server now available, your agents don’t have to guess. They can reason.
There’s more where that came from
Watch the full June Town Hall
Catch the full session and demos on demand.
Explore the MCP Server docs
Learn how to get started with our fully managed experience on DataHub Cloud or self-hosted on DataHub Core.
Join our open source community
Want to build with us? Start by joining the DataHub Community Slack—a space for open source contributors, data practitioners, and anyone looking to shape the future of metadata-powered AI.