Introducing DataHub Cloud v1.1.0

More than 40% of analytics agents fail in production. Not because the models are wrong, but because the context they rely on is. They hallucinate joins, contradict each other across teams, and confidently answer the wrong question with the wrong table.

DataHub Cloud v1.1.0 introduces the Context Platform: three capabilities that tackle the hardest parts of getting data AI-ready, moving teams from months of context workshops to days of automated, validated, agent-ready definitions. Data platform leaders, domain experts, and analytics engineers can now build toward AI-ready data without starting from a blank slate.

Let’s take a look at what’s new at DataHub.

Introducing the Context Platform for improving AI agent answers

1. Context Intelligence

What’s new

DataHub Cloud provides a Context Curator agent that automatically scans technical metadata like audit log queries, BI dashboards, and more to generate a no-touch semantic layer with context documents that can be used by agents to navigate the enterprise landscape. Each context document captures the natural-language questions a table answers, the metrics it computes, the join patterns analysts already use, and the filters that matter, packaged in a format optimized for LLM consumption. Documents are refreshed daily, scoped by domain and ranked by query frequency.

Why it matters

Documenting context manually takes 16+ hours per business-critical table. At 500 tables, that’s years of effort, before accounting for the fact that data changes constantly. Previously, context projects stalled before they started because aligning on semantic definitions took months. Now, DataHub converts years of existing analyst work into validated metric definitions and proven join patterns in days, not months.

Here’s what it enables
  • Bootstrap context at scale: generate context for hundreds of tables from your existing query history, with no manual workshops.
  • Improve text-to-SQL accuracy: make semantic context easily discoverable via semantic search to ground production agents in proven query patterns instead of raw schema, with context that refreshes as query patterns evolve. 
  • Reduce inference cost: deliver pre-validated, compact context documents to agents instead of raw schema dumps, cutting token spend on every query.
DataHub Context Intelligence interface showing a semantic anchor document titled "Order volume, customer activity, and order timing metrics" with business questions, metric calculations, and related dataset links.
Context Intelligence

Context Intelligence is available in Private Beta.

2. Context Hub

What’s new

Context Hub is a dedicated workspace for domain experts to validate, enrich, and keep AI-generated context trusted and drift-free. It’s a streamlined collaboration between humans and agents. It organizes everything SMEs need into three views:

  • Proposals (review new context, edits, and comments)
  • Documents (browse the full context library)
  • Evals (monitor context quality continuously)

Auto-generated context lands in draft state by default. Domain experts review, comment, and approve before agents ever see it.

Why it matters

Auto-generated context is a starting point, not a finish line. Without a structured review process, domain experts can’t validate at scale, and context quality silently drifts. Agents act on whatever they find, correct or not. Previously, there was no structured surface for SMEs to confirm and refine definitions before publishing. Now, Context Hub closes both loops: fast validation before publishing, and continuous evals that catch drift before it reaches production.

Here’s what it enables
  • Scale SME validation without scaling SME time: review context in manageable batches routed by domain.
  • Catch regressions before production: simulate context changes before approving so failures stay in review, not production.
Screenshot of DataHub's Context Hub showing a split-panel view. The left panel displays a semantic anchor document titled "Order fulfillment and return timing analysis," including business questions and SQL-based metric definitions for avg_days_in_transit and avg_days_to_return. The right panel shows an AI assistant responding to the question "What is the average number of days from dispatch to return for returned items?" by retrieving and ranking the top 3 relevant semantic anchor documents with relevance scores.
Context Hub

Context Hub is available in Private Beta.

3. Context Activation

What’s new

Trusted context is only valuable if agents can reach it. Context Activation connects DataHub’s governed context layer to Snowflake Cortex, Databricks Genie, Claude, Cursor, and custom LangChain agents through MCP servers, the Agent Context Kit, skills, API, and SDK. Because DataHub delivers pre-validated context rather than raw schema, agents use significantly fewer tokens per query and inference costs decrease as usage scales.

Why it matters

Today, most enterprise context sits in systems that were originally built for humans. DataHub unlocks that context, publishing the most important parts of it in a single shared memory easily accessible to your AI agents, wherever they live.

Here’s what it enables
  • Accessible to any agent: Instantly connect Snowflake Cortex, Databricks Genie, Claude, Cursor, ChatGPT, and custom LangChain agents to DataHub Context.
  • Built for your workflow: Easily integrate into your existing data analytics, data governance, data quality workflows and processes.
DataHub Context Activation showing an AI assistant generating SQL from a natural language question, grounded in the order_items table with a plain-language explanation of what the query does and key assumptions to verify.
Context Activation

Context Activation is available in Private Beta.

Assertion Severity

What’s new

Assertion failures now include a severity classification that indicates how significant the failure is. Thresholds can be manually configured through Severity Assignment Rules, or automatically determined by DataHub using signals like deviation from expected bounds, asset importance, and potential downstream impact.

Why it matters

Data engineering teams receive multiple assertion failure alerts every day. Previously, every failure looked the same regardless of impact. Now teams can immediately differentiate between a critical data deviation and a temporary blip without digging into the details.

Here’s what it enables

  • Triage faster from your Data Health dashboard: differentiate low-priority issues from critical failures directly in your dashboard or Slack notifications.
  • Take control of severity thresholds: set manual thresholds based on your knowledge of your tables, and automatically raise incidents with matching severities.
  • Centralize issue tracking across all assertion sources: report severity on third-party assertions like dbt tests alongside native DataHub assertions.

Learn more about Assertion Severity in our docs and SDK docs.


New metadata ingestion sources

v1.1.0 adds native connectors for seven new data platforms, expanding DataHub’s metadata coverage across the modern data stack.

New sources: Airbyte, dltHub, Matillion DPC, Informatica Cloud (IDMC), ThoughtSpot, TimescaleDB, and SAP HANA.

Trusted context is only as good as the metadata feeding it. Every new connector expands the context graph that agents, analysts, and governance teams rely on, bringing lineage, usage, and business meaning from more of the platforms your organization actually runs.

Tool CategoryConnectorWhat DataHub Captures
ETL & pipeline toolsAirbyte, dltHub, Matillion DPC, Informatica CloudEnd-to-end lineage across ingestion and transformation pipelines, from source systems through to destination tables, with orchestration metadata and execution history.
BI & analyticsThoughtSpotReport and dashboard lineage traced back to source warehouses, with usage signals that surface which assets and metrics teams actually rely on.
Databases & time-seriesTimescaleDB, SAP HANATable, view, and procedure-level metadata with column lineage parsed from native query and transformation logic, plus query usage to identify active assets.
Context Ingestion

Explore all integrations in our docs.


Backwards compatibility

v1.1.0 includes a meaningful set of breaking changes inherited from the underlying OSS v1.6.0 release, the most significant in several releases. Most are surgical and well-scoped, but we recommend reviewing before upgrading.

  • UI: The V1 UI has been removed. V2 is now the only supported interface.
  • Auth and API contracts: Search filter API shape updated; Kafka offset commit behavior changed for high-throughput configurations. Affects custom integrations and non-idempotent Actions pipelines.
  • Ingestion, asset identity: URN corrections for Athena, Fivetran, Sigma, stored procedures, and deep-path SQL sources (e.g. Dremio). Assets in affected sources may need re-ingestion; metadata attached to old URNs (tags, owners, glossary terms, lineage) will need to be reattached.
  • Ingestion, recipe config: SQL profiling default changed; Unity Catalog ownership and properties now overwrite by default; several connector config fields renamed or removed (Dataplex, BigQuery policy tags, dbt assertion types).

Let’s build together

We’re building DataHub Cloud in close partnership with our customers and community. Your feedback helps shape every release. Thank you for continuing to share it with us.

Want to learn more?

Future-proof your data catalog

DataHub transforms enterprise metadata management with AI-powered discovery, intelligent observability, and automated governance.

Explore DataHub Cloud

Join a live group demo to see DataHub Cloud in action.

Join the DataHub open source community 

Join our 15,000+ community members to collaborate with the data practitioners who are shaping the future of data and AI.