DataHub Cloud Updates

Chat with your AI data catalog

DataHub Cloud v0.3.16 has arrived. This update brings Ask DataHub into the DataHub platform, adds AI-assisted ingestion setup and troubleshooting, and delivers templating and SDK capabilities for deploying quality checks programmatically at scale.

Ask DataHub in DataHub

The problem it solves

Data discovery shouldn’t require memorizing catalog structure or mastering search syntax. Business users waste hours navigating complex interfaces while data teams field endless “where’s the right dataset for X?” questions. Meanwhile, making simple metadata updates such as changing an owner, fixing a description, or adding tags requires you to click through multiple screens. Now, you can chat with your entire data catalog, understand asset relationships, and make metadata changes through natural conversation without switching tools.

What it enables

  • Self-serve discovery through conversation. Ask questions in plain language to find datasets, understand downstream impact, or generate SQL without additional training
  • Focused conversations. Use the Chat area for comprehensive searches or open the sidebar for asset-specific questions 
  • Conversational metadata updates. Ask DataHub to “update the description,” “change the owner to Sarah,” or “add the PII tag”. It handles the changes automatically

The details

Access Ask DataHub through a new Chat area for workspace-wide discovery or via the sidebar on any asset profile page. Search your full metadata graph and context graph, understand impact analysis, generate accurate SQL, and make changes to descriptions, owners, tags, and glossary terms through natural conversation.

What this means

Before this release, Ask DataHub lived in Slack and Teams. It was powerful, but separate from where users actually work with data in the catalog. Now, AI assistance meets users exactly where they are: exploring an asset profile, investigating lineage, or searching for the right dataset. Business users discover data without learning catalog mechanics. Data teams stop answering repetitive questions. Everyone moves faster.

Learn more about Ask DataHub in DataHub in our docs.

Context Documents

Why context matters

AI agents are only as good as the context they have. Ask DataHub can search your metadata graph, but it doesn’t know your deployment patterns, your team’s runbooks, or why certain datasets should never be joined together. That institutional knowledge lives in scattered Notion pages, Confluence spaces, and Slack threads that are largely invisible to AI. Now, you can give Ask DataHub the context it needs to provide precise, reliable answers for your entire organization.

What this unlocks

  • Capture institutional knowledge once. Create documents directly in DataHub for runbooks, deployment patterns, best practices, and other institutional knowledge
  • Make knowledge AI-searchable. Context Documents integrate with GraphQL APIs and MCP Servers, powering more accurate responses across your data ecosystem
  • Build a foundation for reliable AI. The more context you provide, the more precisely Ask DataHub can help everyone on your team

What’s new

Create Context Documents in DataHub to capture organizational knowledge. These documents become semantically searchable through GraphQL APIs and MCP Servers, dramatically improving AI agent response accuracy across your entire data stack.
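
For teams that want to wire this into their own tooling, here is a minimal sketch of searching Context Documents through DataHub's GraphQL API. The instance URL and the "DOCUMENT" entity type below are illustrative assumptions, not confirmed schema; check the docs for the exact entity type and fields your version exposes.

```python
# Minimal sketch (not the official client): search DataHub via GraphQL for
# documents relevant to a question. "DOCUMENT" is an assumed entity type for
# Context Documents; consult the docs for the exact type name in your version.
import requests

DATAHUB_URL = "https://your-instance.acryl.io/api/graphql"  # hypothetical instance URL
TOKEN = "YOUR_PERSONAL_ACCESS_TOKEN"

query = """
query search($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
"""

variables = {
    "input": {
        "types": ["DOCUMENT"],                      # assumed entity type
        "query": "snowflake deployment runbook",    # the knowledge you're looking for
        "start": 0,
        "count": 5,
    }
}

resp = requests.post(
    DATAHUB_URL,
    json={"query": query, "variables": variables},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for result in resp.json()["data"]["searchAcrossEntities"]["searchResults"]:
    print(result["entity"]["urn"])
```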

Why this is a game-changer

Generic AI doesn’t know your organization’s specific context—that your “gold” certification means production-ready, that certain tables have known quality issues on Mondays, or that specific datasets require special handling for compliance. Context Documents transform Ask DataHub from a metadata search tool into an organizational knowledge expert that understands not just what your data is, but how your team actually uses it.

Learn more about Context Documents in our docs.

Ask DataHub for Ingestion

Why ingestion is hard

Setting up a new data source means wrestling with connection strings, authentication methods, configuration options, and cryptic error messages. When ingestion fails, you’re left parsing log files and consulting documentation that may not match your specific setup. Platform teams spend hours troubleshooting issues that could be resolved in minutes with the right guidance.

How Ask DataHub accelerates setup

  • Configure sources faster. Get step-by-step guidance for connection setup specific to your environment, not generic documentation
  • Understand errors instantly. Receive immediate interpretation of warnings and errors with actionable resolution steps
  • Learn advanced configurations naturally. Ask about performance tuning, incremental extraction, or custom filters—get recommendations based on your metadata goals

What’s included

AI-assisted chat is now embedded directly in the ingestion UI. Get immediate help configuring new sources, understanding configuration options, interpreting errors, and learning advanced tuning—all contextualized to your specific DataHub instance and data source.

Why this matters

Ingestion is the gateway to DataHub value. Every hour spent debugging Snowflake connections or troubleshooting Tableau permissions delays your team’s ability to discover and govern data. Ask DataHub for Ingestion removes the friction: setup is faster, errors are self-explanatory, and advanced configurations are accessible to anyone—not just the senior engineer who memorized the docs.

Learn more about Ask DataHub for Ingestion in our docs.

Templated SQL Assertions

Why static checks fall short

Data quality checks typically run against entire datasets, catching issues only after full pipeline completion. But what if you need to validate just the new records that loaded in the last hour? Or check specific date partitions before downstream jobs start? Static SQL assertions can’t target specific subsets at runtime.

What it enables

  • Circuit-break pipelines at the source. Validate only newly loaded data before it flows downstream, catching bad data immediately
  • Run critical checks on-demand. Target assertions against specific areas of interest without waiting for full pipeline completion
  • Build flexible quality patterns. One SQL template serves multiple scenarios with runtime parameter substitution—validate today’s partition, last hour’s records, or specific customer segments

How it works

SQL assertions now support runtime templating. Insert templated variables in SQL logic and substitute parameters during execution. Run assertions on-demand against specific data subsets, enabling sophisticated quality patterns like incremental validation, CI/CD branch checks, and conditional monitoring.
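
As a rough illustration, a single templated assertion can validate whichever slice of data you name at execution time. The ${partition_date} placeholder syntax and the run_assertion helper below are hypothetical stand-ins, not the exact product API; see the docs for the supported templating syntax and how to trigger on-demand runs.

```python
# Illustrative sketch only: one templated SQL assertion reused across runs.
# The placeholder syntax and run_assertion() are hypothetical; consult the
# Templated SQL Assertions docs for the actual API.

ASSERTION_SQL = """
SELECT COUNT(*)
FROM analytics.orders
WHERE load_date = '${partition_date}'   -- substituted at execution time
"""

def run_assertion(sql_template: str, parameters: dict) -> str:
    """Hypothetical stand-in for triggering an on-demand assertion run."""
    rendered = sql_template
    for name, value in parameters.items():
        rendered = rendered.replace("${" + name + "}", value)
    return rendered

# Circuit-break a pipeline: validate only today's partition before downstream jobs start.
print(run_assertion(ASSERTION_SQL, {"partition_date": "2025-01-15"}))
```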

The impact

Traditional assertions operate on all-or-nothing logic: check the entire dataset or don’t check at all. This forces teams to choose between comprehensive quality coverage (slow, expensive) or targeted checks (manual, fragile). Templated SQL assertions eliminate this tradeoff. Validate exactly what matters when it matters, with the flexibility to adapt quality checks to your pipeline’s specific needs at runtime.

Learn more about Templated SQL Assertions in our docs.

Assertions Python SDK

The problem it solves

Creating assertions through the UI works great for dozens of datasets. But what about hundreds? Or thousands? Enterprise teams need consistent quality standards across similar datasets—every customer table needs freshness checks, every financial dataset needs volume validation, and every production table needs schema monitoring. Configuring this manually is impractical.

What you can do

  • Scale quality monitoring enterprise-wide. Define assertions programmatically and deploy them across hundreds or thousands of datasets
  • Ensure consistency through templates. One assertion template automatically applies appropriate checks to all matching datasets
  • Integrate with CI/CD workflows. Deploy quality definitions alongside data pipeline code with intelligent change detection that only creates assertions for new or modified assets

Under the hood

Manage all assertion types—column validation, freshness checks, volume monitors, schema assertions, and custom SQL metrics—through a simple Python SDK. Define assertion templates that apply automatically across datasets, with intelligent change detection that only creates assertions for new or changed assets on subsequent runs.
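
To make the idea concrete, here is a hypothetical sketch of defining one freshness template and rolling it out across matching datasets. The class and function names are illustrative assumptions, not the real SDK surface; the docs cover the actual interface.

```python
# Hypothetical sketch only: the class and function names below are illustrative
# assumptions, not the actual Assertions Python SDK surface.
from dataclasses import dataclass

@dataclass
class FreshnessAssertionTemplate:
    """Assumed shape of a reusable freshness check definition."""
    name: str
    max_staleness_hours: int
    dataset_filter: str  # e.g. a platform/name pattern the template applies to

def deploy(template: FreshnessAssertionTemplate, dataset_urns: list) -> None:
    """Stand-in for an SDK call that upserts assertions, skipping unchanged ones."""
    for urn in dataset_urns:
        print(f"Ensuring '{template.name}' exists on {urn}")

# Every customer table gets the same freshness guarantee, defined once.
template = FreshnessAssertionTemplate(
    name="customer-table-freshness",
    max_staleness_hours=24,
    dataset_filter="snowflake.*.customer_*",
)
deploy(
    template,
    ["urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.crm.customer_orders,PROD)"],
)
```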

What this enables at scale

The UI made data quality accessible. The Python SDK makes it scalable. “Terraform for your assertions” means defining the quality checks you want once and letting automation ensure every dataset gets appropriate monitoring. This is how enterprises achieve comprehensive quality coverage without overwhelming their teams with manual configuration work.

Learn more about the Assertions Python SDK in our docs.

New and Enhanced Ingestion Sources

Keeping pace with your stack

Your data platform evolves constantly—new cloud services, modern orchestration tools, semantic layers that define business logic. DataHub grows alongside your infrastructure, ensuring comprehensive metadata coverage as your ecosystem expands.

Key capabilities

  • Connect cloud-native platforms. Ingest metadata from Azure Data Factory and Confluent Cloud with out-of-the-box connectors
  • Capture semantic definitions. Enhanced support for dbt Semantic Layer and Snowflake Semantic Layer brings business logic into the catalog
  • Support modern orchestration. Airflow 3 compatibility ensures DataHub evolves with your pipeline infrastructure

What’s new

New ingestion sources for Azure Data Factory and Confluent Cloud. Enhanced existing sources for dbt Semantic Layer, Snowflake Semantic Layer, and Airflow 3 support.
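
If you run ingestion as code rather than through the UI, a recipe for one of the new sources might look roughly like the sketch below. Pipeline.create is DataHub's standard programmatic ingestion entry point, but the azure-data-factory source type name and its config keys here are assumptions for illustration; the connector docs list the real options.

```python
# Rough sketch of running an ingestion recipe programmatically.
# Pipeline.create() is the standard DataHub ingestion entry point; the
# "azure-data-factory" source type and its config keys are assumptions
# for illustration. Check the connector docs for the real option names.
from datahub.ingestion.run.pipeline import Pipeline

recipe = {
    "source": {
        "type": "azure-data-factory",  # assumed source type name
        "config": {
            "subscription_id": "<azure-subscription-id>",
            "resource_group": "<resource-group>",
            "factory_name": "<data-factory-name>",
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {
            "server": "https://your-instance.acryl.io/gms",  # hypothetical instance URL
            "token": "<token>",
        },
    },
}

pipeline = Pipeline.create(recipe)
pipeline.run()
pipeline.raise_from_status()
```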

What changes

Each new connector accelerates time-to-value by reducing custom integration effort. Enhanced semantic layer support means business definitions live alongside technical metadata, giving users the full context they need for confident decision-making. As your platform grows, DataHub’s coverage grows with it.

Let’s build together

We’re building DataHub Cloud in close partnership with our customers and community. Your feedback helps shape every release. Thank you for continuing to share it with us.

Join the DataHub open source community

Join our 14,000+ Slack community members to collaborate with the data practitioners who are shaping the future of data and AI.

See DataHub Cloud in action

Need context management that scales? Book a demo today.
