How DataHub Integrates with Snowflake Horizon to Unify Metadata Across Your Entire Data Stack

DataHub is a Snowflake Premier Partner and Horizon Ecosystem Partner, trusted by over 500 Snowflake customers.

Badges identifying DataHub as Snowflake Ready Technology and AI Data Cloud Products Partner Premier

DataHub is proud to be part of the Snowflake Horizon ecosystem. Horizon is Snowflake’s built-in governance solution that unifies compliance, security, privacy, interoperability and access capabilities in the Snowflake Data Cloud. DataHub helps organizations extend that capability to every Kafka topic, every AI model, every dashboard, and every legacy system across the enterprise.

Originally developed at LinkedIn and now powering metadata management for customers like Netflix, Figma, and Morningstar, DataHub can access the metadata in Snowflake Horizon, giving you unified governance across your entire data ecosystem.

What DataHub brings to Snowflake users

  • Improved Cortex AI agent performance with real-time metadata context available via the DataHub MCP Server
  • Comprehensive visibility and context across your entire data estate, including assets not on Snowflake
  • Optimization of your Snowflake environment by identifying unused or duplicative resources

Why DataHub, not traditional data catalogs?

Unlike legacy catalogs that take months to implement and only work in batch mode, DataHub delivers:

  • Real-time updates through stream processing 
  • A developer-first experience, with IDE integration 
  • An open-source core, preventing vendor lock-in 
  • 100+ pre-built connectors, plus the ability to build your own
  • More than 10x faster metadata ingestion from Snowflake than traditional catalogs

Why DataHub + Snowflake Horizon?

Snowflake Horizon delivers comprehensive compliance, security, privacy, and interoperability within the Snowflake Data Cloud. But here’s the reality: your data doesn’t live in just one place.

Your analysts query Snowflake tables that depend on Kafka streams. Your AI models consume data from Snowflake while also pulling from MongoDB. Your BI dashboards blend Snowflake views with real-time APIs.

DataHub bridges these worlds. It creates a unified metadata layer that spans from your source systems through Snowflake to your downstream applications, all while working within Horizon’s security and compliance framework.

Find any data asset in seconds

Ever had an analyst ask, “Where’s our customer churn data?” only to spend 30 minutes hunting through schemas? DataHub transforms how your teams discover data:

Natural language understanding

  • Business users can access comprehensive metadata through natural-language queries
  • Intelligent recommendations surface datasets that may be duplicated outside Snowflake

Cross-platform discovery

DataHub’s Snowflake connector automatically extracts and catalogs:

  • Complete database schemas with table and view definitions
  • Column-level metadata, including types, constraints, and descriptions
  • Dynamic tables and materialized views with refresh schedules
  • Real-time usage statistics showing hot tables and query patterns
  • Snowflake tags and classifications for enhanced context
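
As a rough illustration of how an extraction like this might be configured, the sketch below builds an ingestion recipe as a plain Python dictionary and prints it. The `build_snowflake_recipe` helper, the account, warehouse, and server values, and the option names (`include_table_lineage`, `include_usage_stats`, `extract_tags`, and so on) are assumptions for illustration; check each one against the Snowflake connector documentation before relying on it.

```python
from __future__ import annotations

import json


def build_snowflake_recipe(account_id: str, warehouse: str, databases: list[str]) -> dict:
    """Return a recipe-shaped dict. Every option name below is an assumption."""
    return {
        "source": {
            "type": "snowflake",
            "config": {
                "account_id": account_id,
                "warehouse": warehouse,
                # Credentials are left as placeholders rather than inlined.
                "username": "${SNOWFLAKE_USER}",
                "password": "${SNOWFLAKE_PASSWORD}",
                # Restrict ingestion to the named databases (exact-match regexes).
                "database_pattern": {"allow": [f"^{db}$" for db in databases]},
                # Assumed switches for lineage, usage statistics, and tags.
                "include_table_lineage": True,
                "include_column_lineage": True,
                "include_usage_stats": True,
                "extract_tags": "with_lineage",
                "profiling": {"enabled": False},
            },
        },
        # Hypothetical local endpoint; replace with your own DataHub instance.
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }


recipe = build_snowflake_recipe("my_account", "ANALYTICS_WH", ["SALES", "MARKETING"])
print(json.dumps(recipe, indent=2))
```

Keeping the recipe as data makes it easy to review in version control and to generate one recipe per Snowflake account.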

DataHub’s unique event-based ingestion and bidirectional sync of Snowflake tags mean that your Snowflake metadata is available within DataHub at the speed needed for AI applications.

DataHub uniquely captures rich Snowflake-specific intelligence

  • Complete table and view definitions with column-level detail
  • Dynamic tables with full metadata 
  • Stream definitions and change data capture metadata
  • Column-level lineage through query log analysis
  • Cross-account data sharing relationships
  • Apache Iceberg™ table support with versioning
  • Usage statistics and query patterns 
  • Data quality metrics and profiling statistics
  • Snowflake tags and classifications

But here’s what makes it powerful: DataHub shows these Snowflake assets alongside your Kafka topics, dbt models, and BI dashboards in one unified search experience.

Observe data quality across boundaries

Real-time data quality monitoring with Smart Assertions at DataHub

“Since integrating DataHub into our workflow, even though our overall usage of Snowflake has gone up 4 times, DataHub has helped us significantly reduce the number of breaking changes, to the extent that they are no longer a burden on all teams.”

Asad Naveed, Engineering Manager, MYOB

Extend Horizon’s capabilities with real-time data monitoring that scales

You can’t fix what you can’t see. DataHub provides continuous observability for your Snowflake data and everything connected to it:

Automated quality checks

  • Monitor freshness: Did your daily sales data arrive in Snowflake on time?
  • Track volume anomalies: Why did yesterday’s load have 50% fewer records?
  • Detect schema drift: Did a column get renamed in an upstream system?
  • Validate business rules: Are all currency codes in your table valid?
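
To make the four kinds of check concrete, here is a minimal sketch with hand-written thresholds. It is not how DataHub implements its assertions; the function names, the 30% tolerance, and the sample values are all assumptions chosen for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def is_fresh(last_loaded: datetime, now: datetime, max_age: timedelta) -> bool:
    """Freshness: the latest load must be no older than max_age."""
    return now - last_loaded <= max_age


def volume_anomaly(row_count: int, recent_counts: list[int], tolerance: float = 0.3) -> bool:
    """Volume: flag a load that deviates from the recent average by more than tolerance."""
    baseline = sum(recent_counts) / len(recent_counts)
    return abs(row_count - baseline) / baseline > tolerance


def schema_drift(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Schema: report columns that were added, removed, or changed type."""
    shared = expected.keys() & actual.keys()
    return {
        "added": sorted(actual.keys() - expected.keys()),
        "removed": sorted(expected.keys() - actual.keys()),
        "retyped": sorted(c for c in shared if expected[c] != actual[c]),
    }


def invalid_values(values: list[str], allowed: set[str]) -> list[str]:
    """Business rule: return the values that fall outside the allowed set."""
    return [v for v in values if v not in allowed]


now = datetime(2024, 1, 2, 9, 0)
fresh = is_fresh(datetime(2024, 1, 2, 1, 0), now, timedelta(hours=24))
anomalous = volume_anomaly(500, [1000, 1020, 980])  # about 50% below baseline
drift = schema_drift(
    {"id": "NUMBER", "cust_id": "NUMBER"},
    {"id": "NUMBER", "customer_id": "NUMBER"},  # a rename shows up as remove + add
)
bad_codes = invalid_values(["USD", "EUR", "XXZ"], {"USD", "EUR", "AUD"})
print(fresh, anomalous, drift, bad_codes)
```

Fixed thresholds like these are brittle on seasonal data, which is one reason to prefer checks that learn a baseline from history.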

DataHub’s Data Health Dashboards provide a bird’s-eye view of the health of your entire data landscape.

Stream-based architecture

DataHub’s real-time engine means changes propagate in seconds, not hours:

  • Column description updated? Reflected everywhere immediately
  • New sensitive data classification? Access controls update instantly
  • Quality issue detected? Get an alert before the dashboard breaks

Proactive incident prevention

Instead of hearing about data issues from angry executives, your team knows first. DataHub watches query patterns, usage trends, and quality metrics to spot problems before they cascade downstream.

Governance without borders

Tag propagation and AI Glossary Terms at DataHub

Unified policies, distributed enforcement

Governance breaks down at system boundaries. DataHub ensures your Snowflake Horizon policies extend everywhere your data travels:

Automated sensitive data protection

DataHub’s ML-powered scanners work alongside Horizon to:

  • Propagate Horizon classifications to related datasets automatically
  • Track data across the entire lineage graph, from before it enters Snowflake through to your BI tools
  • Jump-start metadata enrichment and data classification for end users with AI Glossary Term Classification
  • Apply consistent privacy and data masking rules across platforms
  • Generate automated compliance reports spanning your entire ecosystem
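
The idea of propagating a classification to related datasets can be sketched as a walk over a lineage graph. The example below is a toy, in-memory version: the `propagate_tags` helper, the asset names, and the `PII` tag are hypothetical, and it says nothing about how DataHub performs propagation internally.

```python
from __future__ import annotations

from collections import deque


def propagate_tags(
    downstream: dict[str, list[str]], seed_tags: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Copy each seed asset's tags onto everything downstream of it."""
    tags = {node: set(node_tags) for node, node_tags in seed_tags.items()}
    queue = deque(seed_tags)
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            before = len(tags.setdefault(child, set()))
            tags[child] |= tags[node]
            # Revisit a child only when it gained a tag, so cycles terminate.
            if len(tags[child]) > before:
                queue.append(child)
    return tags


# Hypothetical lineage: Kafka -> Snowflake raw -> Snowflake analytics -> Looker.
lineage = {
    "kafka.orders": ["snowflake.raw.orders"],
    "snowflake.raw.orders": ["snowflake.analytics.orders_daily"],
    "snowflake.analytics.orders_daily": ["looker.revenue_dashboard"],
}
result = propagate_tags(lineage, {"snowflake.raw.orders": {"PII"}})
print(result)
```

In this sketch tags flow downstream only, so the upstream Kafka topic is left untagged; whether upstream assets should inherit a classification is a policy decision.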

DataHub provides the metadata foundation for unified governance

Governance workflows

Enable teams to act on classification insights:

  • Security teams can identify which reports need access reviews
  • Data engineers can see which pipelines process sensitive data
  • ML teams understand data handling requirements for their models

Secure metadata sharing

DataHub respects Snowflake’s security model:

  • Granular permissions control who sees what
  • Metadata flows without exposing sensitive content
  • Column-level security extends to discovery interfaces
  • Data residency requirements remain intact

Lineage that tells the complete story

DataHub’s cross-platform, column-level lineage graph

Understand impact before you make changes

That simple schema change in Snowflake? It might break 17 downstream dashboards. DataHub shows you the full picture:

End-to-end data flows

DataHub automatically builds lineage by:

  • Parsing Snowflake query logs for table and column dependencies
  • Tracking data movement from Kafka into Snowflake staging tables
  • Mapping transformations through dbt models and stored procedures
  • Connecting Snowflake views to BI dashboards and ML features
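
To show the idea behind deriving lineage from query logs, the sketch below extracts table-level edges from a few SQL statements using regular expressions. This is a deliberately simplified stand-in: a regex is not a SQL parser, so it misses CTEs, subqueries, and quoted identifiers, and column-level lineage needs a real parser. The `table_lineage` helper and the sample queries are assumptions for illustration.

```python
from __future__ import annotations

import re

# Statements that write to a table or view: INSERT INTO x / CREATE [OR REPLACE] TABLE|VIEW x
TARGET_RE = re.compile(
    r"\b(?:INSERT\s+INTO|CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW))\s+([\w.]+)",
    re.IGNORECASE,
)
# Tables read by the statement: FROM x / JOIN x
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(query_log: list[str]) -> set[tuple[str, str]]:
    """Return (source, target) edges for every statement that writes data."""
    edges = set()
    for sql in query_log:
        target = TARGET_RE.search(sql)
        if target is None:
            continue  # read-only query: contributes no lineage edge
        for source in SOURCE_RE.findall(sql):
            edges.add((source.lower(), target.group(1).lower()))
    return edges


log = [
    "INSERT INTO analytics.orders_daily SELECT o.day, SUM(o.amount) "
    "FROM raw.orders o JOIN raw.customers c ON o.cust_id = c.id GROUP BY o.day",
    "CREATE OR REPLACE VIEW analytics.top_customers AS SELECT * FROM analytics.orders_daily",
    "SELECT COUNT(*) FROM raw.orders",
]
edges = table_lineage(log)
for source, target in sorted(edges):
    print(f"{source} -> {target}")
```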

“From the initial events that capture user activity to the final Looker dashboards that make insights consumable for the business, DataHub provides detailed lineage and quality information critical for maintaining data reliability.”

Ronald Angel, Data Products Manager, Miro

Impact analysis in real time

Before making changes, understand consequences with lineage impact analysis:

  • Which executives rely on this deprecated table?
  • What happens if we change this column’s data type?
  • Who needs to approve before we delete this view?
  • Which SLAs are at risk if this pipeline fails?
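
Questions like these reduce to a downstream traversal of the lineage graph. The following is a minimal sketch over an in-memory graph, not DataHub's API: the `downstream_impact` helper, the asset names, and the owner mapping are hypothetical.

```python
from __future__ import annotations

from collections import deque


def downstream_impact(downstream: dict[str, list[str]], start: str) -> dict[str, int]:
    """Breadth-first walk returning each affected asset and its hop distance."""
    distance: dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.add(child)
                distance[child] = hops + 1
                queue.append((child, hops + 1))
    return distance


# Hypothetical lineage and ownership metadata.
lineage = {
    "snowflake.analytics.orders": ["dbt.orders_enriched", "looker.revenue"],
    "dbt.orders_enriched": ["looker.exec_kpis", "ml.churn_features"],
}
owners = {
    "looker.revenue": "finance-bi",
    "looker.exec_kpis": "exec-reporting",
    "ml.churn_features": "ml-platform",
}

impact = downstream_impact(lineage, "snowflake.analytics.orders")
approvers = sorted({owners[asset] for asset in impact if asset in owners})
print(impact)
print(approvers)
```

Hop distance gives a simple way to rank the results: assets one hop away break first, while those further out may be shielded by intermediate transformations.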

Intelligent troubleshooting

When issues arise, lineage provides instant answers:

  • Root cause identification across system boundaries
  • Blast radius assessment for data quality issues
  • Change correlation to pinpoint what broke when
  • Ownership mapping to know who to call
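
Root cause identification can be pictured as the reverse walk: start at the broken asset and follow failing assets upstream until nothing further upstream is failing. The sketch below assumes an in-memory upstream map and a set of assets with failing checks; the `root_causes` helper and all names are hypothetical and do not reflect DataHub's implementation.

```python
from __future__ import annotations


def root_causes(upstream: dict[str, list[str]], failing: set[str], start: str) -> set[str]:
    """Follow failing parents upstream; assets with no failing parent are likely origins."""
    roots: set[str] = set()
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        failing_parents = [p for p in upstream.get(node, []) if p in failing]
        if not failing_parents:
            roots.add(node)  # nothing failing further upstream
        for parent in failing_parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return roots


# Hypothetical upstream lineage, failing checks, and owners.
upstream = {
    "looker.revenue": ["snowflake.analytics.orders_daily"],
    "snowflake.analytics.orders_daily": ["snowflake.raw.orders", "snowflake.raw.customers"],
    "snowflake.raw.orders": ["kafka.orders"],
}
failing = {
    "looker.revenue",
    "snowflake.analytics.orders_daily",
    "snowflake.raw.orders",
    "kafka.orders",
}
owners = {"kafka.orders": "streaming-platform"}

origins = root_causes(upstream, failing, "looker.revenue")
contacts = sorted(owners[asset] for asset in origins if asset in owners)
print(origins, contacts)
```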

Ready to unify your data governance?

The combination of DataHub and Snowflake Horizon offers something unprecedented: enterprise-grade governance that actually works across your entire data estate.

No more blind spots between systems. No more manual policy synchronization. No more discovering critical dependencies after they break.

For technical implementation details, visit our Snowflake connector documentation. For questions about DataHub Cloud capabilities, explore our complete feature guide.  
