  • Centralize metadata and eliminate documentation silos
  • Improve self-serve discovery across the organization
  • Enable proactive impact analysis and lineage tracking
  • Support governance and GDPR compliance at scale

“We rely on DataHub to gain insights and ensure our critical data is reliable. DataHub’s managed product takes DataHub to the next level through automation and emphasis on time-to-value.”

— Ada Draginda, Staff Data Engineer, Notion Labs, Inc.


The Topline

Challenge
Rapid growth created data complexity: 2,000+ tables across multiple sources made it difficult for business users to discover relevant data and identify which datasets were valuable

Solution
Implemented DataHub Cloud as a centralized data catalog to organize, tag, and document all data assets with governance capabilities and lineage tracking

Impact
Created a “one-stop shop” that reduced data noise, shortened new-hire onboarding cycles, enabled compliance processes, and prevented breaking changes through impact analysis

Note: This story was originally published in October 2024.

Challenge

As Notion scaled from 1 million to 20 million users in just two years, its internal data footprint exploded to more than 2,000 tables across sources like Fivetran, Segment, and Census. The result was a noisy, inconsistent environment where even simple business questions became hard to answer.

“Users would ping a message in Slack, and hopefully someone would respond. Sometimes things would fall through the cracks—we didn’t have a good solution.” 

— Ada Draginda, Staff Data Engineer, Notion Labs, Inc.

With no organizing layer in place, users made assumptions about data, increasing the risk of using the wrong datasets in key analyses. This lack of clarity and control led Notion’s lean data team to seek a cataloging solution that could support governance, discovery, and self-serve access across the company.

Solution

Notion selected DataHub Cloud for its rich metadata model and extensibility. What started as a proof of concept for discovery and governance quickly scaled to support the entire company.

“Part of the reason I chose DataHub Cloud was because I knew we could make it as extensible as needed, especially around transformations.”

— Ada Draginda, Staff Data Engineer, Notion Labs, Inc.

The team centralized documentation in DataHub Cloud, eliminating duplication across Snowflake and dbt. By creating a “one-stop shop” for understanding datasets, their locations, interactions, and dependencies, Notion improved overall data reliability and significantly reduced onboarding time for new data scientists.

To prevent breaking changes and improve impact analysis, the team relies on DataHub’s lineage capabilities. Previously, there was no formalized process for understanding the downstream effects of changes. With DataHub, the team can easily trace dependencies, assess impact, and take proactive steps before changes are deployed.

“DataHub Cloud is such a wonderful tool for lineage, especially since it can track through multiple hops – a few steps up or a few steps below. It’s the easiest place for us to see lineage.”

— Ada Draginda, Staff Data Engineer, Notion Labs, Inc.

DataHub also supports critical data governance workflows, such as tagging data and tracking GDPR compliance through the business glossary. For PII columns, it enables actions such as masking, field removal, and rapid identification of data for deletion.

Impact

By implementing DataHub Cloud, Notion transformed its fragmented data environment into a centralized, reliable platform, unlocking stronger governance, faster onboarding, and safer collaboration across the company.

Key outcomes included:

  • Improved data reliability across the organization with centralized metadata available as a single source of truth, eliminating duplicative documentation across various data sources
  • Accelerated onboarding for new data scientists, providing a “one-stop shop” for understanding datasets, their dependencies, and interactions
  • Reduced risk of breaking changes through robust data lineage and multi-hop impact analysis
  • Improved cross-team collaboration and self-serve access, empowering users to independently find and trust the right data without relying on tribal knowledge
  • Reduced compliance risk by strengthening data governance and GDPR compliance through improved PII workflows

“DataHub feels fairly intuitive. I learned it all without needing documentation. It’s a good product.” 

— Ada Draginda, Staff Data Engineer, Notion Labs, Inc.

Start your own success story with DataHub

Meet with us

See how DataHub Cloud can support enterprise needs and accelerate your journey toward context-rich, AI-ready data. Request a custom demo.

Join our open source community

Explore the project, contribute ideas, and connect with thousands of practitioners in the DataHub Slack community.