
Netflix Reimagines Discovery and Governance at Scale


  • Unify discovery across data, ML, and software assets
  • Scale onboarding beyond siloed institutional knowledge
  • Enable cross-domain impact analysis and proactive incident prevention
  • Establish self-serve governance spanning all technical assets
  • Build context foundation for future agentic AI use cases

“DataHub has become the central nervous system for discovery and governance at Netflix. As we’ve expanded into ads, live events, and games, our data ecosystem grew exponentially complex. DataHub gave us something we desperately needed—a unified view across all our technical assets, whether that’s data, ML models, or software services.
 
Now, our data teams can discover what they need, understand the full impact of changes before they make them, and self-serve governance questions that used to require deep legacy knowledge. It’s transformed how we operate at scale, and it’s positioning us perfectly for the agentic AI future we’re building toward.”

Nitin Sarma

Sr. Engineering Manager, Data Discovery, Governance & Experiences at Netflix

The Topline

Challenge
Expansion into new verticals created exponential data growth, fragmented discovery, siloed onboarding, and complex impact analysis

Solution
Built a unified global catalog on DataHub to connect all data, ML, and software entities

Impact
Achieved unified discovery, cross-domain impact analysis, self-serve governance, and a scalable foundation for agentic AI

Challenge

Netflix’s rapid expansion into ads, live events, and games created exponential data growth. As data practitioners and pipelines multiplied across the company, three critical problems emerged:

1. Fragmented discovery at scale 

New data practitioners joining Netflix faced a daunting question: where do I find the data I need? 

With datasets scattered across new and historical data sources, discovery relied heavily on institutional knowledge. Engineers turned to colleagues, Slack threads, and documentation to understand what data existed, where to find it, and whether it was reliable. This one-to-one knowledge-sharing approach couldn’t keep pace with Netflix’s growth.

2. Governance gaps across systems 

With thousands of tables in production, accountability remained unclear. Who owns this table? What data classification does it carry? Who can and cannot access it? 

As teams reorganized and evolved, ownership became harder to track. Cost attribution, compliance requirements, and data quality standards all suffered from this fragmentation. Teams needed self-serve governance that didn’t require deep legacy knowledge of Netflix’s data architecture.

3. Complex and siloed impact analysis 

Engineers making changes to tables had no way to visualize downstream effects. Which pipelines would break? Which models would be impacted? 

With GenAI systems becoming first-class citizens at Netflix, understanding these cascading effects grew more critical—and more challenging. The fragmentation across data sources meant no one had a bird’s-eye view of dependencies across the company.

Netflix realized these weren’t just data problems. Software practitioners faced the same challenges with APIs and services. ML practitioners struggled to understand how models were used and what data trained them. 

The company needed a fundamental shift: from siloed, team-specific thinking to a unified, global approach that served all technical practitioners.

“Our legacy way of thinking and organizing data and information is no longer enough. We need to have a more cohesive, centralized way of how the context is stored, where it is stored, and how we reason about it in a more holistic way.”

– Nitin Sarma, Sr. Engineering Manager, Data Discovery, Governance & Experiences at Netflix

Solution

Netflix chose DataHub as the foundation for a global catalog that spans data, ML, and software entities across the company.

Four standout capabilities made DataHub the clear choice for Netflix:

  1. DataHub allows entities and relationships to be modeled as first-class citizens. This was a critical deciding factor for a company where relationships had historically been an afterthought. 
  2. DataHub’s unified ingestion framework meant Netflix didn’t need to build metadata collection from scratch. 
  3. Data lineage comes naturally because relationships are modeled explicitly in DataHub. Anyone using the system gets dependency views automatically. 
  4. DataHub enables coverage insights that answer questions like “what percentage of our datasets have proper classification?” These insights translate directly into governance improvements.
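As a rough illustration of the coverage question in item 4, the sketch below computes the share of datasets that carry a classification. The `Dataset` record, its fields, and the sample catalog are hypothetical stand-ins for illustration only; they are not DataHub's API or Netflix's data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    """Hypothetical catalog record; field names are assumptions."""
    name: str
    owner: Optional[str] = None
    classification: Optional[str] = None  # None means unclassified


def classification_coverage(datasets: List[Dataset]) -> float:
    """Return the percentage of datasets that carry a classification."""
    if not datasets:
        return 0.0
    classified = sum(1 for d in datasets if d.classification is not None)
    return 100.0 * classified / len(datasets)


# Made-up sample catalog for the sketch.
catalog = [
    Dataset("ads.impressions", owner="ads-data", classification="PII"),
    Dataset("games.sessions", owner="games-data", classification="internal"),
    Dataset("live.events_tmp"),
    Dataset("ml.training_features", owner="ml-platform"),
]

coverage = classification_coverage(catalog)
unclassified = [d.name for d in catalog if d.classification is None]
print(f"{coverage:.0f}% classified; gaps: {unclassified}")
```

Listing the unclassified names alongside the percentage is what turns the metric into a worklist for governance follow-up.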

“DataHub already has a fantastic way of modeling relationships as first-class citizens, ingesting metadata, surfacing lineage, and tracking coverage—functionalities that are very useful for the system that we’re trying to build.”

– Nitin Sarma, Sr. Engineering Manager, Data Discovery, Governance & Experiences at Netflix

With DataHub as the foundation, Netflix now models anything that represents a technical asset in its unified catalog, including: 

  • Tables, workflows, pipelines, datasets, and queries for data practitioners
  • Services, endpoints, and ownership areas for software practitioners
  • Looking ahead: models, inference layers, feature generation pipelines, and training infrastructure for ML practitioners
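To make the idea of cross-domain impact analysis concrete, here is a minimal sketch that treats relationships as explicit edges between typed entities and walks them breadth-first to find everything downstream of a change. The entity names, the `type:name` convention, and the edge list are invented for illustration; this is not DataHub's lineage API or Netflix's actual graph.

```python
from collections import defaultdict, deque

# Hypothetical edges: (upstream entity, downstream entity).
EDGES = [
    ("table:plays", "pipeline:daily_agg"),
    ("pipeline:daily_agg", "table:plays_daily"),
    ("table:plays_daily", "model:recommender"),
    ("table:plays_daily", "service:reporting_api"),
    ("table:billing", "service:reporting_api"),
]


def build_graph(edges):
    """Map each entity to the list of entities directly downstream of it."""
    graph = defaultdict(list)
    for upstream, downstream_entity in edges:
        graph[upstream].append(downstream_entity)
    return graph


def downstream(graph, start):
    """Breadth-first walk returning every entity reachable from start."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


graph = build_graph(EDGES)
impacted = downstream(graph, "table:plays")
print(impacted)
```

Because tables, pipelines, models, and services share one graph, a single traversal crosses data, ML, and software boundaries.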

To balance centralized governance with team autonomy, Netflix implemented GraphQL federation. This prevents the catalog from becoming a bottleneck while maintaining unified discovery. Teams can move quickly with domain-specific metadata while the core catalog provides universal context.

Impact

With DataHub as its foundation, Netflix built a global catalog that transformed how practitioners discover, govern, and manage technical assets.

Key outcomes included:

  • Eliminated support bottlenecks with self-service search. Complex queries across ownership, classification, and entity types that previously required SQL expertise or a support ticket can now be answered through search alone.
  • Faster onboarding and higher productivity. New engineers discover relevant datasets across domains and understand dependencies without relying on legacy knowledge.
  • Visibility into governance posture. Netflix continuously monitors what percentage of entities lack PII classification, sets baseline governance targets, and identifies gaps across the ecosystem.
  • Automated retention and cleanup. TTL policies trigger automatic alerts and dataset purges based on environment and sensitivity, preventing compliance violations while eliminating storage costs from forgotten test tables.
  • Unified cost ownership visibility. Teams view cumulative costs across systems, data, and ML in one dashboard with month-over-month trends, eliminating spreadsheet reconciliation and driving more teams to adopt the catalog.
  • Accelerated migrations. System-wide upgrades and ownership transfers that once required tracking in multiple spreadsheets now happen seamlessly with full visibility across all affected entities.
  • AI-ready context infrastructure. The global catalog built for practitioners also serves as the context foundation for AI agents, positioning Netflix for future agentic workflows.
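The retention outcome above can be pictured with a small sketch: a TTL chosen by environment and sensitivity decides whether a table is kept, flagged with an alert, or purged. The TTL values, the three-day alert window, and the `Table` record are assumptions made for this example, not Netflix's actual policies or a DataHub feature.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical TTLs in days, keyed by (environment, sensitivity).
TTL_DAYS = {
    ("test", "pii"): 7,
    ("test", "none"): 30,
    ("prod", "pii"): 365,
}
ALERT_WINDOW = timedelta(days=3)  # assumed warning period before a purge


@dataclass
class Table:
    name: str
    environment: str
    sensitivity: str
    created: date


def retention_action(table: Table, today: date) -> str:
    """Return 'purge', 'alert', or 'keep' for a table on a given day."""
    ttl = TTL_DAYS.get((table.environment, table.sensitivity))
    if ttl is None:
        return "keep"  # no policy applies to this combination
    expires = table.created + timedelta(days=ttl)
    if today >= expires:
        return "purge"
    if today >= expires - ALERT_WINDOW:
        return "alert"
    return "keep"


today = date(2024, 6, 30)
tables = [
    Table("tmp_pii_export", "test", "pii", date(2024, 6, 1)),
    Table("tmp_join_check", "test", "none", date(2024, 6, 2)),
    Table("member_profile", "prod", "pii", date(2024, 1, 1)),
]
actions = {t.name: retention_action(t, today) for t in tables}
print(actions)
```

Keeping the policy in a lookup table keyed by environment and sensitivity means a new rule is a data change rather than a code change.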

“Overall, it’s been quite the journey for us with DataHub, and we’re really looking forward for more and more to be built on top of it.”

– Nitin Sarma, Sr. Engineering Manager, Data Discovery, Governance & Experiences at Netflix
