• Create extensible cross-source metadata catalog
  • Support custom entity types and ownership models
  • Reduce connector maintenance burden on platform team
  • Define custom metadata properties for governance compliance

The Topline

Challenge
Needed to evolve an internal data catalog that placed too much burden on the central platform team as data needs expanded

Solution
Partnered with DataHub for its extensibility, adding custom entities, ownership models, and properties

Impact
Enabled self-serve metadata, stronger governance, and reduced reliance on central team

Note: This story was originally published in January 2024.

Challenge

Netflix’s internal cataloging tool, Metacat, helped federate metadata within its Big Data Warehouse layer, but it was limited in scope. As Netflix’s data needs evolved, so did the requirements for a more comprehensive, cross-layer catalog spanning online stores, real-time pipelines, and analytics systems.

Two core issues emerged:

  1. The central Data Platform Team, rather than the data-owning teams, bore the burden of maintaining connectors
  2. There was no policy engine in place to enforce governance policies centrally

“There was a need to evolve the product to become a self-serve platform to enable the relevant source system teams to define the asset or entity types, and start ingesting the data into the catalog.”

 — Ajoy Majumdar, Senior Staff Engineer, Netflix

Netflix required a solution that would support custom entity types, complex ownership structures, and privacy-driven custom properties aligned with regulatory requirements.

DataHub gave us the extensibility features we needed to define new entity types easily and augment existing ones. During our evaluation, we assessed both functional and nonfunctional aspects, and DataHub performed exceptionally well in managing our traffic load and data volume.

AJOY MAJUMDAR

Senior Staff Engineer, Netflix

Solution

Netflix selected DataHub after evaluating multiple metadata platforms. Its goal: find a solution that functioned not just as a data catalog, but as an extensible data platform.

Extensibility was at the core of DataHub’s appeal, backed by its scalability, feature set, developer experience, and community support.

Partnering with DataHub, Netflix began addressing three foundational data catalog needs:

  1. New entity types: Netflix’s data ecosystem called for entity types specific to Netflix, for instance, a custom asset type to represent GraphQL schemas
  2. Custom ownership model: Evolving ownership structures across Netflix’s datasets required a custom ownership framework, enabling finer-grained ownership and clearer insight into who owns which data
  3. Custom properties: To stay aligned with privacy and legal standards, Netflix needed to define custom properties within the catalog. These properties, defined by Netflix’s privacy and legal teams from glossaries tied to their regulatory obligations, set out the terms under which data should be ingested into Netflix’s systems


Impact

Netflix leverages DataHub’s extensibility to support its bespoke data ecosystem and governance needs.

Key outcomes included:

  • Improved productivity of the central data team by enabling self-serve data cataloging across teams and allowing source system owners to define and onboard metadata directly
  • Met the unique needs of Netflix through support for custom entity types, ownership types, and properties
  • Strengthened governance by defining custom metadata properties aligned with privacy and legal standards
  • Offloaded connector development, reducing reliance on the central Data Platform Team
