• Centralize metadata management
  • Enable self-serve data discovery
  • Extend metadata management to support custom data ecosystem 

The Topline

Challenge
Data consumers couldn’t discover relevant tables or understand column meanings; data producers lacked tools to govern and enrich metadata

Solution
Adopted DataHub as a metadata platform, extending it with custom Thrift ingestion and “Data Elements” typing system

Impact
Streamlined data discovery and scalable metadata management across a bespoke data ecosystem

Note: This story was originally published November 2022.

Challenge

Pinterest’s data ecosystem had grown to over 250,000 Hive tables and 100,000 Thrift structs and enums, outpacing the capabilities of its existing metadata workflows.

The existing infrastructure relied on a Thrift schema repository that was compiled daily into JAR files for their Presto and Spark engines. While functional, this approach was rigid, inconvenient, and lacked support for metadata enrichment.

This created pain on both ends of the data lifecycle:

  • Data consumers struggled to identify the right tables or understand what columns represented
  • Data producers had no clear, centralized location to add or manage metadata

“For the data consumers, people didn’t know which table should be used and what the column means. And for the data producers, they needed a governance tool. They needed to add metadata, but they didn’t know where to add it.”

– Zhong Xu, Software Engineer, Pinterest.

Solution

Pinterest selected DataHub for its extensible metadata model, modern UI, granular access control, and strong support for custom metadata. 

The team extended the platform in two key areas:

  1. Custom Thrift ingestion: Rather than ingesting from generated artifacts, Pinterest built a custom integration to ingest raw Thrift files directly, preserving critical metadata
  2. “Data Elements” typing system: To better represent logical business entities across datasets, the team developed a custom “Data Elements” typing system. Thanks to DataHub’s extensible architecture, integrating this custom system was as simple as adding a new field in the editable schema field info

Thanks to DataHub’s extensibility, we could easily build our Thrift model… and extend the DataHub model to support our Data Elements.

ZHONG XU

Software Engineer, Pinterest

Impact

By extending DataHub to fit their unique environment, Pinterest delivered a more powerful, user-friendly, and scalable data discovery experience.

Key outcomes included:

  • Accelerated custom integration using DataHub’s extensible architecture
  • Accelerated use of data by streamlining data discovery through a centralized, intuitive UI
  • Empowered data producers with the ability to manage metadata at the source

Start your own success story with DataHub

Meet with us

See how DataHub Cloud can support enterprise needs and accelerate your journey toward context-rich, AI-ready data. Request a custom demo.

Join our open source community

Explore the project, contribute ideas, and connect with thousands of practitioners in the DataHub Slack community.