Zynga Levels Up Data Management with DataHub

INDUSTRY

Video Game Developer

SIZE

3,000+ employees

DATA STACK

Redshift, Airflow, Tableau, Databricks, Spark, Kubernetes, AWS, Kafka, MySQL, Splunk, Datadog

SOLUTION

DataHub Core (OSS)

USE CASE

Discovery, Lineage, Observability

GOALS

Centralize metadata across a vast data ecosystem
Improve data lineage and dependency tracing
Monitor data quality in real time
Streamline operations

The Topline

Challenge
With 100+ games generating massive data volumes, Zynga struggled with data silos, unclear dependencies, and fragmented metadata across their complex tech stack.

Solution
Zynga implemented DataHub as their centralized data catalog, customizing it for gaming-specific needs such as A/B testing experiments.

Impact
Enabled 300+ data practitioners to work more efficiently and confidently, with full visibility into lineage, data quality, and dependencies across 100,000+ assets.

Note: This story was originally published in September 2023.

Challenge

As Zynga’s portfolio expanded to over 100 games, so did the complexity of its data operations. The company processes more than 35 billion records daily, ingests around 66 terabytes of data, and maintains 1.5 petabytes of queryable data. This ecosystem supports over 300 data analysts, engineers, and PMs running 100,000+ queries and generating 4,000 reports each day.

But without centralized metadata, teams struggled to answer fundamental questions:

“We needed a tool to help us know our data better and answer questions, such as, ‘Where does the data from this report come from?’ ‘What experiment will be affected if I change my dataset?’ ‘When was this dataset last updated?’ ‘What are the dependencies between datasets and jobs?’”

— Felipe Gusmao, Data Engineer, Zynga

Despite having a robust tech stack including Redshift, Airflow, Databricks, Tableau, and Kubernetes, Zynga identified the need for a unified solution to address data discovery, reliability, and operational efficiency at scale.

Solution

Zynga selected DataHub as the backbone of their metadata strategy, integrating it across their entire tech stack. The implementation spanned 100,000+ ingested assets from Redshift, Airflow, Tableau, Databricks, and more.

Key implementation details:

Custom code: Modified ingestion pipelines for tools like Redshift and Airflow to accommodate differences in SQL syntax and workflow structures
New entities: Introduced an “Experiments” entity to capture metadata for A/B tests and other experiments, a critical part of game development
Deployment architecture: Used managed AWS services for Kafka, MySQL, cache, and search, with DataHub’s frontend, GMS (Generalized Metadata Service), and Schema Registry deployed on Kubernetes using Helm charts
Monitoring: Integrated logging with Splunk and monitoring with Datadog for system reliability
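To make the “Experiments” entity concrete, here is a minimal, stdlib-only sketch of the question “What experiment will be affected if I change my dataset?”: given lineage edges between datasets, jobs, and experiments, a breadth-first walk downstream collects every experiment that depends on a dataset. The node names, the edge list, and the `build_graph` and `downstream_experiments` helpers are hypothetical illustrations, not DataHub’s API or Zynga’s implementation; in practice the edges would come from lineage ingested into the catalog.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (upstream -> downstream). In a real deployment
# these would be read from the metadata catalog, not hard-coded.
EDGES = [
    ("dataset:raw_events", "job:airflow.clean_events"),
    ("job:airflow.clean_events", "dataset:clean_events"),
    ("dataset:clean_events", "job:airflow.daily_retention"),
    ("job:airflow.daily_retention", "dataset:retention_daily"),
    ("dataset:retention_daily", "experiment:onboarding_ab_test"),
    ("dataset:clean_events", "experiment:store_price_test"),
    ("dataset:payments", "experiment:checkout_flow_test"),
]


def build_graph(edges):
    """Build an adjacency list mapping each node to its direct downstream nodes."""
    graph = defaultdict(list)
    for upstream, downstream in edges:
        graph[upstream].append(downstream)
    return graph


def downstream_experiments(graph, start):
    """Walk downstream from `start` breadth-first and collect experiment nodes."""
    seen = {start}
    queue = deque([start])
    hits = []
    while queue:
        node = queue.popleft()
        # dict.get avoids inserting empty entries into the defaultdict
        for child in graph.get(node, []):
            if child in seen:
                continue  # skip nodes reached by another path (or a cycle)
            seen.add(child)
            if child.startswith("experiment:"):
                hits.append(child)
            queue.append(child)
    return sorted(hits)


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(downstream_experiments(graph, "dataset:raw_events"))
    # prints ['experiment:onboarding_ab_test', 'experiment:store_price_test']
```

Tracking visited nodes keeps the walk linear in the number of edges, even when several jobs converge on the same dataset.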

“We looked around for a data catalog tool, and DataHub was a clear winner. We created a small POC to evaluate its capabilities and saw a huge potential for it to be much more than a data catalog.”

Felipe Gusmao, Data Engineer, Zynga

Impact

With DataHub, Zynga established a robust, reliable metadata foundation and made “fully data-driven” a reality for its global data teams.

Key outcomes included:

Centralized metadata management across 100,000+ ingested assets, establishing a single source of truth for their massive data ecosystem
Enhanced impact assessment with comprehensive data lineage, enabling teams to trace dependencies between datasets, jobs, and reports and to answer questions like, “What experiment will be impacted by changing this dataset?”
Reduced data incidents with real-time data quality monitoring, using assertion-based validation that tracks dataset health across 35 billion daily records and flags issues proactively
Streamlined operational troubleshooting with visibility into Airflow DAG status, allowing teams to pinpoint the causes of issues such as delayed dashboard refreshes
Optimized infrastructure costs by identifying and deprecating unused datasets across their vast data landscape
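Assertion-based validation can be illustrated with a small, stdlib-only sketch: each assertion is a named predicate over a dataset’s latest observed metrics, and every failing assertion is reported by name. The `DatasetSnapshot` fields, the thresholds, and the `freshness_assertion`, `volume_assertion`, and `evaluate` helpers are hypothetical; they show the general pattern, not DataHub’s assertion API or Zynga’s actual rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List


@dataclass
class DatasetSnapshot:
    """Hypothetical metrics observed for one dataset after its latest load."""
    name: str
    last_updated: datetime
    row_count: int


@dataclass
class Assertion:
    """A named predicate over a snapshot, evaluated at a given time."""
    name: str
    predicate: Callable[[DatasetSnapshot, datetime], bool]


def freshness_assertion(max_age: timedelta) -> Assertion:
    """Passes only if the dataset was updated within `max_age` of `now`."""
    hours = max_age.total_seconds() / 3600
    return Assertion(
        name=f"freshness<={hours:g}h",
        predicate=lambda snap, now: now - snap.last_updated <= max_age,
    )


def volume_assertion(min_rows: int) -> Assertion:
    """Passes only if the latest load produced at least `min_rows` rows."""
    return Assertion(
        name=f"rows>={min_rows}",
        predicate=lambda snap, now: snap.row_count >= min_rows,
    )


def evaluate(snapshot: DatasetSnapshot, assertions: List[Assertion], now: datetime) -> List[str]:
    """Return the names of the assertions this snapshot fails."""
    return [a.name for a in assertions if not a.predicate(snapshot, now)]


if __name__ == "__main__":
    now = datetime(2023, 9, 1, 12, 0, tzinfo=timezone.utc)
    snapshot = DatasetSnapshot(
        name="retention_daily",
        last_updated=now - timedelta(hours=30),  # 30 hours stale
        row_count=1_200_000,
    )
    rules = [freshness_assertion(timedelta(hours=24)), volume_assertion(1_000_000)]
    print(evaluate(snapshot, rules, now))  # prints ['freshness<=24h']
```

Giving each assertion a name makes failures self-describing, so an alert can say which rule broke rather than only that a dataset is unhealthy.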
