
GOALS
- Reduce technical debt
- Improve data discoverability
- Enable self-serve analytics
- Eliminate documentation silos

“Using DataHub to kickstart spring (data) cleaning has delivered a lot of value for Uken so far.”
— Lisa Stephens, Data Scientist, Uken Games
The Topline
Challenge
Growing technical debt from ad hoc datasets and scrappy pipelines, creating confusion about authoritative data sources and hindering self-serve analytics
Solution
Implemented DataHub with systematic glossary terms, usage statistics monitoring, and comprehensive documentation linking
Impact
Identified 40% of tables for cleanup, enabled faster time-to-analysis, and reduced risk of inaccurate analysis
Note: This story was originally published January 2023.
Challenge
Uken Games, a Toronto-based game studio developing titles including Solitaire Story: Ava’s Manor, Who Wants to Be a Millionaire?, and Jeopardy! World Tour, faced growing technical debt from years of ad hoc datasets and short-term, scrappy pipelines.
As Lisa Stephens, Data Scientist at Uken Games, explained: “We pull ad hoc data sets for one-off analyses, spin up short-term, scrappy data pipelines while more robust automation is in development, and more. We build up our own version of clutter.”
This technical debt led to several critical pain points:
- Confusion about authoritative data sources, making it hard to identify trusted datasets
- Knowledge silos, where understanding column or table meanings requires code diving or Slack sleuthing
- Poor discoverability, due to an overwhelming data lake
Without a central source of truth, teams lacked answers to essential questions about cleanliness, granularity, time coverage, and context. This frustrated attempts at self-serve analytics and increased reliance on gatekeepers.
“Eventually, this creates a layer of technical debt … that dissuades people from self-serve analytics.”
— Lisa Stephens, Data Scientist, Uken Games
Solution
Uken Games implemented DataHub as their central data catalog and governance platform, running their entire setup on ECS Fargate Spot tasks to optimize costs.
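The story doesn’t spell out the deployment mechanics, but as a minimal sketch of what running on Fargate Spot can look like, the boto3 call below creates an ECS service whose capacity provider strategy routes all tasks to FARGATE_SPOT. The cluster, service, task definition, and subnet names are hypothetical, not Uken’s actual configuration.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Hypothetical names throughout; assumes the cluster, task definition,
# and networking already exist. Fargate Spot is selected via the
# capacity provider strategy rather than a launch type.
ecs.create_service(
    cluster="datahub-cluster",
    serviceName="datahub-gms",
    taskDefinition="datahub-gms:1",
    desiredCount=1,
    capacityProviderStrategy=[
        # weight=1 on FARGATE_SPOT routes all tasks to Spot capacity,
        # trading guaranteed availability for lower cost.
        {"capacityProvider": "FARGATE_SPOT", "weight": 1},
    ],
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
)
```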
Their implementation focused on three key areas:
- Systematic Glossary Terms framework
Using DataHub’s Glossary Terms feature, the team created a tagging system to capture and standardize metadata across datasets, including:
- Quality level (bronze/silver/gold/iced)
- Retention period (rolling date range, if applicable)
- Granularity (row-per-X)
- Data source (internal/third party)
This created filterable search capabilities and hierarchical term grouping, enabling users to easily answer questions like, “Where can I find a gold quality user-day dataset suitable for constructing a quarterly report about the in-game economy?”
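As an illustration of how terms like these can be attached programmatically, the sketch below uses the Python emitter from the open source acryl-datahub package to tag a hypothetical table with quality, granularity, and source terms. The dataset URN, term names, and server address are invented to mirror the scheme above.

```python
from datahub.emitter.mce_builder import make_dataset_urn, make_term_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    GlossaryTermAssociationClass,
    GlossaryTermsClass,
)

# Hypothetical table and term names, mirroring the tagging scheme above.
dataset_urn = make_dataset_urn(platform="redshift", name="analytics.user_day", env="PROD")
terms = GlossaryTermsClass(
    terms=[
        GlossaryTermAssociationClass(urn=make_term_urn("Quality.Gold")),
        GlossaryTermAssociationClass(urn=make_term_urn("Granularity.RowPerUserDay")),
        GlossaryTermAssociationClass(urn=make_term_urn("Source.Internal")),
    ],
    auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:etl"),
)

# Emit the glossary-terms aspect to the metadata service (placeholder address).
emitter = DatahubRestEmitter("http://localhost:8080")
emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=terms))
```

Once terms are attached this way, the filterable search described above comes from DataHub’s built-in term facets.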
- Query Statistics to identify low-usage data
By leveraging DataHub’s usage data, the team identified datasets that were underutilized or abandoned. This insight guided systematic cleanup efforts, reducing storage costs and minimizing the risk of decisions based on stale or incorrect data.
“When we recently used this feature to evaluate a family of tables that had historically been used by our data science and product management teams, we found that as many as 40% of them, amounting to approximately 100 million rows, were no longer needed.”
— Lisa Stephens, Data Scientist, Uken Games
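One hedged way to pull those usage numbers programmatically is DataHub’s GraphQL API; the sketch below checks a single table’s monthly query count. The endpoint, token, and URN are placeholders, and the exact shape of the usageStats field can vary by DataHub version, so check your deployment’s GraphiQL before relying on it.

```python
import requests

QUERY = """
query getUsage($urn: String!) {
  dataset(urn: $urn) {
    usageStats(range: MONTH) {
      aggregations {
        totalSqlQueries
        uniqueUserCount
      }
    }
  }
}
"""

resp = requests.post(
    "http://localhost:8080/api/graphql",  # placeholder metadata-service endpoint
    headers={"Authorization": "Bearer <personal-access-token>"},  # placeholder token
    json={
        "query": QUERY,
        "variables": {
            # Hypothetical URN for a candidate table.
            "urn": "urn:li:dataset:(urn:li:dataPlatform:redshift,analytics.user_day,PROD)"
        },
    },
)
resp.raise_for_status()
agg = resp.json()["data"]["dataset"]["usageStats"]["aggregations"]

# A table with no queries in the past month is a cleanup candidate.
if not (agg.get("totalSqlQueries") or 0):
    print("No queries in the past month: flag for review/cleanup")
```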
- Custom Metadata to link out to existing documentation
Uken used DataHub’s entity-level About sections to link out to existing documentation, making it easier for end users to find the context they’re looking for (see the sketch after this list). Depending on the nature of the table in question, the team includes:
- A brief summary of what the table contains
- If the table comes in variants of different cleanliness or granularity, references to what those variants are, and a summary of how they differ
- If the table has been deprecated, a reference to the newer version that should be used
- Hyperlinks out to the original spec, analysis, or monitoring dashboard
- Hyperlinks out to the corresponding ETL job
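Those links map naturally onto DataHub’s Institutional Memory aspect. As a sketch under the same assumptions as the earlier snippets, the code below attaches a spec page and an ETL job link to a dataset; all URLs are invented for illustration.

```python
import time

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    InstitutionalMemoryClass,
    InstitutionalMemoryMetadataClass,
)

now = AuditStampClass(time=int(time.time() * 1000), actor="urn:li:corpuser:etl")

# Hypothetical documentation and ETL links for a hypothetical table.
links = InstitutionalMemoryClass(
    elements=[
        InstitutionalMemoryMetadataClass(
            url="https://wiki.example.com/specs/user-day",
            description="Original spec and analysis",
            createStamp=now,
        ),
        InstitutionalMemoryMetadataClass(
            url="https://airflow.example.com/dags/user_day_etl",
            description="Corresponding ETL job",
            createStamp=now,
        ),
    ]
)

emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder endpoint
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=make_dataset_urn("redshift", "analytics.user_day", "PROD"),
        aspect=links,
    )
)
```

Because the links point at existing specs, dashboards, and jobs, no documentation has to be migrated into the catalog itself.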
Impact
With DataHub, Uken Games built a cleaner, more navigable, and self-serve-friendly data ecosystem.
Key outcomes included:
- Identified 40% of tables for cleanup, uncovering ~100M rows of unused data and reducing storage costs
- Improved data discovery through glossary-driven filtering by quality, granularity, and retention
- Accelerated time-to-analysis by enabling teams to find and trust data independently
- Eliminated documentation silos by linking data assets directly to existing specs, analyses, ETL jobs, and dashboards without requiring content migration
- Reduced dependency on institutional knowledge by centralizing metadata and context within a single, searchable system
Start your own success story with DataHub
Meet with us
See how DataHub Cloud can support enterprise needs and accelerate your journey toward context-rich, AI-ready data. Request a custom demo.
Join our open source community
Explore the project, contribute ideas, and connect with thousands of practitioners in the DataHub Slack community.