
GOALS
- Reduce technical debt
- Improve data discoverability
- Enable self-serve analytics
- Eliminate documentation silos

“Using DataHub to kickstart spring (data) cleaning has delivered a lot of value for Uken so far.”
— Lisa Stephens, Data Scientist, Uken Games
The Topline
Challenge
Growing technical debt from ad hoc datasets and scrappy pipelines, creating confusion about authoritative data sources and hindering self-serve analytics
Solution
Implemented DataHub with systematic glossary terms, usage statistics monitoring, and comprehensive documentation linking
Impact
Identified 40% of tables for cleanup, enabled faster time-to-analysis, and reduced risk of inaccurate analysis
Note: This story was originally published January 2023.
Challenge
Uken Games, a Toronto-based game studio developing titles including Solitaire Story: Ava’s Manor, Who Wants to Be a Millionaire?, and Jeopardy! World Tour, faced growing technical debt from years of ad hoc datasets and short-term, scrappy pipelines.
As Lisa Stephens, Data Scientist at Uken Games, explained: “We pull ad hoc data sets for one-off analyses, spin up short-term, scrappy data pipelines while more robust automation is in development, and more. We build up our own version of clutter.”
This technical debt led to several critical pain points:
- Confusion about authoritative data sources, making it hard to identify trusted datasets
- Knowledge silos, where understanding column or table meanings requires code diving or Slack sleuthing
- Poor discoverability, due to an overwhelming data lake
Without a central source of truth, teams lacked answers to essential questions about cleanliness, granularity, time coverage, and context. This frustrated attempts at self-serve analytics and increased reliance on gatekeepers.
“Eventually, this creates a layer of technical debt … that dissuades people from self-serve analytics.”
— Lisa Stephens, Data Scientist, Uken Games
Solution
Uken Games implemented DataHub as their central data catalog and governance platform, running their entire setup on ECS Fargate Spot tasks to optimize costs.
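The story doesn’t spell out the deployment mechanics, but as a minimal sketch of what running on Fargate Spot can look like, the boto3 call below creates an ECS service whose capacity provider strategy routes all tasks to FARGATE_SPOT. The cluster, service, task definition, and subnet names are hypothetical, not Uken’s actual configuration.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Hypothetical names throughout; assumes the cluster, task definition,
# and networking already exist. Fargate Spot is selected via the
# capacity provider strategy rather than a launch type.
ecs.create_service(
    cluster="datahub-cluster",
    serviceName="datahub-gms",
    taskDefinition="datahub-gms:1",
    desiredCount=1,
    capacityProviderStrategy=[
        # weight=1 on FARGATE_SPOT routes all tasks to Spot capacity,
        # trading guaranteed availability for lower cost.
        {"capacityProvider": "FARGATE_SPOT", "weight": 1},
    ],
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
)
```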
Their implementation focused on three key areas:
- Systematic Glossary Terms framework
Using DataHub’s Glossary Terms feature, the team created a tagging system to capture and standardize metadata across datasets, including:
- Quality level (bronze/silver/gold/iced)
- Retention period (rolling date range, if applicable)
- Granularity (row-per-X)
- Data source (internal/third party)
This created filterable search capabilities and hierarchical term grouping, enabling users to easily answer questions like, “Where can I find a gold quality user-day dataset suitable for constructing a quarterly report about the in-game economy?”
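As an illustration of how terms like these can be attached programmatically, the sketch below uses the Python emitter from the open source acryl-datahub package to tag a hypothetical table with quality, granularity, and source terms. The dataset URN, term names, and server address are invented to mirror the scheme above.

```python
from datahub.emitter.mce_builder import make_dataset_urn, make_term_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    GlossaryTermAssociationClass,
    GlossaryTermsClass,
)

# Hypothetical table and term names, mirroring the tagging scheme above.
dataset_urn = make_dataset_urn(platform="redshift", name="analytics.user_day", env="PROD")
terms = GlossaryTermsClass(
    terms=[
        GlossaryTermAssociationClass(urn=make_term_urn("Quality.Gold")),
        GlossaryTermAssociationClass(urn=make_term_urn("Granularity.RowPerUserDay")),
        GlossaryTermAssociationClass(urn=make_term_urn("Source.Internal")),
    ],
    auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:etl"),
)

# Emit the glossary-terms aspect to the metadata service (placeholder address).
emitter = DatahubRestEmitter("http://localhost:8080")
emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=terms))
```

Once terms are attached this way, the filterable search described above comes from DataHub’s built-in term facets.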
- Query Statistics to identify low-usage data
By leveraging DataHub’s usage data, the team identified datasets that were underutilized or abandoned. This insight guided systematic cleanup efforts, reducing storage costs and minimizing the risk of decisions based on stale or incorrect data.
“When we recently used this feature to evaluate a family of tables that had historically been used by our data science and product management teams, we found that as many as 40% of them, amounting to approximately 100 million rows, were no longer needed.”
— Lisa Stephens, Data Scientist, Uken Games
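One hedged way to pull those usage numbers programmatically is DataHub’s GraphQL API; the sketch below checks a single table’s monthly query count. The endpoint, token, and URN are placeholders, and the exact shape of the usageStats field can vary by DataHub version, so check your deployment’s GraphiQL before relying on it.

```python
import requests

QUERY = """
query getUsage($urn: String!) {
  dataset(urn: $urn) {
    usageStats(range: MONTH) {
      aggregations {
        totalSqlQueries
        uniqueUserCount
      }
    }
  }
}
"""

resp = requests.post(
    "http://localhost:8080/api/graphql",  # placeholder metadata-service endpoint
    headers={"Authorization": "Bearer <personal-access-token>"},  # placeholder token
    json={
        "query": QUERY,
        "variables": {
            # Hypothetical URN for a candidate table.
            "urn": "urn:li:dataset:(urn:li:dataPlatform:redshift,analytics.user_day,PROD)"
        },
    },
)
resp.raise_for_status()
agg = resp.json()["data"]["dataset"]["usageStats"]["aggregations"]

# A table with no queries in the past month is a cleanup candidate.
if not (agg.get("totalSqlQueries") or 0):
    print("No queries in the past month: flag for review/cleanup")
```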
- Custom Metadata to link out to existing documentation
Uken used DataHub’s entity-level About sections to link out to existing documentation, making it easier for end users to find the context they’re looking for (see the sketch after this list). Depending on the nature of the table in question, the team includes:
- A brief summary of what the table contains
- If the table comes in variants of different cleanliness or granularity, references to what those variants are, and a summary of how they differ
- If the table has been deprecated, a reference to the newer version that should be used
- Hyperlinks out to the original spec, analysis, or monitoring dashboard
- Hyperlinks out to the corresponding ETL job
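Those links map naturally onto DataHub’s Institutional Memory aspect. As a sketch under the same assumptions as the earlier snippets, the code below attaches a spec page and an ETL job link to a dataset; all URLs are invented for illustration.

```python
import time

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    InstitutionalMemoryClass,
    InstitutionalMemoryMetadataClass,
)

now = AuditStampClass(time=int(time.time() * 1000), actor="urn:li:corpuser:etl")

# Hypothetical documentation and ETL links for a hypothetical table.
links = InstitutionalMemoryClass(
    elements=[
        InstitutionalMemoryMetadataClass(
            url="https://wiki.example.com/specs/user-day",
            description="Original spec and analysis",
            createStamp=now,
        ),
        InstitutionalMemoryMetadataClass(
            url="https://airflow.example.com/dags/user_day_etl",
            description="Corresponding ETL job",
            createStamp=now,
        ),
    ]
)

emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder endpoint
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=make_dataset_urn("redshift", "analytics.user_day", "PROD"),
        aspect=links,
    )
)
```

Because the links point at existing specs, dashboards, and jobs, no documentation has to be migrated into the catalog itself.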
Impact
With DataHub, Uken Games built a cleaner, more navigable, and self-serve-friendly data ecosystem.
Key outcomes included:
- Identified 40% of tables for cleanup, uncovering ~100M rows of unused data and reducing storage costs
- Improved data discovery through glossary-driven filtering by quality, granularity, and retention
- Accelerated time-to-analysis by enabling teams to find and trust data independently
- Eliminated documentation silos by linking data assets directly to existing specs, analyses, ETL jobs, and dashboards without requiring content migration
- Reduced dependency on institutional knowledge by centralizing metadata and context within a single, searchable system
Start your own success story with DataHub
Meet with us
See how DataHub Cloud can support enterprise needs and accelerate your journey toward context-rich, AI-ready data. Request a custom demo.
Join our open source community
Explore the project, contribute ideas, and connect with thousands of practitioners in the DataHub Slack community.