GOALS
- Cost savings through the identification and elimination of redundant datasets
- Central data governance on top of a decentralized data mesh to manage post-acquisition data sprawl
- Accelerated incident management and faster root cause analysis
- Cross-team data discovery and collaboration
The Topline
Challenge
Rising Snowflake costs due to redundant data, poor visibility into data usage, and a lack of governance across their complex, acquisition-heavy organization
Solution
Implemented DataHub Cloud with Metadata Tests, Impact Analysis, and lineage capabilities to monitor data usage and safely retire redundant datasets
Impact
Achieved 25% monthly data warehousing cost savings while establishing centralized governance for their decentralized data mesh
Note: This story was originally published September 2023.
Challenge
DPG Media, one of the largest media companies in the Benelux region, operates newspapers, television shows, and digital properties across Belgium, the Netherlands, and Denmark. With a strong acquisition strategy that brings new companies into the fold every one to two years, DPG faced mounting challenges with its data landscape.
As Mathias Lavaert explained, “What comes with acquisitions is getting all kinds of new data into our organization, getting new systems in our organization, and dealing with this overlap and chaos.”
The organization operated a broad technology stack that included business intelligence tools like Tableau, Qlik, and Looker, data warehouses like Snowflake and Redshift, various AWS accounts, and Kafka for streaming behavioral data. Because acquisitions were frequent, the data stack kept growing unpredictably.
The lack of governance created operational inefficiencies, particularly around Snowflake costs. Teams operated in silos without visibility into data usage patterns, leading to redundant datasets and wasted storage. Documentation was scattered across Confluence with infrequent updates, making it difficult to identify unused or duplicate data assets.
“We tried to operate under a large, central team, which wasn’t really manageable. In practice, we had already operated much like a data mesh, but without any form of governance.”
— Mathias Lavaert, Principal Data Engineer, DPG Media
Solution
DPG Media evaluated eight different data catalog solutions before selecting DataHub Cloud, which Lavaert describes as “just the perfect fit for our situation.” DPG also valued DataHub’s dual approach, supporting both technical users through APIs and non-technical users through an intuitive UI.
The implementation followed a structured four-step approach:
- Setup and integration: DPG integrated DataHub Cloud with their Snowflake data warehouse and Looker dashboards, enabling comprehensive analysis of data usage, lineage, and cost signals.
- Implementation of Metadata Tests: The team used DataHub’s Metadata Tests to continuously monitor data usage and flag datasets with low usage but high costs, determined through proxy signals like storage footprint.
- Elimination of redundant data: Using Metadata Tests together with lineage and impact analysis features, DPG could retire datasets quickly and confidently, without disrupting the business.
- Continuous monitoring and optimization: Metadata Tests kept evaluating rules against DataHub’s metadata graph, so cleanup was not a one-time effort and future inefficiencies could be identified proactively.
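The logic behind steps two and three can be sketched in a few lines of plain Python. This is an illustrative sketch only, not DataHub's Metadata Tests or Impact Analysis API: the `DatasetStats` record, the function names, the thresholds, and the sample dataset names are all hypothetical. It assumes usage is measured as queries over the last 30 days and that storage footprint stands in for cost, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStats:
    """Hypothetical per-dataset signals; not a DataHub type."""
    name: str
    queries_last_30d: int  # usage signal
    storage_gb: float      # storage footprint, used as a proxy for cost


def flag_candidates(stats, max_queries=5, min_storage_gb=100.0):
    """Return names of datasets with low usage but a large storage footprint."""
    return [
        s.name
        for s in stats
        if s.queries_last_30d <= max_queries and s.storage_gb >= min_storage_gb
    ]


def downstream_of(dataset, lineage):
    """Collect every asset reachable downstream of `dataset`.

    `lineage` maps an asset name to the assets that read from it directly.
    """
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def safe_to_retire(stats, lineage, **thresholds):
    """Flagged datasets that have no downstream consumers at all."""
    return [
        name
        for name in flag_candidates(stats, **thresholds)
        if not downstream_of(name, lineage)
    ]


# Made-up sample data for illustration.
stats = [
    DatasetStats("sales.orders_copy", queries_last_30d=0, storage_gb=850.0),
    DatasetStats("sales.orders", queries_last_30d=4200, storage_gb=900.0),
    DatasetStats("legacy.events_2019", queries_last_30d=2, storage_gb=400.0),
    DatasetStats("tmp.scratch", queries_last_30d=0, storage_gb=1.5),
]
lineage = {
    "sales.orders": ["looker.revenue_dashboard"],
    "legacy.events_2019": ["legacy.events_agg"],
    "legacy.events_agg": ["looker.retention_dashboard"],
}

flagged = flag_candidates(stats)
retirable = safe_to_retire(stats, lineage)
```

In this sample, `legacy.events_2019` is flagged as low-usage and large, but it still feeds a dashboard through an intermediate table, so it is held back from retirement. Running the same check on a schedule is what turns a one-off cleanup into continuous monitoring.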
“We evaluated eight different catalogs, and DataHub Cloud was just the perfect fit for our situation.”
— Mathias Lavaert, Principal Data Engineer, DPG Media
Impact
With DataHub, DPG Media saw immediate cost savings and broad improvements across its data operations.
Key outcomes included:
- Reduced monthly Snowflake data warehousing costs by 25% through the identification and elimination of unused or duplicate data
- Streamlined incident management, enabling teams to quickly identify issues, contact responsible parties, and resolve problems faster
- Improved data collaboration across technical and business users with a unified platform
- Centralized data governance on top of a decentralized data mesh
- Established continuous cost optimization through ongoing monitoring
“With DataHub, we were able to reduce our Snowflake costs by 25% each month. We used DataHub’s Metadata Tests to identify unused or duplicate Snowflake tables across business units. Impact Analysis allowed us to safely manage the cleanup process. Our cost savings are just the beginning; we still have a long way to go.”
MATHIAS LAVAERT
Principal Data Engineer, DPG Media