GOALS
- Simplify data discovery
- Enable a self-serve user experience
- Faster impact assessment
- Streamline PII data identification and tracking
- Identify and remove dead pipelines and unused data assets
- Enable AI-powered analytics platform using DataHub metadata as context
The Topline
Challenge
Overwhelming data management complexity with hundreds of terabytes monthly and thousands of datasets, resulting in constant questions to data teams and inefficient data discovery
Solution
Implemented DataHub as a centralized data catalog with end-to-end lineage tracking, business glossary, domain organization, and custom integrations
Impact
Streamlined data discovery, faster impact assessment during pipeline failures, quicker team onboarding, enhanced PII data identification, and enabled AI initiatives
Note: This story was originally published October 2024.
Challenge
Deutsche Telekom Digital Labs was drowning in data complexity. The company was managing hundreds of terabytes of data monthly across thousands of datasets, serving 11 countries with diverse regulatory requirements.
The organization faced an endless stream of questions directed to their Data Team, including fundamental queries about data acquisition, content, access, ownership, utilization, and dashboard availability. This created bottlenecks and inefficiencies across multiple organizational roles.
Despite investments in building “the best data architecture” and tooling, the team realized something critical was missing.
“We realized that there was something missing because the speed at which we were generating the output was not enough for the business.”
— Shishidhar Singhal, Director of Engineering, Deutsche Telekom Digital Labs
Solution
After evaluating multiple metadata catalog solutions, Deutsche Telekom Digital Labs selected DataHub for its easy-to-use UI with quick search, straightforward onboarding, automatic metadata scanning, capabilities such as end-to-end data lineage, strong community support, and open-source availability.
Their implementation approach included:
Technical deployment:
- Used a Helm chart for efficient deployment and management
- Established metadata ingestion connections from seven core systems: NiFi, Athena, Redshift, Tableau, Redash, Kafka, and OpenSearch
- Completed metadata for data assets across the organization
Organizational structure:
- Defined a comprehensive business glossary with standard business terms linked to data assets
- Implemented Domains in DataHub to organize data assets into logical collections aligned with business units
- Established end-to-end lineage tracking across all systems
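To illustrate why end-to-end lineage helps with impact assessment, here is a minimal sketch of a downstream traversal over lineage edges. It does not use the DataHub API. The asset names, the `DOWNSTREAM` mapping, and the `impacted_assets` helper are all hypothetical and stand in for whatever lineage graph the catalog holds.

```python
from collections import deque

# Hypothetical lineage edges: upstream asset -> assets that read from it.
# These names are illustrative only and do not come from the case study.
DOWNSTREAM = {
    "kafka.events": ["nifi.events_flow"],
    "nifi.events_flow": ["redshift.events_clean"],
    "redshift.events_clean": ["tableau.usage_dashboard", "redash.ops_report"],
}


def impacted_assets(failed, downstream):
    """Return every asset reachable downstream of `failed`, breadth-first."""
    seen = {failed}  # guards against cycles in the lineage graph
    order = []
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # If the NiFi flow fails, list everything that depends on it.
    print(impacted_assets("nifi.events_flow", DOWNSTREAM))
```

When a pipeline fails, a traversal like this turns "what does this break?" into a lookup rather than a round of questions to the data team.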
“From the fundamental or different task-based features which DataHub brings in, we are able to actually convert those features into some of the practical outcomes which we are able to achieve both on the technical and on the business ground.”
Shishidhar Singhal
Director of Engineering, Deutsche Telekom Digital Labs
Impact
With DataHub, the Deutsche Telekom Digital Labs team has seen significant improvements across multiple dimensions of their organization.
Key outcomes included:
- Eliminated data discovery bottlenecks that generated 4-5 daily calls to data engineers
- Reduced onboarding time from weeks to days for new team members
- Accelerated incident resolution from days to minutes
- Enabled GenAI analytics platform using DataHub metadata as LLM context
- Enabled automatic PII detection that flags fields containing PII
- Sped up impact resolution with repository links for data pipelines
- Cost savings from identifying and removing dead pipelines and transformations
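As a rough illustration of how catalog metadata can surface dead pipelines, the sketch below flags assets that have no downstream consumers and have not been read recently. This is an assumed heuristic, not the method the team describes. The `ASSETS` snapshot, the `dead_candidates` helper, and the 90-day threshold are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical catalog snapshot: asset -> (downstream consumers, last read date).
# Names and dates are illustrative only.
ASSETS = {
    "redshift.events_clean": (["tableau.usage_dashboard"], date(2024, 9, 30)),
    "redshift.legacy_export": ([], date(2023, 11, 2)),
    "tableau.usage_dashboard": ([], date(2024, 10, 1)),
}


def dead_candidates(assets, today, max_idle_days=90):
    """Return assets with no consumers that were last read before the cutoff."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, (consumers, last_read) in assets.items()
        if not consumers and last_read < cutoff
    )


if __name__ == "__main__":
    print(dead_candidates(ASSETS, today=date(2024, 10, 15)))
```

The result is a list of candidates for review, not for automatic deletion: a leaf asset such as a dashboard can still be in active use, which is why the sketch combines lineage with a recency check.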