  • Simplify data discovery
  • Enable a self-serve user experience
  • Speed up impact assessment
  • Streamline PII data identification and tracking
  • Identify and remove dead pipelines and unused data assets
  • Enable an AI-powered analytics platform that uses DataHub metadata as context

The Topline

Challenge
Overwhelming data management complexity, with hundreds of terabytes of data per month and thousands of datasets, resulting in constant questions to data teams and inefficient data discovery

Solution
Implemented DataHub as a centralized data catalog with end-to-end lineage tracking, business glossary, domain organization, and custom integrations

Impact
Streamlined data discovery, faster impact assessment during pipeline failures, quicker team onboarding, better PII data identification, and new AI initiatives

Note: This story was originally published in October 2024.

Challenge

Deutsche Telekom Digital Labs was drowning in data complexity. The company was managing hundreds of terabytes of data monthly across thousands of datasets, serving 11 countries with diverse regulatory requirements. 

The organization faced an endless stream of questions directed at its Data Team, including fundamental queries about how data was acquired, what it contained, who could access it, who owned it, how it was used, and which dashboards were available. This created bottlenecks and inefficiencies across multiple organizational roles.

Despite investments in building “the best data architecture” and tooling, the team realized something critical was missing.

“We realized that there was something missing because the speed at which we were generating the output was not enough for the business.”

— Shishidhar Singhal, Director of Engineering, Deutsche Telekom Digital Labs

Solution

After evaluating multiple metadata catalog solutions, Deutsche Telekom Digital Labs selected DataHub for its easy-to-use UI with fast search, straightforward onboarding, automatic metadata scanning, broad capabilities such as end-to-end data lineage, strong community support, and open-source availability.

Their implementation approach included:

Technical deployment:

  • Used Helm charts for efficient deployment and management
  • Established metadata ingestion connections from seven core systems: NiFi, Athena, Redshift, Tableau, Redash, Kafka, and OpenSearch
  • Built out complete metadata for data assets across the organization

Organizational structure:

  • Defined a comprehensive business glossary with standard business terms linked to data assets
  • Implemented Domains in DataHub to organize data assets into logical collections aligned with business units
  • Established end-to-end lineage tracking across all systems

“From the fundamental or different task-based features which DataHub brings in, we are able to actually convert those features into some of the practical outcomes which we are able to achieve both on the technical and on the business ground.”

Shishidhar Singhal

Director of Engineering, Deutsche Telekom Digital Labs

Impact

With DataHub, the Deutsche Telekom Digital Labs team has seen significant improvements across multiple dimensions of its organization.

Key outcomes included:

  • Eliminated data discovery bottlenecks that had generated 4-5 calls a day to data engineers
  • Reduced onboarding time for new team members from weeks to days
  • Accelerated incident resolution from days to minutes
  • Enabled a GenAI analytics platform that uses DataHub metadata as LLM context
  • Introduced automatic PII detection that flags fields containing PII
  • Sped up impact assessment with repository links for data pipelines
  • Cut costs by identifying and removing dead pipelines and transformations
