  • Metadata cataloging for all data ingestion processes
  • Data lineage tracking across transformation pipelines
  • Data quality management through custom and dbt assertions
  • Tag-based access control integration with Apache Ranger
  • Business context enrichment for self-service analytics

The Topline

Challenge
Data centralization showed scalability limitations at 30+ petabytes

Solution
Implemented DataHub as the governance foundation for a decentralized data mesh architecture

Impact
Enabled domain-driven data ownership while maintaining enterprise-wide discoverability, quality, and security

Note: This story was originally published in December 2023.

Challenge

By 2018, Airtel had moved from siloed data to a centralized data lake. But as data volumes ballooned past 30 petabytes and daily job counts topped 10,000, its centralized model started to crack. Innovation slowed. Governance became harder to enforce.

“Over time, we realized that centralization comes at a cost. Because of which we’ve moved towards a data mesh decentralized architecture.”

 — Vivek Bijlwan, Principal Product Manager, Airtel

Airtel needed a new foundation, one that could scale with the business while preserving strong governance and discoverability.

Solution

To enable its shift from a centralized data lake to a decentralized data mesh, Airtel implemented DataHub as the metadata management backbone across the organization. Rather than replacing its existing system outright, Airtel positioned DataHub to operationalize data mesh principles at enterprise scale, placing governance and discoverability at the core.

Airtel made DataHub metadata cataloging a mandatory prerequisite for all data ingestion. This approach means every dataset must be properly documented and cataloged before entering the data lake, creating governance by design rather than as an afterthought.
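
As a rough illustration of what "governance by design" can look like in practice, the sketch below gates ingestion on the presence of a documented catalog entry. It is a minimal sketch under stated assumptions, not Airtel's implementation or a DataHub API: the `CatalogEntry` type, the in-memory `catalog` dictionary, the example URNs, and the `can_ingest` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CatalogEntry:
    """Hypothetical stand-in for a dataset's record in the metadata catalog."""
    urn: str
    schema_fields: List[str] = field(default_factory=list)
    business_description: str = ""


def can_ingest(urn: str, catalog: Dict[str, CatalogEntry]) -> Tuple[bool, str]:
    """Return (allowed, reason). Only cataloged, documented datasets may be ingested."""
    entry = catalog.get(urn)
    if entry is None:
        return False, "dataset is not cataloged"
    if not entry.schema_fields:
        return False, "technical metadata (schema) is missing"
    if not entry.business_description.strip():
        return False, "business description is missing"
    return True, "ok"


# Hypothetical catalog contents: one fully documented dataset, one without a description.
catalog = {
    "urn:example:billing.invoices": CatalogEntry(
        urn="urn:example:billing.invoices",
        schema_fields=["invoice_id", "amount"],
        business_description="One row per customer invoice.",
    ),
    "urn:example:network.events": CatalogEntry(
        urn="urn:example:network.events",
        schema_fields=["event_id"],
    ),
}

for dataset_urn in ("urn:example:billing.invoices", "urn:example:network.events"):
    allowed, reason = can_ingest(dataset_urn, catalog)
    print(dataset_urn, "->", "ingest" if allowed else f"blocked: {reason}")
```

The design point is that the check runs before any data moves, so an undocumented dataset never reaches the lake in the first place.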

Operationalizing “data as a product” at scale

Airtel leveraged DataHub to operationalize all six pillars of their “data as a product” model:

1. Ingestion is addressable

  • Ingestion starts with source systems pushing their metadata into DataHub through recipe files. This technical metadata is then referenced in each ingestion
  • Inputting business descriptions is a prerequisite for data to be ingested

2. Transformation is discoverable

  • All transformations happen through dbt. Lineage from dbt is automatically emitted to DataHub via connectors

3. Data quality management is trustworthy

  • Assertions specified by users during ingestion and transformation are surfaced in DataHub

4. Data is self-describing

  • Data logic in dbt is surfaced in DataHub
  • Data stewards are responsible for describing their data products to maintain understandability and give users a direct line of communication for questions related to data products

5. Access controls are secure

  • Tags on tables and columns are used to secure the data through Apache Ranger’s tag-based policies, leveraging the DataHub Actions Framework

6. Data is usable

  • Business context from DataHub powers Airtel’s tool for business users to extract relevant data to drive insights

This layered approach allowed Airtel to scale decentralized data ownership while maintaining centralized visibility and governance: the foundation of a successful data mesh.
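
To make the fifth pillar more concrete, here is a simplified sketch of the idea behind tag-based access control: policies are written against tags rather than against individual tables or columns, so tagging a column is enough to bring it under a policy. This is an illustration only and does not use the Apache Ranger or DataHub Actions APIs. The column names, tags, groups, and the `can_read` function are hypothetical, and real Ranger policy evaluation is considerably richer than this.

```python
from typing import Dict, Set

# Hypothetical tags attached to columns in the catalog (column -> tags).
column_tags: Dict[str, Set[str]] = {
    "customers.msisdn": {"PII"},
    "customers.plan_name": set(),
    "billing.amount": {"FINANCIAL"},
}

# Hypothetical tag-based policies (tag -> groups allowed to read data carrying that tag).
tag_policies: Dict[str, Set[str]] = {
    "PII": {"privacy_office"},
    "FINANCIAL": {"finance", "privacy_office"},
}


def can_read(column: str, user_groups: Set[str]) -> bool:
    """Allow access only if the user satisfies the policy of every tag on the column.

    Simplifications in this sketch: untagged columns are readable by anyone,
    and a tag with no policy denies access to everyone.
    """
    for tag in column_tags.get(column, set()):
        allowed_groups = tag_policies.get(tag, set())
        if not (user_groups & allowed_groups):
            return False
    return True


print(can_read("customers.msisdn", {"finance"}))         # finance is not allowed on PII
print(can_read("customers.msisdn", {"privacy_office"}))  # privacy_office is allowed on PII
```

Because the policy references the tag, changing a column's tags changes who can read it without editing any per-table rule.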

“DataHub plays a very critical role to be the bedrock for providing the requisite governance … through its metadata management.”

VIVEK BIJLWAN

Principal Product Manager, Airtel

Impact

With DataHub, Airtel successfully made the leap to data mesh, enabling domain teams to own and manage their data while maintaining unified governance, quality, and discoverability.

Key outcomes included:

  • Transformed data architecture from centralized bottlenecks to decentralized data mesh serving 30+ petabytes and 10,000+ daily jobs
  • Improved data discovery by enforcing cataloging at the source
  • Streamlined quality checks with custom, scalable assertion logic
  • Secured access control at scale using tag-based policy enforcement

“DataHub plays a very pivotal role for us to make data usable.”

— Vivek Bijlwan, Principal Product Manager, Airtel
