Hurb Arrives at Their Destination: A Single Source of Truth Across a Growing Data Stack


  • Unify data discovery across multiple platforms
  • Eliminate metadata inconsistencies
  • Improve data visibility with lineage tracking from source to destination
  • Streamline impact analysis

The Topline

Challenge
Struggled with data asset discovery, traceability, and metadata inconsistencies across their growing data platform

Solution
Implemented DataHub with custom Kubernetes deployment, Airflow orchestration, and integrations across their entire data stack

Impact
Established a single source of truth with end-to-end visibility and automated lineage tracking, improving discovery, quality, and governance

Note: This story was originally published February 2023.

Challenge

Hurb, a Brazilian online travel platform, built its business on a solid data-driven culture. 

However, rapid growth created significant data discovery and governance challenges:

  1. Fast-growing data assets: Data became difficult to manage as new technologies and integrations were added
  2. Resource cataloging and data discovery issues: Teams struggled to locate and understand the purpose of existing data assets across their growing infrastructure
  3. Traceability of data origin: Understanding how data was transformed and loaded was critical for strategic decision-making
  4. Building a single source of truth: Cataloging assets separately in their primary services (Metabase and BigQuery) caused metadata inconsistencies

“What’s the value of having many data assets if you cannot find them, or discover their purpose?”

— Patrick Braz, Data Engineer, Hurb

Solution

Hurb chose DataHub after creating comprehensive project requirements documentation and evaluating data catalog tools. Their decision was driven by four key factors:

  1. User-friendly interface supporting their self-service culture
  2. Active and receptive community for implementation support
  3. Contribution opportunity aligning with their open source culture
  4. Built-in ingestion sources for their primary services (Metabase and BigQuery)

Hurb deployed DataHub on Kubernetes using custom flattened charts and made the strategic decision to disable frontend ingestion, positioning DataHub as their single source of truth with all ingestion controlled through backend processes.

The implementation centers on Airflow orchestration: all DataHub ingestion runs through Airflow's KubernetesPodOperator, with tasks generated by a custom DAG factory. Hurb also built a custom Airflow integration based on dataset objects, which lets data engineers enrich metadata during DAG development and builds lineage automatically.
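As a rough illustration of the factory idea, the sketch below generates one ingestion task description per source in plain Python. It is not Hurb's code: the image name, server URL, recipe paths, source config values, and the IngestionTaskSpec type are all hypothetical, and the source config keys should be checked against the DataHub source documentation. In an Airflow deployment, each spec would presumably be handed to a KubernetesPodOperator instead of being kept as a plain object.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; none come from Hurb's setup.
DATAHUB_GMS_URL = "http://datahub-gms:8080"
INGESTION_IMAGE = "example.registry/datahub-ingestion:latest"


@dataclass
class IngestionTaskSpec:
    """What one ingestion pod would need: a name, an image, a command, a recipe."""
    task_id: str
    image: str
    command: list
    recipe: dict


def build_recipe(source_type: str, source_config: dict) -> dict:
    """Build a DataHub ingestion recipe that pushes metadata to a REST sink."""
    return {
        "source": {"type": source_type, "config": source_config},
        "sink": {"type": "datahub-rest", "config": {"server": DATAHUB_GMS_URL}},
    }


def ingestion_task_factory(sources: dict) -> list:
    """Turn a mapping of source type -> config into one task spec per source."""
    specs = []
    for source_type, source_config in sorted(sources.items()):
        specs.append(
            IngestionTaskSpec(
                task_id=f"ingest_{source_type}",
                image=INGESTION_IMAGE,
                # `datahub ingest -c <recipe file>` runs a recipe from the CLI.
                command=["datahub", "ingest", "-c", f"/recipes/{source_type}.yml"],
                recipe=build_recipe(source_type, source_config),
            )
        )
    return specs


# Illustrative config keys and values; verify them against each source's docs.
specs = ingestion_task_factory(
    {
        "bigquery": {"project_ids": ["example-project"]},
        "metabase": {"connect_uri": "http://metabase.example:3000"},
    }
)
```

Keeping every source behind one factory means a new integration is a new dictionary entry instead of a new hand-written DAG, which fits the decision to route all ingestion through backend processes.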

They integrated their data quality platform, Anomalo, and actively use DataHub’s impact analysis feature to identify who is affected by data changes or quality issues.
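To make the link between declared datasets and impact analysis concrete, here is a minimal sketch under stated assumptions. It does not use DataHub's or Airflow's APIs: the task list, dataset names, and owner mapping are invented for illustration. Each task declares what it reads and writes, the declarations become lineage edges, and a breadth-first walk finds everything downstream of a changed dataset.

```python
from collections import deque

# Hypothetical task declarations: each task names the datasets it reads and writes.
TASKS = [
    {"task": "load_bookings", "inlets": ["raw.bookings"], "outlets": ["staging.bookings"]},
    {"task": "build_revenue", "inlets": ["staging.bookings"], "outlets": ["marts.revenue"]},
    {"task": "build_dashboard", "inlets": ["marts.revenue"], "outlets": ["metabase.revenue_dashboard"]},
]

# Hypothetical ownership metadata, keyed by dataset name.
OWNERS = {
    "staging.bookings": "data-engineering",
    "marts.revenue": "analytics",
    "metabase.revenue_dashboard": "finance",
}


def build_lineage(tasks):
    """Map each upstream dataset to the set of datasets written from it."""
    downstream = {}
    for task in tasks:
        for inlet in task["inlets"]:
            downstream.setdefault(inlet, set()).update(task["outlets"])
    return downstream


def impacted_assets(downstream, changed):
    """Breadth-first walk of everything downstream of the changed dataset."""
    seen, queue, order = {changed}, deque([changed]), []
    while queue:
        current = queue.popleft()
        for child in sorted(downstream.get(current, ())):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


lineage = build_lineage(TASKS)
affected = impacted_assets(lineage, "raw.bookings")
affected_owners = sorted({OWNERS[a] for a in affected if a in OWNERS})
```

In this toy graph, a quality issue in raw.bookings reaches the staging table, the revenue mart, and the dashboard, so all three owning teams would need to be notified.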

“We want to use DataHub as a source of truth.”

Patrick Braz, Data Engineer, Hurb

Impact

With DataHub, Hurb evolved from fragmented data management across multiple disconnected services to a unified metadata management platform that delivers end-to-end visibility.

Key outcomes include:

  • Established a single source of truth across a growing data stack
  • Eliminated metadata inconsistencies through centralized cataloging
  • Gained end-to-end data visibility and quality control from source to destination
  • Automated lineage building with a lineage backend
  • Enabled proactive impact analysis to identify who is affected by data changes or quality issues
