• Enable self-service data discovery
  • Establish end-to-end data lineage across systems
  • Improve visibility into pipeline impact and dependencies
  • Decentralize metadata stewardship to domain experts

The Topline

Challenge
Struggled for 6 years with disjointed data systems, complex metadata management, and an inability to provide data lineage and discovery capabilities to stakeholders

Solution
Implemented DataHub using Docker deployment with standard recipes and custom metadata injection for internal tools

Impact
Achieved comprehensive metadata management with built-in search and lineage capabilities in just 3 days, generating team excitement and organizational buy-in

Note: This story was originally published July 2023.

Challenge

Slack’s data engineering team spent six years trying to build a unified metadata layer across a highly complex ecosystem. Despite having what Senior Data Engineer Nedra Albrecht described as a “very best practice typical warehouse construction” with AWS, Hive Metastore, Airflow, Presto, and Spark, crucial data context remained elusive.

“Where I think things get complicated is that there’s this data context layer on top of [the stack]… we don’t often see that. If we just get our system architecture right, it’s just gonna magically come together. But really, there’s a lot of complexity.”

— Nedra Albrecht, Senior Data Engineer, Slack

Key pain points included:

  • Data discovery crisis: Stakeholders struggled to locate or understand the right data
  • Lineage blind spots: The team needed visibility into end-to-end data relationships and dependencies across their multi-layered system
  • Lack of impact visibility: When pipelines failed, the team couldn’t easily identify downstream impacts, making it difficult to communicate with stakeholders and identify optimization opportunities
  • Metadata bottlenecks: The engineering team was overwhelmed as the default metadata stewards, despite domain experts being better suited to own context

Over the years, Slack trialed multiple solutions, from Airflow-based dashboards and internal tools to OpenSearch, Marquez, and ANTLR query parsers, each falling short.

“We’ve tried a lot. And we learned a lot of things,” Albrecht reflected. Each attempt taught them that “managing metadata is not only a complex process, it requires flexibility in thinking about it.”

Solution

Slack found its breakthrough with DataHub. What stood out immediately was DataHub’s support for both push and pull metadata workflows, so the team no longer had to choose one or the other, as previous attempts had required.

Other standout capabilities included:

  • Built-in support for search and lineage
  • Ready-made recipes for almost all of Slack’s stack
  • Support for managing human-generated metadata
  • Push/pull approach with the ability to inject additional data

“All I have to do is inject the data as we have it either through recipes that already exist or my own custom data that I’m injecting. We can set up governance policies to help ease the burden on data engineering, empower our users, really do important things for them.”

— Nedra Albrecht, Senior Data Engineer, Slack
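The difference between the two workflows can be illustrated with a small sketch. This is a toy in-memory model written for this story, not DataHub's API or Slack's implementation: the `MetadataStore` class, the `run_pull_ingestion` and `push_event` helpers, and the dataset names are all hypothetical. In a pull workflow the catalog crawls a source system on its own schedule, while in a push workflow a tool emits metadata itself when something changes. Both end up enriching the same record.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple

Aspects = Dict[str, str]


@dataclass
class MetadataStore:
    """Toy in-memory catalog keyed by dataset name (hypothetical model)."""

    records: Dict[str, Aspects] = field(default_factory=dict)

    def upsert(self, dataset: str, aspects: Aspects) -> None:
        # Merge new aspects into whatever is already known about the dataset.
        self.records.setdefault(dataset, {}).update(aspects)


def run_pull_ingestion(
    store: MetadataStore,
    source_name: str,
    crawl: Callable[[], Iterable[Tuple[str, Aspects]]],
) -> int:
    """Pull: the catalog side runs a connector that crawls a source system."""
    count = 0
    for dataset, aspects in crawl():
        store.upsert(dataset, {**aspects, "source": source_name})
        count += 1
    return count


def push_event(store: MetadataStore, dataset: str, aspects: Aspects) -> None:
    """Push: an internal tool sends metadata at the moment it changes."""
    store.upsert(dataset, aspects)


def crawl_hive() -> Iterable[Tuple[str, Aspects]]:
    # Stand-in for a connector reading table definitions from a metastore.
    yield "analytics.daily_active_users", {"schema": "date,users"}
    yield "analytics.messages_sent", {"schema": "date,count"}


store = MetadataStore()
pulled = run_pull_ingestion(store, "hive", crawl_hive)

# A custom internal tool injects context the crawler cannot see.
push_event(store, "analytics.daily_active_users", {"owner": "growth-team"})
```

After both steps, the record for `analytics.daily_active_users` carries the crawled schema alongside the pushed owner, which is the kind of combination that a single-mode tool would not allow.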

The team deployed DataHub in just three days as part of a Hack Day initiative, using Docker containers and Slack’s internal orchestration tool, Bedrock. “It was really, really easy, actually,” emphasized Albrecht.

“It was easy to deploy, and everything just worked… We were able to do in three days what we had trouble doing for those entire six years.”

Nedra Albrecht

Senior Data Engineer, Slack

Impact

Slack saw immediate results that changed how metadata is managed across the organization.

Key outcomes included:

  • Accelerated metadata management implementation from 6 years of unsuccessful attempts to a fully functional proof of concept in just 3 days
  • Eliminated custom development overhead by leveraging built-in search and lineage capabilities instead of building from scratch
  • Accelerated use of data by enabling self-service data discovery, giving stakeholders the ability to independently find and trust the right datasets
  • Reduced data incidents and facilitated impact analysis with end-to-end lineage tracking, unlocking visibility across the entire data platform
  • Decentralized metadata ownership through governance policies, enabling domain experts to contribute directly
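Impact analysis over lineage comes down to a graph traversal: given a failed pipeline or dataset, find everything that depends on it, directly or indirectly. The following is a minimal sketch of that idea using a breadth-first search. It is not DataHub's lineage API, and the `LINEAGE` map, the `downstream_of` function, and the dataset names are hypothetical examples.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: each key feeds the datasets in its list.
LINEAGE: Dict[str, List[str]] = {
    "raw.events": ["staging.events_clean"],
    "staging.events_clean": [
        "analytics.daily_active_users",
        "analytics.messages_sent",
    ],
    "analytics.daily_active_users": ["dashboards.exec_summary"],
    "analytics.messages_sent": [],
    "dashboards.exec_summary": [],
}


def downstream_of(node: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset reachable downstream of `node` (excluding itself)."""
    seen: Set[str] = set()
    queue = deque(lineage.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


# If the cleaning job fails, these are the assets whose owners need to know.
impacted = downstream_of("staging.events_clean", LINEAGE)
```

Here `impacted` contains the two analytics tables and the executive dashboard, which is the list a team would use to notify stakeholders when the upstream job breaks.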

“This tool is so flexible. The data model itself is extensible, which I love. It’s based on a graph network, which is the way metadata should be represented. It is, to me, just mind-blowingly good.”

— Nedra Albrecht, Senior Data Engineer, Slack
