• Power decentralized discovery via enriched data products
  • Extend metadata model with business-specific context
  • Ingest metadata from internal systems via custom plugins
  • Automate access and workflows using real-time metadata events

DataHub has provided us an excellent base from which to build on data mesh, and leverage much of its functionality out of the box, such as its ingestion framework, search capabilities, and built-in metadata aspects.

— Mike Schlosser, Lead Software Engineer, Optum


The Topline

Challenge
Centralized data platforms created bottlenecks in access, governance, and schema standardization across business units

Solution
Built “Data Forge,” a decentralized data platform powered by DataHub, enabling secure, self-serve access to trusted data products

Impact
Faster time to data for consumers, real-time metadata triggers for automation, and improved data discoverability across teams

Note: This story was originally published in March 2022.

Challenge

Optum operates at the leading edge of healthcare innovation, working with petabyte-scale data to power predictive analytics, patient care platforms, and longitudinal health records.

But the company’s centralized data model wasn’t keeping up. Rigid governance frameworks and one-size-fits-all schemas slowed data access and left engineers and analysts waiting.

Optum's initial healthcare data platform relied on three access patterns:

  1. Standard APIs
  2. Standardized data streams
  3. Common big data ecosystem using change data capture processes that read transaction logs and emitted events to Apache Kafka

While this approach proved successful for most use cases, it came with its own set of challenges:

  • Heavy reliance on standardized schemas that were difficult to identify and align across a vast scope and diverse lines of business
  • Increasingly complex governance processes as the platform expanded
  • Teams frequently needing access to source data rather than standardized versions because their use cases varied
  • Bottlenecks in data access, governance, and movement that hindered engineering team productivity

Solution

To operationalize its data mesh architecture, Optum built “Data Forge,” a company-wide platform on top of DataHub. This gave teams the infrastructure they needed to produce, discover, and govern data products across lines of business without central bottlenecks.

Here’s how DataHub made that possible:

  • Extensibility: Optum needed to represent data products, pipelines, and ML models with rich business context. DataHub’s extensible metadata model allowed the team to add new top-level entities and custom metadata aspects without rewriting core components.

“One of the main benefits we have leveraged from DataHub is its extensibility. We can easily use low code methods to extend the metadata model… This was a significant upgrade over many metadata platforms previously used.”

— Mike Schlosser, Lead Software Engineer, Optum

  • Custom ingestion pipelines: Many of Optum’s most critical datasets lived in internal systems that traditional metadata platforms did not support. With DataHub, the team built plugin extensions that ingest metadata from these internal platforms, bridging gaps and unifying visibility across the stack.
  • Event-driven automation: DataHub’s real-time event architecture unlocked key operational workflows. For example:
    • When a schema changes, consumers are automatically notified
    • When a subscription is approved, access is provisioned without manual handoffs
    • When metadata changes, data movement jobs are triggered to keep pipelines up to date

Impact

With DataHub as the foundation, Optum transformed how teams discover, manage, and govern healthcare data.

Key outcomes included:

  • Faster time-to-value for engineering and analytics teams through streamlined access review and provisioning processes
  • Better discoverability and data use through data products whose enriched metadata provides the context needed to understand each dataset
  • Real-time responsiveness to metadata changes, enabling automated access provisioning, consumer notifications on schema changes, and triggered data movement workflows