Unlocking the Future of AI and Data: DataHub’s $35M Series B Journey
By: Swaroop Jagadish & Shirshanka Das | May 21, 2025
Our Series B milestone: Fueling the next chapter of DataHub
Picture this: You’re about to launch a major new data-driven, AI-powered product feature. But instead of moving fast, your team is stuck.
“What are the trusted data sources we should be using?”
“How do we ensure that the data will be reliable and support production use cases?”
“How do we know that we are using this data in a compliant manner?”
We’ve been there. We’ve watched high-performing teams grind to a halt—not for lack of talent or ambition, but because they couldn’t see through the tangled web of relationships between data, people, and code.
That’s why we created DataHub.
What began as a small open source project at LinkedIn has grown into the leading open source metadata platform, used by more than 3,000 organizations around the world. DataHub now powers the central nervous system of the modern data stack, helping enterprises unlock the trust, visibility, and context they need to move at the speed of AI.
Today, we’re thrilled to share a new milestone in that journey: a $35 million Series B raise led by Bessemer Venture Partners, with participation from 8VC, Tru Arrow, SineWave, In-Q-Tel, and Zero Prime. We’re also excited to welcome Lauri Moore of Bessemer Venture Partners to our board.
For all the details on our Series B raise, check out the official announcement >
To our customers, our community, and our team: thank you. You made this possible. Your belief in our mission is what brought us here.
And we’re just getting started. We’re committed to helping every organization build with context—so data and AI efforts don’t just launch—they last.
AI adoption is exploding—but missing context is holding enterprises back
It’s 2025, and AI isn’t just a boardroom buzzword anymore—it’s a business imperative. However, while enterprise investments in AI are skyrocketing, results remain frustratingly out of reach.
The missing piece? Context.
The word context is everywhere you look: “Model Context Protocol,” “context windows,” and the list goes on. But what does context really mean?
Context in AI refers to the comprehensive set of relevant information, relationships, and situational awareness that gives meaning to data. It’s the critical background information that humans naturally possess but that must be explicitly provided to AI systems. Context transforms raw data into actionable intelligence by describing its characteristics and properties, enabling more accurate interpretations and predictions. Without context, AI operates in a vacuum, making decisions based on incomplete information and missing the nuanced understanding that drives truly valuable insights.
What are the kinds of context we find around us?
Technical context
Technical context encompasses the metadata and specifications that define how data should be interpreted and processed. This includes:
- Data lineage: Provenance information about where data originated, how it was collected, and all transformations it underwent
- Schema definitions: Structure, relationships, and constraints of the data
- Quality metrics: Accuracy, completeness, freshness, and reliability indicators of datasets
- Version control: Tracked iterations of the data or models in use
- Technical dependencies: Required systems, services, or resources for data processing
Technical context helps AI systems understand the “how” of data, ensuring they interpret information correctly and apply appropriate processing techniques.
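To make this concrete, here’s a minimal sketch of what capturing technical context can look like with DataHub’s open source Python SDK (the acryl-datahub package). The endpoint, platform, and dataset names below are illustrative placeholders; the sketch simply attaches descriptive properties and upstream lineage to a table so downstream humans and AI systems can see where it came from.

```python
# pip install acryl-datahub
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetLineageTypeClass,
    DatasetPropertiesClass,
    UpstreamClass,
    UpstreamLineageClass,
)

# Placeholder endpoint and dataset names -- adjust for your own deployment.
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
raw_events = make_dataset_urn(platform="kafka", name="prod.click_events", env="PROD")
daily_clicks = make_dataset_urn(platform="snowflake", name="analytics.daily_clicks", env="PROD")

# Technical context: a human-readable description plus machine-readable properties.
properties = DatasetPropertiesClass(
    description="Daily click aggregates derived from the raw click event stream.",
    customProperties={"refresh_schedule": "hourly", "quality_tier": "gold"},
)

# Technical context: lineage recording which upstream source feeds this table.
lineage = UpstreamLineageClass(
    upstreams=[UpstreamClass(dataset=raw_events, type=DatasetLineageTypeClass.TRANSFORMED)]
)

for aspect in (properties, lineage):
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=daily_clicks, aspect=aspect))
```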
Operational context
Operational context captures the dynamic metadata about how data is processed, accessed, and maintained within systems. This includes:
- Runtime metrics: Pipeline execution status (success/failure), duration, resource consumption, and error logs
- Access patterns: Authentication events, authorization decisions (accept/deny), query patterns, and usage frequency
- Data SLAs: Freshness guarantees, update frequencies, and time-to-serve requirements
- System dependencies: Inter-service relationships, API dependencies, and infrastructure requirements
- Operational policies: Retention rules, archival schedules, and data lifecycle management constraints
Operational context enables AI systems to understand data assets’ reliability, performance characteristics, and usage patterns. This helps them make informed decisions about which data sources to trust and how to interpret temporal inconsistencies across systems.
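As a rough illustration, operational context can be reported in the same way. The sketch below (again using the acryl-datahub Python SDK, with placeholder names, an assumed local endpoint, and a hypothetical pipeline identity) records that a pipeline just wrote to a table, so consumers can judge its freshness.

```python
# pip install acryl-datahub
import time

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import OperationClass, OperationTypeClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder endpoint
dataset = make_dataset_urn(platform="snowflake", name="analytics.daily_clicks", env="PROD")
now_ms = int(time.time() * 1000)

# Operational context: report the latest write so freshness is visible to
# every consumer, human or machine.
last_run = OperationClass(
    timestampMillis=now_ms,
    lastUpdatedTimestamp=now_ms,
    operationType=OperationTypeClass.INSERT,
    actor="urn:li:corpuser:airflow",  # hypothetical pipeline identity
)

emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset, aspect=last_run))
```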
Business and social context
Business and social context captures the human knowledge and governance frameworks that give data its organizational meaning. This includes:
- Collaborative discussions: Critical insights in Slack channels, Teams threads, and emails where experts explain anomalies and resolve edge cases
- Unstructured documentation: Tribal knowledge in wikis and docs explaining why metrics exist and how to interpret them correctly
- Business glossary: Standard definitions linking technical assets to business concepts, KPIs, and domain terminology
- Ownership structure: Data stewards, team responsibilities, and who to contact for questions
- Access control policies: Rules defining who can view, edit, or use specific data based on roles, sensitivity, and business function
- Compliance requirements: Regulatory frameworks and industry standards governing how data must be handled, protected, and retained
This layer ensures AI systems understand both the informal knowledge and formal governance requirements that shape how data should be interpreted and used within the organization.
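Business and social context can be captured in the same model. Here’s a small sketch (acryl-datahub Python SDK; the owner, glossary term, and endpoint are illustrative placeholders) that records who owns a table and ties it to an agreed business definition.

```python
# pip install acryl-datahub
import time

from datahub.emitter.mce_builder import make_dataset_urn, make_term_urn, make_user_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    GlossaryTermAssociationClass,
    GlossaryTermsClass,
    OwnerClass,
    OwnershipClass,
    OwnershipTypeClass,
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder endpoint
dataset = make_dataset_urn(platform="snowflake", name="analytics.daily_clicks", env="PROD")

# Business context: who owns this table and answers questions about it.
ownership = OwnershipClass(
    owners=[OwnerClass(owner=make_user_urn("jdoe"), type=OwnershipTypeClass.DATAOWNER)]
)

# Business context: link the table to a governed glossary term.
terms = GlossaryTermsClass(
    terms=[GlossaryTermAssociationClass(urn=make_term_urn("Classification.Confidential"))],
    auditStamp=AuditStampClass(time=int(time.time() * 1000), actor=make_user_urn("jdoe")),
)

for aspect in (ownership, terms):
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset, aspect=aspect))
```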
The business cost of context-blind AI
As an industry, we’ve spent countless dollars ensuring humans have access to context. Only now are we beginning to address this critical gap in AI systems. Without proper context, all AI becomes expensive guesswork with real business consequences:
- AI digital twins fail at knowledge work when they can’t access the tribal knowledge locked in human minds. They struggle with tasks that seem simple to experienced employees but require deep institutional understanding. This transforms potential productivity tools into disappointing, shallow replicas that require constant human supervision.
- AI development tools like Cursor and Windsurf introduce technical debt by making code changes without understanding their downstream impact. When these tools lack visibility into system dependencies and operational workflows, they create unexpected failures in critical business applications, often at the worst possible moments.
- Agentic AI applications make costly errors when faced with dynamic situations that require flexible thinking. Without contextual awareness, they can’t recognize when to adjust their approach or incorporate new information sources, leading to confident but wrong recommendations that amplify rather than reduce human workload.
- Governance and security teams become innovation blockers without guaranteed visibility and control. When AI lacks explainable context, these critical stakeholders have no choice but to restrict AI deployment, creating organizational friction that slows adoption and limits potential benefits.
The data tells a sobering story. A 2024 Harvard Business Review survey found:
- Nearly 50% of organizations cite data challenges as the #1 barrier to AI adoption
- 91% of executives agree that a reliable data foundation is essential for AI success
And yet the context gap continues to widen. While 60% of organizations name AI as a key driver of their data strategy, only 12% report having data that’s actually ready to support it.
The result? Over 80% of AI projects fail—twice the failure rate of traditional IT projects.
This isn’t just a technical challenge. It’s a business crisis. One that drains millions in wasted investments and leaves competitive advantages on the table.
Why context demands a new platform era
Traditional data catalogs solved yesterday’s problem: helping humans find data. In today’s AI-powered world, this approach falls critically short. Both humans and machines now require context that is rich, real-time, and reliable—delivered at unprecedented scale.
This fundamental shift is transforming the market. What began as a $1B data catalog space is rapidly evolving into a $9B Context Management Platform category, with DataHub at the forefront.
DataHub addresses the context challenges that derail AI initiatives:
- Technical context at scale: Our event-driven architecture captures billions of metadata changes across your entire data ecosystem in real time, preventing AI systems from working with outdated information.
- Operational context in one place: Unified discovery, lineage, and observability eliminate context switching, giving both humans and AI a complete view of how data flows through business processes.
- Institutional knowledge capture: Flexible metadata modeling preserves tribal knowledge that would otherwise remain locked in human minds, crucial for AI digital twins and agents.
- Governance with visibility: Enterprise-grade controls give security teams the transparency needed to enable rather than block AI innovation.
- Enterprise-readiness: DataHub Cloud delivers the uptime, security, and white-glove support global enterprises need.
From Apple and Chime to emerging disruptors, organizations are using DataHub to bring trusted, contextually rich data to the core of their AI strategies—turning potential AI failures into tangible business outcomes.
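For a sense of what the read path looks like, here’s a minimal sketch of pulling that context back out with DataHub’s open source Python client, for example to ground an AI agent’s answer about whether a table is trustworthy and who to ask about it. The endpoint and dataset name are placeholders.

```python
# pip install acryl-datahub
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import DatasetPropertiesClass, OwnershipClass

# Placeholder endpoint; point this at your DataHub instance.
graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
dataset = make_dataset_urn(platform="snowflake", name="analytics.daily_clicks", env="PROD")

# Fetch stored context aspects for the dataset (None if not yet populated).
properties = graph.get_aspect(entity_urn=dataset, aspect_type=DatasetPropertiesClass)
ownership = graph.get_aspect(entity_urn=dataset, aspect_type=OwnershipClass)

if properties:
    print("Description:", properties.description)
if ownership:
    print("Owners:", [owner.owner for owner in ownership.owners])
```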
How we got here
As data leaders at LinkedIn and Airbnb, we experienced many of these problems firsthand and built tools to solve our own pain. When DataHub was open-sourced in 2019, it became clear that the need was bigger than we had imagined and was not limited to Silicon Valley companies.
Over the last four years, we’re proud to have hit several milestones in our growth:
- Open source and company formation: We had already built what we wished we had—a metadata platform with the scale, flexibility, and granular lineage tracking required by modern data teams. Then we open-sourced it to share with the world. In 2021, we formed DataHub (f.k.a. Acryl Data) with investment from LinkedIn to bring a commercial offering to the market.
- DataHub community growth: Since 2021, we’ve invested heavily in the community. What started as a small open-source project quickly resonated with data teams around the world. This early momentum laid the foundation for the vibrant community we have today—13,000+ members strong, with 3,000+ organizations like Netflix, Visa, Optum and Apple now using DataHub in production. Their input has consistently shaped our roadmap and validated our most fundamental design choices.
- DataHub Cloud customer adoption: We launched DataHub Cloud to meet the demands of enterprise deployments with the security, reliability, and support that leading organizations expect, all while continuing to invest in the open source core. DataHub Cloud is now adopted by fast-growing organizations and Fortune 10 companies that take advantage of the accelerators around Discovery, Observability, and Governance for humans and machines.
How we’re investing in the future of AI and data
With this Series B raise, we’re accelerating our mission to bring clarity, trust, and context to the heart of AI. Our focus over the next chapter centers on four strategic priorities:
- Strengthening our community: We’re deepening our investment in the thousands of users and contributors who power our open source ecosystem, ensuring the collective intelligence of our community continues to enhance context management for AI applications.
- Advancing contextual intelligence: Our product and engineering teams are developing next-generation capabilities for AI governance and metadata intelligence that will help bridge the gap between raw data and the rich context AI systems require to make reliable decisions.
- Expanding enterprise adoption: As more organizations face the context challenges of AI at scale, we’re scaling our go-to-market team to help companies across industries build the trusted data foundation necessary for responsible AI deployment.
- Enhancing enterprise readiness: We’re evolving our enterprise platform to handle the performance demands of AI metadata workloads while providing the governance controls security teams need to confidently enable AI innovation.
Above all, we remain focused on what matters most: making context-rich, well-governed data the foundation of every successful AI initiative.
As organizations transition from AI experimentation to enterprise-wide adoption, DataHub will be the platform that transforms AI and data chaos into contextual clarity—unlocking the full potential of AI while minimizing the risks.
Join us on our journey
To everyone who’s helped us get here: thank you. Your belief, contributions, and partnership have made this journey possible.
But we’re just getting started.
If you believe in a future where context-rich, trustworthy data powers every AI initiative, we’d love for you to be part of it:
Join our open source community
Explore the project, contribute code or ideas, and connect with 13,000+ data practitioners in our Slack community.
Join our team
We’re hiring! Check out our open roles and help us build the infrastructure that will power the next generation of data and AI.
See DataHub Cloud in action
Interested in bringing DataHub Cloud to your enterprise? Book a custom demo to see how we can help power your data and AI initiatives.
Let’s unlock the full potential of AI—together.