
Celebrating 5 Years of DataHub: Innovation, Impact, and Community
DataHub just turned 5! From its humble beginnings as an internal project at LinkedIn to becoming an industry-leading metadata platform, it’s been an exciting journey.
In case you missed the celebration, here’s a recap of the key moments and milestones that made this birthday one to remember. You can also watch the full DataHub Turns 5 webinar.
From Idea to Impact — Reflecting on The DataHub Journey
The journey began over a decade ago — back in 2013 when Shirshanka Das, the founder of the DataHub project, took over as the architect of LinkedIn’s data platform team.
As LinkedIn transitioned from a data warehouse to a data lake strategy, the team built systems for ingestion, processing, serving, and visualization. However, this rapid evolution introduced a new challenge: understanding the data’s journey and ensuring its quality. The team needed to extract metadata from disparate tools to create a shared, universal context for data assets, their lineage, and quality.
“We were moving much faster, but at the same time moving without understanding or moving without quality,” recalls Shirshanka.

The first step toward solving this challenge was taken in 2015 when the Metadata Bus Vision was shared at Strata Singapore. This vision proposed a universal way to extract and integrate metadata across different stages of the data lifecycle.
A year later, the LinkedIn team open-sourced WhereHows, a metadata repository aimed at answering two key questions:
“Where is my data?” and “How did it get here?”
The Seeds of the DataHub Vision
Then followed the realization that metadata wasn’t a small problem: it needed to scale like data itself.
In 2017, Shirshanka presented a refined metadata vision at Crunch Conference, built on the insight that a metadata architecture needed to be scalable, stream-first, and analytical, just like data architectures themselves.
Then came 2019 — the turning point. We took everything we had learned and open-sourced DataHub, an evolution of WhereHows, at the Crunch Conference.
“We realized that metadata needed to be treated as a product, not an afterthought. That’s when DataHub was born,” shares Shirshanka.
DataHub was built to be a modern metadata platform — scalable, real-time, and designed to solve discovery, governance, data quality, and lineage challenges.
How Far We’ve Come and What Brought Us This Far
In early 2020, we launched the Slack community, and before long, companies like Expedia and SpotHero started adopting DataHub in production. By 2021, Acryl Data was founded to commercialize DataHub and accelerate its growth.
“We’ve gone from solving LinkedIn’s metadata problems to solving metadata challenges for the world,” shares Shirshanka.
Today, DataHub has grown into a mature, industry-leading metadata platform that organizations use worldwide.
The DataHub journey has been nothing short of incredible:
- 12,400+ Slack members
- 10K+ GitHub stars, a testament to strong developer support
- 1K+ weekly active users
- 6,000+ messages per month
- Over 2 million lines of code contributed
- 3,000+ companies, including Netflix, Pinterest, Miro, Deutsche Telekom, Coursera, Chime, and Notion, that have made DataHub an integral part of their data infrastructure

Our work is increasingly being recognized. We’ve been featured in prestigious reports by organizations like OneHouse, earning accolades for our contributions to data governance, compliance, and open source innovation.

We’ve prioritized creating spaces where data professionals can connect, share ideas, and explore the latest advancements in AI and data management.
- Acryl Data’s Metadata & AI Summit brought together over 30 industry leaders to discuss navigating data complexity, elevating data quality, and the landscape of open source AI.
- We spoke to data leaders in our series Decoding Data Leadership to get insights into the challenges, strategies, and lessons shared by those riding new technology waves and steering data teams through choppy waters.
And yet, as we look ahead, it feels like we’re just getting started.
What Lies Ahead: Possibilities and Promises
As we celebrate how far we’ve come, it’s time to look ahead at where we’re going next.
Data management is undergoing a fundamental shift — moving from human-centric workflows to systems that operate at machine speed.
“The old world was built around human-centric workflows — data scientists browsing catalogs, analysts checking lineage, compliance teams doing manual audits. But today, AI systems need to autonomously discover and interact with data in real time, almost like an air traffic control system for data,” shares Swaroop Jagadish, co-founder and CEO of Acryl Data.

DataHub’s architecture, shaped by our community, was designed for this shift from the very beginning. Its event-driven design, API-first backbone, and graph-first approach ensure that metadata management works seamlessly for both humans and machines.
What started as a metadata platform that focused on collecting and indexing technical metadata has now grown into a dynamic control plane — one that connects disparate systems and enables automation, governance, and real-time adjustments. We’re looking at a world where AI models automatically validate training datasets, pipelines adjust dynamically based on data quality signals, and governance is enforced at scale across thousands of AI models.
Beyond technology, our commitment to open source remains steadfast. As we expand, we’re investing heavily in developer tooling, dedicated community engineering resources, and richer technical content to support our growing user base.
Introducing DataHub 1.0 and Product Roadmap
This birthday also marks the release of DataHub 1.0, which comes with a host of exciting features and improvements designed to make managing and accessing your data smoother than ever.
We started this journey over a year ago with the main goal of making DataHub more accessible for everyone in the organization — not just the data experts.
This release includes a redesigned user experience, AI and data management features, support for the Iceberg REST Catalog, and more.

We believe that only by constructing the metadata graph with the utmost precision can we solve critical problems such as data discovery, data governance, and data observability in a seamlessly unified way.
Here’s our 2025 vision for these three solutions:
1. Data Discovery
We have three primary areas of focus for data discovery in 2025:
- Human-Centered Insights: Enriching discovery with human-generated metadata and context.
- Intelligent Exploration: Streamlining navigation with robust contextual insights.
- Enhanced Lineage Usability: Expanding end-to-end lineage and integrations with analytics tools, data lakes, and stores.
2. Data Governance
On the governance front, we are focused on:
- Universal Data Registry: Complete visibility into all data assets, models, and dashboards.
- Centralized Compliance: DataHub as the source of truth for tracking, ownership, and regulations.
- Policy Enforcement: Expanding DataHub’s Actions Framework for consistent metadata governance.
3. Data Observability
As part of our efforts to make observability more accessible and collaborative, our three key focus areas will be:
- Accessible Data Quality: Insights for all stakeholders, not just engineers.
- Collaborative Data Quality: Shared tracking and resolution of issues.
- Contextualized Data Quality: Enriching insights with business context from DataHub.
For a deep dive into what the future holds for DataHub, check out the DataHub 1.0 Roadmap.
The Foursquare Story: From Fragmentation to Control Plane
We’ve always had a vision for DataHub to be the ‘control plane for data’. But it truly hits home when a customer uses the same words to describe how they’re leveraging it.
Vikram Gundeti, CTO of Foursquare, joined us on this special day to share how Foursquare views DataHub as the control plane for its data ecosystem, and how it helped the team bring a fragmented set of data systems under control.
“DataHub Cloud offers the ability to deliver a comprehensive developer experience and maximize the utility of our data assets,” shares Vikram.
Watch Vikram talk about Foursquare’s journey from fragmentation to control plane, and check out Foursquare’s blog post to learn more about their approach.
Onwards and Upwards: Thank You for Five Incredible Years
Before we close, we want to extend a huge thank you to our community, contributors, and users for making this journey possible. It’s been an incredible ride, and we couldn’t have done it without you.
Join Us Onward
- Contribute to DataHub: Get involved in the project on GitHub and help us shape its future. Watch our roadmap video to see how you can get involved.
- Join Our 12k+ Slack Community: We want to hear from you — your feedback has always guided our innovation, and your voice will help shape what’s next. Let’s connect and discuss ideas together.
- Get Certified: Stay tuned for our technical certification workshops launching this spring. Learn best practices for setting up DataHub and maximizing its potential.
- Check out our resources:
  - DataHub Turns 5 full on-demand webinar
Once again, thank you for being a part of our journey — here’s to many more years of ideas, innovation, and impact!