How to Make Data Governance Work in the AI Age

Earlier this summer, Swaroop Jagadish, Co-Founder and CEO of DataHub, joined The Ravit Show live from Snowflake Summit 2024 to share his thoughts on the challenges and opportunities of managing data governance in a rapidly changing AI environment.

Watch the full interview below and read on for Swaroop’s top insights on how Acryl and DataHub are helping companies navigate and stay ahead in this space!

Ravit: Hi Swaroop, could you please share more about your data journey and your role at Acryl Data?

Swaroop: Absolutely. I’m the CEO and co-founder of Acryl, which we started in early 2021. Before that, I led the data platform team at Airbnb for about seven years, working on search data, cloud cost efficiency, and various use cases related to productivity, compliance, and quality — all driven by metadata.

My co-founder, Shirshanka Das, created DataHub, now the #1 open source data catalog and metadata platform. Additionally, we have a cloud product that builds on top of it, providing even more capabilities.

Ravit: What are your quick thoughts on the rapidly evolving data and AI landscape?

Swaroop: Clearly, there’s a lot of interest and demand, which is great for all of us. But there’s also a lot of noise. Customers sometimes get overwhelmed by the multitude of choices. Legacy vendors often slap an AI label on their products and claim they’re ready for the AI world. We’ve seen platforms like Apache Atlas, originally a Hadoop platform, being used by vendors who add some additional capabilities and say they’re AI-ready. This creates a lot of noise.

In our experience, there are some fundamental capabilities required at the platform level that can’t just be tacked on. Capabilities like versioning, time-oriented metadata, and the ability to prove that things are happening as they should in AI use cases need to be built in from the ground up. Despite all the noise, I believe first principles will win out in the long term. I’ve seen that happen repeatedly.

Ravit: We’ve also seen a trend toward consolidation and unification in the data stack. How important do you think it is for organizations to adopt a platform with a unified approach?

Swaroop: This is an age-old debate: whether to unify or specialize. At Airbnb, we constantly faced this. You develop specialized solutions as needed, but over time, you peel back and unify. Some things make sense to specialize, while others make sense to unify. For example, logically organizing your data or managing policies at scale shouldn’t be splintered across multiple tools. A classic example is ownership; you don’t want five different ways of defining ownership in different tools.

Customers are tired of special-purpose tools for closely related use cases. The time has come for simplicity. People now care more about value and simplicity of implementation. That’s why we have a unified platform that combines governance, observability, and discovery into one solution.

Ravit: Switching gears a bit. What are the advantages of building open-source standards and collaborating with a large community of practitioners?

Swaroop: The space is evolving so fast. You want to collaborate with the best practitioners. We have companies like Netflix and Visa working with us on the DataHub project. DataHub is evolving quickly and is now being embraced for AI use cases. Building in the open, on open standards, helps future-proof your investments and lets you benefit from the insights of over ten thousand active data practitioners. I’m a firm believer in this model, and I have a bias for it.

Ravit: I’d like to hear a bit about your customers. Has there been an instance where customers surprised you?

Swaroop: I’ve been surprised by how quickly customers are moving beyond traditional BI use cases into AI use cases. It’s no longer just prototypes; they’re going to production with these use cases. They’re demanding that the underlying platform for managing data, governance, and quality doesn’t involve yet another toolchain. So, the pace of innovation and the shift to AI use cases among our customers has been surprising. DataHub is evolving quickly to become the control plane for both data and AI use cases.

Ravit: Totally. How do you feel about the rest of 2024 and early 2025? How will this phase evolve?

Swaroop: If you look at what’s been happening, there’s been a lot of emphasis on getting large language models to deal with multimodal data. I see the next wave as getting more enterprise applications built on top of enterprise data. Not just casual chatbots but actual end-to-end enterprise applications. You will see full workflows being implemented. Of course, humans will be important, but that’s what I see going to production in the next eighteen months.

Ravit: I like how you said humans are going to be very important because there’s always a debate about that.

Swaroop: Yes. There will always be a demand for specialized talent. It will just be about being much more efficient.

