Humans of DataHub: Roko Gudic

This week we are joined by Roko Gudic, Data Science Manager at Infobip. Infobip is a global leader in omnichannel communication, and it’s their business to simplify how brands connect with, engage, and delight their customers at global scale.

Roko shares his journey to DataHub, what DataHub has enabled within Infobip, his favorite features, and more. You don’t want to miss this conversation!

Humans of DataHub Interview with Roko

Conversation Transcript & Highlights

Edited for brevity & clarity

Maggie Hays: Welcome to another round of Humans of DataHub. I’m Maggie, the Community Product Manager for DataHub. And today, we are joined by Roko. Roko, thanks for joining us; please tell us a little about yourself, who you are, where you work, and what you’re doing.

Roko Gudic: It’s an absolute pleasure to be here. And I’m glad to even get familiar with DataHub as a tool in the Community. So kudos for that. My name is Roko Gudic, and I work for Infobip. We are the first Croatian unicorn. We are a global communication platform, basically in the B2B market, providing our customers with a way to contact their customers on a number of channels, starting from SMS, Viber, WhatsApp, or whatever they choose. We are currently ranked as a leader in the CCaaS, or contact center as a service, space by Juniper Research.

At Infobip, I am a Data Science Manager leading a team of data scientists. So with all this data we have as a company, our task is to get value out of the data. As you can imagine, not a single company is perfect. We have dealt with the good and the bad side of having huge volumes of data, and that’s actually how I got into data governance as an initiative in my company.

I’m also part of the Data Governance Committee, a multi-functional group of data governance enthusiasts. This leads me to DataHub and how we actually got to DataHub. We were honestly first faced [with the challenge] when we were thinking of proprietary tools, looking at Informatica and solutions like that. At that point in time, we wanted to see what was on the other side, so what was being offered in the open-source community. And, of course, DataHub was one of the first tools I stumbled upon, among others such as Amundsen, and we spent a month validating different approaches. We opted for DataHub because it offered what we needed in the sense of flexibility, the Community around it, the quality of the solution, user experience, and so on. That’s how we got here.

Elizabeth Cohen: What do you enjoy most about the DataHub Community?

Roko Gudic: For starters, I mean, from the first moment I joined Slack, everyone was really, really welcoming. With the first message, I felt like, Okay, go there, say hi. I thought, Okay, I’ll just say hi, no one will respond. And Maggie, in just five minutes, responded. Someone is giving this thought and taking care of new arrivals. And from that point onward, I just felt like I have people around who are willing to help. It’s vibrant and welcoming. And the quality of the product gathered people who either care for high-quality products or are in need of a high-quality product. I believe that [DataHub] is like a perfect storm, so to speak: like-minded people needing the same solutions and helping each other out on this journey. It’s really wonderful.

Maggie Hays: That’s great. I’m glad that you felt welcomed. I also know you go above and beyond to welcome other people and provide support. So we’re super appreciative of that. And I know, you know, we can’t do all of this alone. And especially as the Community grows, it’s just so wonderful to have folks like you willing to jump in, help people feel supported, help give direction, help brainstorm, so we appreciate you just as much.

Roko Gudic: Thank you. That’s precisely why it felt natural to help other people as well. I felt the same energy coming towards me, so I just wanted to return it. And that’s the main idea behind the Community and open-source projects.

Maggie Hays: We’re doing something special here. I’m excited about it. So thinking about DataHub within your organization, with you being in data science and more data governance-focused, what are you looking for DataHub to enable within your organization? Or what problems have you already addressed, or are you looking to address, by incorporating the tool?

Roko Gudic: The most obvious response would be pure data discoverability. At a company like Infobip, we are reaching around 7 billion mobile phones every month, so we are working at a huge scale, and that brings its own set of problems. So just from a pure data discoverability standpoint, we have already gained much. But the most important thing is having a platform that helps business people and IT people work together better. I believe in the business glossary and everything around it: standardizing the naming and the standard definitions, which need to come from the business side, and mapping the physical representations of those concepts to them. For example, when people come to our company, they have one central place where they can go and search for the concepts they are learning about and see how the data elements connected to those concepts are being used throughout the company, in dashboards and so on. So it eases onboarding, but it also helps people who have been in the company for a long time speak the same language, so that the terms we use mean the same thing.

Maggie Hays: Sure. That’s awesome.

Elizabeth Cohen: With DataHub, what has been your favorite feature or use case?

Roko Gudic: Okay, so for me, it would probably be column-level lineage. That was something that the rest of the Community and I anticipated. That’s something that had the biggest, I’d say, jump in value, so to speak. Also, as I mentioned, the business glossary and just, you know, the quality of the search functionality as well. We have, like, hundreds of thousands of entities in DataHub, so a quality search mechanism is necessary. And we have that in DataHub. So yeah, but I would say column-level lineage is number one.

Maggie Hays: Yeah, column-level lineage is one where we’ve had so much excitement. And I think one of the things I’m really excited to figure out as a Community is how, you know, when you’re in the scale of hundreds and thousands of datasets, and then you amplify that with column-level lineage, it can become exceptionally complex very quickly, right? And so, I think we have a really great Community around people wanting to solve this problem. How do we solve column-level lineage in an impactful way, and not just have like a pretty visualization to go along with it, but actually, empower people to act on it? So super excited about it.

Roko Gudic: That brings me to another great aspect of DataHub that we just started to explore: DataHub Actions, and writing workflows, for example for creating new glossary terms and having a lifecycle for a term. Someone proposes it, then it’s assigned to someone who needs to approve it, and then it’s approved and published for the rest of the company. Also, we are looking into data collection in the sense of impact analysis, at the column level, of course. And that’s something that we are really, really enthusiastic about. That’s where the biggest value will happen, actually.

Maggie Hays: Keep us posted on that, because that’ll be a fun story to tell the Community because I think, you know, there’s a lot that we can learn from how you approach it. And that’s really exciting.

Maggie Hays: With your current implementation of DataHub, and sounds like you’re going to be working on actions, sounds like you’re trying to automate some of those workflows… What are you excited to see from the DataHub Community beyond those features, additional features or ways to collaborate within the DataHub Community?

Roko Gudic: For us, it’s the new [ingestion] sources. We are looking to contribute the Qlik data source. We are ironing out some details and I cannot wait to bring it back to the Community. Also, I want to see more ways to engage business stakeholders. And one of the ways I would approach it is to suggest, and I already have plans to communicate this to the Community, to include our concept of a business process. Because a lot of, you know, data entities and data sources are being consumed in some higher-level business processes, and those business processes have their owners, right? And it’s a good way to bring this whole data perspective to the non-technical business user. The way I see it, the biggest value for the company actually happens when business and IT or data people start working together to improve the quality, to improve the whole governance. I believe business processes as a concept is something that could potentially tie things up a little bit more.

Maggie Hays: That’s really cool. So kind of like attaching the underlying data to the day-to-day, kind of human workflows or decisions that are being made, that maybe aren’t tied directly to a dashboard or to a model. But tied to more of a concept. That’s really awesome. We’re hearing or I’m hearing more and more like rumblings that are interested in that. So maybe that’s something that we can put out a Community RFC, right, like, how do we model a business process effectively within DataHub? That’s really cool.

Roko Gudic: Yep, it could be similar to how data flows and jobs [are modeled]. Some dissimilarities exist [that would require] additional development, but we could take a similar approach with modeling business standards and business processes.

Elizabeth Cohen: Roko, we have one last question for you. What advice would you give someone curious about the DataHub Community or just joining the DataHub Community? What advice would you give them as they’re just starting?

Roko Gudic: I would say, you know, jump in 100%. Immerse yourself in the Community, try the tool, talk to people, ask questions, and hopefully also help other people. Really, just immerse yourself in the Community and experience.

Maggie Hays: We are a great bunch, if I do say so myself.

Roko Gudic: I would agree.

Maggie Hays: Well, Roko, thank you so much for taking the time with us. We are very appreciative of the contributions you’re giving back, and you’re helping build this Community with us.


If you are new to DataHub, just beginning to understand what “metadata” and “modern data stack” mean, or you’ve just read these words for the first time (howdy, friends! 🤠), let us take a moment to introduce ourselves and share a little history:

DataHub is an extensible metadata platform, enabling data discovery, data observability, and federated governance to tame the complexity of increasingly diverse data ecosystems. Originally built at LinkedIn, DataHub was open-sourced under the Apache 2.0 License in 2020. It now has a thriving community with over 5.5k members (and growing!) and 315+ code contributors, and many companies are actively using DataHub in production.

We believe that data-driven organizations need a reimagined developer-friendly data catalog to tackle the diversity and scale of the modern data stack. Our goal is to provide the most reliable and trusted enterprise data graph to empower data teams with best-in-class search and discovery and enable continuous data quality based on DataOps practices. This allows central data teams to scale their effectiveness and companies to maximize the value they derive from data.

Want to learn more about DataHub and how to join our community? Visit https://datahubproject.io and say hello on Slack. 👋
