Humans of DataHub: Vincent Koc

For this edition of Humans of DataHub, we had the pleasure of speaking with Vincent Koc, Head of Data at hipages, Australia's largest online trade marketplace, connecting homeowners or businesses with trusted tradespeople.
In this chat, Vincent shares his experience with DataHub, figuring out the data governance journey, his favorite DataHub features, and advice for data folks. This conversation was another reminder of how amazing the DataHub community is and how much we have to learn from each other.
Humans of DataHub interview with Vincent Koc, closed captioning provided via YouTube
Conversation Transcript & Highlights
Edited for brevity & clarity
Maggie Hays: Welcome to the latest round of Humans of DataHub! Today, we are joined by Vincent, one of our community members. Vincent, please introduce yourself: tell us who you are, where you work, and what you do.
Vincent Koc: Hi, I'm Vincent Koc, based in Sydney, Australia. I work as the head of data for a digital native organization called Hipages. Hipages is Australia's largest trading marketplace. We effectively match traders (or, as we like to call them, tradespeople) with consumers in the market. Being a digital native and marketplace organization, we have a huge and vast array of data assets. And that's how I stumbled across DataHub.
Maggie Hays: Awesome. How did you find DataHub? What was the reason for even starting to look for something like DataHub?
Vincent Koc: My predecessors were early adopters of DataHub as a technology. As an organization, we're quite big on embracing open technology and open standards. And I believe DataHub has been in this space as an open technology that others can adopt, which is quite critical when we talk about things like data contracts, data lineage, and data governance. So yeah, my predecessors opted for DataHub, and as I started to explore its capabilities and what I could do with it, I realized you can really shape it for your organization and its data governance.
Maggie Hays: Yeah, definitely. Is there anything you enjoy most about the DataHub community?
Vincent Koc: The town halls, which, unfortunately, I can't join all the time, given the timezone difference. But I just love the energy in the announcements, and also just watching the announcements and the feature request channel. Seeing the product come to life, shaped by the community, and seeing the impact the community has on the roadmap and the development of DataHub is quite interesting.
Maggie Hays: That's one of my favorite parts of my job. It's so much fun to be able to truly shape the roadmap around what the community is asking for, and collaborate with folks on really fleshing out those user stories or use cases.
You mentioned that you see DataHub as something malleable or something that you can shape to the needs of your work. What are some of the use cases or problems that DataHub has addressed within hipages?
Vincent Koc: We've gone through many iterations of our data architecture, from warehouse to lake and now on to the lakehouse architecture. DataHub, with its extensible use of data lineage and ability to call various BI tools and systems, has helped mostly with visibility into what's going on.
Every company dreams about this perfect world with data governance, but it's imperfect. We have to somehow bring structure to it. So DataHub allows us to be a bit more calculated with the choices that we make when dealing with legacy architecture, reports, and systems.
Paul Logan: That's super cool. Is there anything that you're really excited to see happen with DataHub in 2023?
Vincent Koc: For me, it's this space around data contracts. You can define a contract, you can enforce it, but to understand the impact of a data contract, or at least understand its enforcement, you need lineage in place. You need to understand what is downstream and upstream of a data asset.
What are the data products in your ecosystem? Who's using them? How often? That's the only way we can actually work out if the contract has been violated or if we're meeting SLAs.
I find that we're trying to centralize a lot of our metadata and somehow work it around lineage. I know DataHub has done a lot of work around the domain model and different concepts of ownership, as well.
So I'm super excited about just really going on that journey and that transformation in terms of data governance.
Maggie Hays: We're in the early stages of figuring out what a data contract means within DataHub. Feedback from folks like your team and your use cases is going to be paramount to helping us really define that.
For example, this whole concept of data stewardship, giving ownership back to the business. But how do we actually give them visibility of their catalog or what reports they have? We need to make it tangible for businesses and people in an organization who may not be as tech or data literate. We needed the mechanisms to make that visual and more malleable for them to drive adoption.
Maggie Hays: You spoke a little bit about lineage and domains. What's your personal favorite use case or feature within DataHub?
Vincent Koc: I think the most interesting one for me is DataHub Tags, just that any asset or database can be tagged. And that tag can come from anywhere. You can query the data that lives within DataHub based on those tags. A common example would be, "Hey, let's tag this data as sensitive or PII. And let's see which downstream reports are inheriting this data."
Or knowing what's using a lot of PII, or legacy, or unstable data, things like that.
The tag is quite useful in the sense that you can expand on its capability quite simply. And you could just use the SDKs, code, your systems or whatever rules you want, to define those tags as well.
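To make the PII example concrete, here is a minimal sketch in plain Python of tagging an asset and finding the downstream reports that inherit the tag. It deliberately does not use the DataHub SDK or API: the asset names, the `LINEAGE` and `TAGS` mappings, and the helper functions `downstream_of` and `assets_inheriting` are all hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it.
LINEAGE = {
    "db.customers": ["db.customer_orders"],
    "db.customer_orders": ["report.weekly_revenue", "report.churn"],
    "db.jobs": ["report.weekly_revenue"],
}

# Hypothetical tags applied directly to assets.
TAGS = {
    "db.customers": {"PII"},
    "db.jobs": {"legacy"},
}


def downstream_of(asset, lineage):
    """Return every asset reachable downstream of `asset` (breadth-first)."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def assets_inheriting(tag, tags, lineage):
    """Return assets downstream of anything directly tagged with `tag`."""
    inherited = set()
    for asset, asset_tags in tags.items():
        if tag in asset_tags:
            inherited |= downstream_of(asset, lineage)
    return inherited


# Which reports sit downstream of PII-tagged data?
pii_reports = sorted(
    a for a in assets_inheriting("PII", TAGS, LINEAGE) if a.startswith("report.")
)
print(pii_reports)
```

In a real deployment, the lineage and tags would come from the metadata platform rather than hard-coded dictionaries; the traversal idea stays the same.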
Paul Logan: You've been a part of the community for a while now and I'm sure you have some tips and advice for anyone who's just joining the community. What would you say?
Vincent Koc: For me, it's more around Slack communities and data communities in general.
Don't be scared: not everyone knows the answers. Some people are super technical, and some are not, but we've all got our own skills in different ways. Feel free to put your hand up and go, "Hey, this doesn't make sense." I'm sure there's someone out there who will lend you a hand or give you some tips and tricks.
At the same time, if you want to learn, contribute, and build features yourself, nothing's stopping you. Just give it a go.
It's also a great way to understand how the systems and the ecosystem shape up. For example, if I were to go ahead and start building some features on DataHub, that's a really good way for me to get in touch with understanding how data governance works, how lineage works, and start to conceptualize that, and learn from that experience.
I think it's a great learning experience as well as a way to share ideas and different ways of solving problems within the community.
Maggie Hays: Amazing. That's the dream response right there. In all seriousness, we're just so grateful for you and the whole Hipages team. You have just been real champions of the project, and you're doing great work, both within the community and in your organization. Thank you so much for taking the time to talk with us.
Vincent Koc: Oh, my pleasure. Thank you so much, Maggie!
If you are new to DataHub, just beginning to understand what "metadata" and "modern data stack" mean, or you've just read these words for the first time (howdy, friends! 🤠), let us take a moment to introduce ourselves and share a little history:
DataHub is an extensible metadata platform, enabling data discovery, data observability, and federated governance to tame the complexity of increasingly diverse data ecosystems. Originally built at LinkedIn, DataHub was open-sourced under the Apache 2.0 License in 2020. It now has a thriving community with over 6.5k members (and growing!) and 350+ code contributors, and many companies are actively using DataHub in production.
We believe that data-driven organizations need a reimagined developer-friendly data catalog to tackle the diversity and scale of the modern data stack. Our goal is to provide the most reliable and trusted enterprise data graph to empower data teams with best-in-class search and discovery and enable continuous data quality based on DataOps practices. This allows central data teams to scale their effectiveness and companies to maximize the value they derive from data.
Want to learn more about DataHub and how to join our community? Join us on Slack.