Six months with Humans of DataHub

Six months ago, Maggie Hays and I launched Humans of DataHub, a series highlighting the wonderful people who are helping define how the DataHub Community collaborates. During this time, we have spoken with Community Members from all corners of the world as they shared their experiences being a part of the DataHub Community, how their team uses DataHub, and so much more.
To celebrate six months of #HumansofDataHub, we’re taking time to reflect on the themes we’ve heard during these wonderful conversations and to share a collection of our favorite moments.
Quotes edited for clarity and brevity.
Vibrant, Supportive, and Intellectually Stimulating: the DataHub Community is one of a kind
One of our favorite questions to ask during these conversations is, “What do you enjoy most about the DataHub Community?” Time and time again, we heard that people love the vibrant conversations, rapid and thorough support, and fast-moving pace of the project.
“[The DataHub Community is] really extraordinary… I’m not trying to sell this to anyone, but it’s really one of a kind. Sometimes you need help with some problem solving, or just need to get in contact with the Core Developer Team… everyone [in DataHub Slack] is super helpful and responsive. And of course… all the great contributions — especially on the ingestion side — popping up left and right… is really, really great to see” — Fredrik Sannholm, Engineering Team Lead, Wolt
Community Members are eager to share their lived experiences and to help others along the way; regardless of where you are in your data journey, you’re sure to find others who can lend you guidance and perspective.
“It’s such a vibrant community. Everyone is going through different phases of the data journey… and people bring different perspectives. Some teams are a lot more mature than us when it comes to data… [by seeing] the thinking behind each team, and how they’re approaching the problem, it helps DataHub grow.
[Even when] we think, “we’re not quite there yet”, there’s an interesting dynamic where we’re like, “Oh! We can grow in this way… that’s how the best of teams who use data are thinking about it.” It helps give us perspective.” — Kartik Darapuneni, Data Applications Tech Lead, Included Health
“When I am stuck, I know that the Community is behind my back that is able to offer advice. […] It is very intellectually stimulating; There are data practitioners across the globe in this community leveraging their own experience and contributing their expertise to help each other and stimulate meaningful discussion. This is truly amazing.” — Harvey Li, Senior Data Engineer, Grab
Connecting the Dots with DataHub
Through our Humans of DataHub conversations, we’ve heard so many stories about what DataHub has enabled across organizations. We continue to hear success stories about empowering Data Discovery, connecting the dots with Data Lineage, and more.
“DataHub actually enables us on the data discovery part. And we actually use the Presto on Hive plugin (which is actually the one that we contributed back to the community) to ingest the metadata for over 80,000 tables into DataHub. And the amazing part is that we managed to ingest this huge amount of metadata within less than 15 minutes. Of course, we put some parallelism in place, but the performance is amazing.” — Harvey Li, Senior Data Engineer, Grab
“Watching lineage in action by connecting all the dots, all the datasets is just absolutely amazing. Understanding where the data comes from makes a big difference. […] We like the model-centric approach in angular architecture. It’s abstract and generic enough, so it adapts to many datasets while capturing all the particularities at the same time.” — Sergio Gómez Villamor, Senior Data Engineer, Adevinta
DataHub’s configurability allows DataHub Admins to customize the platform for their end users’ specific needs.
“BrowsePaths, where you can specify where a dataset is located for browsing, is a very handy tool to cater to different user groups as we can put soft-links to common datasets at different points in the catalog.” — Liu Xianglong, Data Platform Engineer, Centre for Strategic Infocomm Technologies, Singapore
Getting started with DataHub? Join us on Slack!
We asked Community Members what advice they would give to folks who are just getting started with DataHub; the main takeaway is to join the Slack Community! You’ll find plenty of support from your fellow data enthusiasts, and you’ll be able to help others out in no time.
“Join the DataHub Slack. That’s the first piece of advice, just do it. […] Community-wise, post questions frequently […] and maybe like three months later, six months later, you will be answering the questions!” — Hyejin Yoon, Data Engineer, SOCAR
“Number one, the documentation is very helpful, so review that and take some time to sort of dig through that documentation. […] Also, what we also find particularly helpful is that the (DataHub) Slack channel has all of the previous messages and whatnot. I find looking through those Slack channels as well, for previous discussions, super helpful. And lastly, don’t be shy about posing questions to the community. That’s something I’ve noticed that a lot of people with questions, jumping in and answering is really helpful, because a lot of times we all have similar questions” — Steven Po, Senior Data Engineer, Coursera
We can’t wait to see what the next six months (and years!) bring to the DataHub Community. 💟