Humans of DataHub: Patrick Franco Braz

Hello, internet friends! After a little break, Humans of DataHub is back in full force.

This week, I had the pleasure of speaking with Patrick Franco Braz, a Data Engineer at Hurb. Hurb’s mission is to make travel easier and connect people to their dreams by providing the best travel experiences at an affordable price.

Conversation Transcript & Highlights

Edited for brevity & clarity

Elizabeth Cohen: Welcome to our next installment of Humans of DataHub! You may have noticed we’ve been on a bit of a summer hiatus. We are so excited to be back, speaking with community members about their DataHub journey, and today we are joined by Patrick. Patrick, thank you for being here. I will pass it on to you to introduce yourself.

Patrick Franco Braz: Hello, everyone! I’m 25 years old and I live in Rio de Janeiro, Brazil. I’ve been part of the DataHub Community for a few months. Here in Brazil, I am a Data Platform Engineer at Hurb. I focus on our data infrastructure and take care of our data stack.

Hurb is a travel agency, one of the biggest in Latin America. Hurb has three main products: hotel accommodations, travel packages, and activities. For example, a client chooses the option they want and picks travel dates a year in advance, and Hurb takes care of everything for them.

To make this possible, we have 1,000 employees. The technology team has 100 engineers and developers, and the data team specifically has 26 members.

Elizabeth Cohen: To kick us off, how did you first come across and learn about the DataHub community?

Patrick Franco Braz: [We already had another data catalog tool, and it didn’t meet our needs.] Then we found DataHub, studied the tool, and what captured our attention was that we could fully control access on the platform. DataHub also has a very friendly UI, which is very important for us. Why? Internally, we have a self-service analytics feature that we are trying to spread [and have at a company-wide level], so any collaborator can access the tool to find the assets they want. DataHub brings these features and culture to the company.

Elizabeth Cohen: What has DataHub enabled within your organization? You were just talking about the collaboration aspect, I’d love to hear more.

Patrick Franco Braz: Not only does DataHub give our collaborators a kind of central repository to find the data they want, but it also serves as a source of truth for everything.

We have many tools in our data stack, and we are creating a kind of metadata synchronization process. Each team uses different tools: the ML team has their tools, data engineers have theirs, and BI analysts and other teams use others. So we take all kinds of metadata from each tool into DataHub and pull that information to synchronize metadata across systems. The metadata ingestion framework is very powerful, and given that Python is our main language here, we’re very happy with how easy the process is.

With the Actions Framework, we are also able to build this kind of metadata synchronization process. We are very happy with DataHub.

Elizabeth Cohen: I’d be curious to hear, what has been your favorite DataHub feature or use case?

Patrick Franco Braz: We are trying to increase our data reliability. So the first use case is the feature that enables collaborators not only to find assets, but also to find the validation rules that we apply to each dataset. The best example of this: we use a data quality platform called Anomalo, and we sync metadata from there into DataHub. Our collaborators can see data validation rules on the datasets, and internally on the data team, we can see through lineage and impact analysis who is being impacted when a data issue occurs.

So it is important to build a kind of automation to alert every asset owner. Besides, when we want to change a table, job, transformation process, or view, we can analyze which assets will be impacted by that. For example, we use Metabase for BI. Every analyst who is building on top of Metabase uses datasets from BigQuery. So we want to know not just the dependencies, but also how changing something in BigQuery will impact Metabase. This is very important for us.
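[Editor's aside: the impact analysis Patrick describes boils down to walking a lineage graph downstream from the asset being changed. The snippet below is only a minimal sketch of that idea in Python, the team's main language. It does not use the DataHub API, and the asset names and the `DOWNSTREAM` mapping are hypothetical, invented for illustration rather than taken from Hurb's stack.]

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets in its list.
# These names are made up for illustration only.
DOWNSTREAM = {
    "bigquery.raw_bookings": ["bigquery.bookings_clean"],
    "bigquery.bookings_clean": [
        "metabase.question_daily_bookings",
        "bigquery.revenue_view",
    ],
    "bigquery.revenue_view": ["metabase.dashboard_revenue"],
}


def impacted_assets(changed, downstream):
    """Return every asset reachable downstream of `changed`, breadth-first."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            # Skip assets already visited (and the changed asset itself).
            if child not in seen and child != changed:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Which assets would a change to the cleaned bookings table touch?
    for asset in impacted_assets("bigquery.bookings_clean", DOWNSTREAM):
        print(asset)
```

[In this toy graph, a change to the cleaned bookings table reaches both Metabase assets, one directly and one through an intermediate view, which is the kind of BigQuery-to-Metabase dependency described above.]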

Elizabeth Cohen: What do you enjoy most about the DataHub Community?

Patrick Franco Braz: What I enjoy most is the activity. Everyone is trying to help and asking a lot of questions. This is very important because I can learn from those questions. I constantly see what people are searching for, and I can learn from that when the DataHub Core team responds. In #troubleshoot, I have received a lot of help. And now I think that I have the power and experience to help too. So for me, this is incredible.

Elizabeth Cohen: Absolutely, we have a very special community. When folks join the DataHub Slack, they initially start out asking a lot of questions. And then a few months later, we’ll see them in Slack channels, answering other people’s questions. So it is really cool how it creates this sustainable, almost circular, support system and community.

Looking ahead, what are you most excited to see happen with DataHub?

Patrick Franco Braz: So, I talked about impact analysis… column-level lineage will be incredible to have because, as I said, sometimes we have to change some transformation in a dataset, and we do not change everything; it’s a small change in some columns. Seeing the relations at that level, like which column is used by which table or view, will be incredible.

Another thing that I think will be important for us is collaboration within the UI. Metadata management is kind of a challenge for us, so collaboration within the UI will help us improve our metadata, ask for metadata creation, or maybe share knowledge across the platform. So these two features are very important.

Elizabeth Cohen: Collaboration within the UI is something that we hear from the community a lot, too. So I’m glad you’re excited about that.

What is your favorite DataHub Slack channel and why?

Patrick Franco Braz: Oh, I don’t know if I have a favorite… the channel that I am looking at almost all the time is #troubleshoot. When I was first becoming familiar with DataHub and learning it, the #troubleshoot channel helped me a lot. It’s a kind of emotional connection: I like to go there and see what people are asking for. And as I said, I feel that now I am really familiar with DataHub and know how things work, so I am always trying to go there and see if I can help someone.

Another two channels that I keep an eye on, now that I am kind of a Product Manager of DataHub for my team, are #announcements and #feature-requests. I want to know about the new features in DataHub and see what is new and what people are talking about in the community, Humans of DataHub, and so on.

Elizabeth Cohen: There’s so much camaraderie, like we talked about a few minutes ago. There’s such a level of camaraderie within the community, a feeling of “I was struggling with this a few months ago, and now let me help you!”

The last question is one of my favorite questions to ask our community members — What advice would you give to someone who is just learning about DataHub, or has just joined the DataHub community?

Patrick Franco Braz: The best advice I can give: study the documentation.

When I started, I didn’t do that [and I think it would have made things easier.] When I first learned, I just joined the Slack community and started to ask a lot of questions. It’s important to join and be a part of the Slack community, because there are a lot of people there to help you. [The Slack community is a trusted source] and you can do everything through the documentation. So that is my first piece of advice, and the second is to work hard and just try out Quickstart. Start a local environment, try your use case, and try the things that you are not confident about. This will help you a lot, not only to validate your use case or your idea, but maybe even to see a problem [or what else it surfaces]. And then you can go to the community and see if someone can help you.

Elizabeth Cohen: Awesome. It’s like — join the community, start asking questions, try things out. And if there are challenges, you can go back to the community and get support.

Well, Patrick, thank you so much for your time. It was so wonderful speaking with you. Thank you for all of your contributions and for your future contributions to the DataHub community.

Patrick Franco Braz: Yeah, thank you so much. I’m very grateful for this opportunity. Thanks a lot to the community for their help.
And thanks to everyone behind me inside my company. I’m grateful to my company, which helps me and gives me the opportunity to be part of the DataHub community. My team is always on my side, giving me support and advice.


What is Humans of DataHub?

Humans of DataHub is a series highlighting the wonderful people that are helping define how the DataHub Community collaborates in 2022 and beyond.

What’s DataHub?

If you are new to DataHub, just beginning to understand what “metadata” and “modern data stack” mean, or you’ve just read these words for the first time (welcome! 👋), let us take a moment to introduce ourselves and share a little history:

DataHub is an extensible metadata platform, enabling data discovery, data observability, and federated governance to tame the complexity of increasingly diverse data ecosystems. Originally built at LinkedIn, DataHub was open-sourced under the Apache 2.0 License in 2020. It now has a thriving community with over 4k members and 260+ code contributors, and many companies are actively using DataHub in production.

We believe that data-driven organizations need a reimagined developer-friendly data catalog to tackle the diversity and scale of the modern data stack. Our goal is to provide the most reliable and trusted enterprise data graph to empower data teams with best-in-class search and discovery and enable continuous data quality based on DataOps practices. This allows central data teams to scale their effectiveness and companies to maximize the value they derive from data.

Want to join the DataHub Community? Visit https://datahubproject.io and say hello on Slack.
