Humans of DataHub: Mike Linthe

This week, we had the pleasure of speaking with Mike Linthe, Managing Director of Contiamo. Contiamo finds elegant solutions for complex challenges every day, focusing on scalable cloud applications and the intensive use of open-source tools.
Mike shares his journey with DataHub, how Contiamo is leveraging DataHub for successful client outcomes, his favorite features, and more! This is a conversation you don’t want to miss! ✨
Partial transcription available below, closed captioning available on YouTube.
Conversation Transcript & Highlights
Edited for brevity & clarity
Maggie Hays: We are back with another round of Humans of DataHub! Today we are joined by our Community Member Mike. Mike, welcome. Give us a little bit of an introduction: who you are, where you work, what you do. We’d love to hear about it.
Mike Linthe: Hey, Maggie, hey, community. I’m Mike, I’m 33 years old, based in Berlin, Germany, so on the European side of the DataHub, and I’ve been working in consulting for the last decade, mainly in tech-heavy projects. I started off as a Software Engineer working with these typical, very boring, big enterprise applications. Then I made a little transition to work for PwC as a Manager, and then built up some RPA business in Germany. Now, I’m the Managing Director of Contiamo.
To briefly introduce Contiamo, we are a data consultancy, we are specialized in building data-heavy applications, but also implementing data governance and leveraging data science. So these are the three main pillars we do. We actually used to be a classic SaaS startup, and built our own data catalog with an integrated data virtualization layer — so I have a long history with data catalogs. I fell in love with them very early.
Due to developments in the space (if you look at it, there are so many new tools coming out, awesome tools like DataHub), we figured out that consulting was the better adventure to take on, so we now do only individual data consultancy projects. We work with global companies like Mercedes-Benz or CBRE, but also some small startups. It’s basically everything related to data; we are mainly an implementation partner, supporting our clients end to end.
Maggie Hays: That’s awesome. I remember us talking about this when we first met, that you had kind of gone through the process of building your own data catalog, and you knew how much of an undertaking that really was… How did you stumble upon DataHub? How did you end up in the Community?
Mike Linthe: It’s basically due to this. Obviously, one part of building a product is looking at your competitors. I made an analysis, looking at the classic enterprise data catalog products, but I also looked at open-source, because open-source was always something we, as a company, valued very highly. We have a lot of people who are top contributors to open-source projects, and basically everything in the projects we do is open-source based. So I also looked at that, and about two years ago, I stumbled across it, set up my own little DataHub, played around a little bit and felt like, “Okay, this looks really good.”
It was also really interesting to see another product built in a way similar to how you built your own. Because you see nowadays, [DataHub] solved this challenge so well, and [our team] really worked on cracking that for a long time. So basically, we had a look at all of the open-source catalogs as well, and when we decided to make the switch to consulting only, we more or less let go of our own product and then asked ourselves, “What do we think is the best data catalog product out there?” And yeah, we stuck with DataHub.
I joined the community maybe a little bit over a year ago. And from that moment onwards, I think it was more or less love at second sight, because the first time was to look at the product itself [when we were competitors]. The Community, which I guess we will talk a little bit about as well, was really a game changer for me.
Maggie Hays: That’s amazing. Yeah, I like “love at second sight.” At first you’re like, “Oh no,” then you’re like, “Wait a minute… I like what I see here.”
Elizabeth Cohen: Let’s talk about the Community, what do you enjoy most about it?
Mike Linthe: It’s remarkable how quick and helpful the community is. Whenever I [send a message in Slack], and I know we have this time difference, since I think a lot of people are based in the US, no matter when I post something, someone will come back within, I would say, six hours at most. And I think that’s incredible. Imagine: you have a corporate tool, and whatever question you have, someone will come back to you with a really good answer in six hours… companies would pay a lot of money for that!
I think it’s very open minded and very welcoming. Everyone is trying to show what you can do and how we can progress together.
I’m not a software engineer anymore. I used to be, but I now program very rarely. So for me, it’s sometimes even harder to really follow everything that’s happening there, but I feel everyone is really making an effort to explain and give helpful advice. It was also very easy to get on board in the community, so it’s just a lot of fun trying things out. There’s really a lot of cool stuff within the Community. Also, the documentation is really good, I have to say, which is not very common for a lot of open-source projects. So it’s a really special community.
Maggie Hays: Yeah, I see you jump in every once in a while with more thought-provoking questions or conversations. I’m personally really excited to see the trajectory of the community move beyond troubleshooting support, which, of course, is critical, and we will continue to do that. I think there are really great opportunities for us to have more meta conversations around “How do we do metadata? What are the driving components of that?” So I really appreciate your contributions there.
Mike Linthe: Yeah, definitely, I love to do it, because that’s one of the main ways I can contribute. My team contributes with, say, coding and sharing connectors. But I also try to offer some little helpers. For example, today I just posted a script to build your own JSON files, so you can ingest metadata very easily and quickly using Google Sheets. That’s what the spirit of the community is really like: let’s make our lives easier, adopt it, and just draft a product. That’s really cool.
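To make the idea concrete, here is a minimal sketch of the kind of helper Mike describes: it turns spreadsheet rows, exported as CSV, into a JSON document. This is not his script. The column names, the `rows_to_records` helper, and the flat record layout are assumptions for illustration only; the dataset URN pattern follows DataHub’s naming convention, and a real ingestion file would need to follow the schema in DataHub’s documentation.

```python
import csv
import io
import json


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string in DataHub's naming pattern."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def rows_to_records(csv_text: str) -> list[dict]:
    """Turn CSV rows (platform, name, description, owner) into plain dicts.

    The record layout is a simplified illustration, not DataHub's schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({
            "urn": dataset_urn(row["platform"].strip(), row["name"].strip()),
            "description": row["description"].strip(),
            "owner": row["owner"].strip(),
        })
    return records


# Hypothetical sheet export; the column names are assumptions.
SHEET_CSV = """platform,name,description,owner
snowflake,sales.orders,Daily order facts,jdoe
bigquery,marketing.leads,Inbound leads,asmith
"""

records = rows_to_records(SHEET_CSV)
metadata_json = json.dumps(records, indent=2)
print(metadata_json)
```

Keeping the metadata in a sheet means a semi-technical user can edit rows and regenerate the JSON without touching the code.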
Maggie Hays: I’m curious, with the implementation of DataHub, what are kind of the main use cases when you’re working with your clients, why are you using DataHub? What problems are you hoping that DataHub will address for your clients?
Mike Linthe: We have two different paths where we use DataHub.
One path is using it internally, so we also use it for our company. Because we are an engineering company, our engineers were like, “Hey, first of all, if we want to show clients the product, we should use it ourselves, for doing the stuff we do.”
So the first real use case is really for ourselves, to get a hold of all the data assets we have and the infrastructure we want for clients. Sometimes we build pipelines for clients, monitor them, and keep them running. It’s really nice for us to manage documents, knowing, “Okay, these are the Kubernetes clusters we have, here are the people responsible for this and this client,” things like that. It’s also really good for machine learning models, so we use it to document stuff.
The second use case, obviously, is the data implementation projects with clients. I love it when you have a modern data stack, which is sadly not too common in Germany yet. Now more and more clients are adopting a data warehouse like Snowflake or BigQuery, and dbt is also really taking off. I think this is really great technology, also coming out of the open-source community, obviously.
Yeah, and I love the Lineage feature of DataHub. It’s so helpful in a lot of ways, also with the tests, etc. The integration also got way, way better over time; that’s definitely something we see a lot of value in on the client side. I’d say the main use case for the Lineage feature is efficiency in data engineering teams, so sharing the information about the pipelines. Seeing the impact of changes in your data pipeline is definitely another big use case that we have seen a lot of times with clients.
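The impact analysis mentioned here can be pictured as a walk over a lineage graph: start at the asset that changed and collect everything downstream of it. The sketch below is a generic breadth-first traversal, not DataHub’s implementation; the asset names and the `DOWNSTREAM` mapping are made up for illustration.

```python
from collections import deque

# Hypothetical lineage edges: upstream asset -> assets that read from it.
DOWNSTREAM = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboard.finance"],
    "marts.customer_ltv": [],
    "dashboard.finance": [],
}


def impacted_assets(changed: str, downstream: dict[str, list[str]]) -> list[str]:
    """Return every asset reachable downstream of `changed`, in BFS order."""
    seen = {changed}
    queue = deque([changed])
    impacted = []
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:  # guard against revisiting shared children
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


print(impacted_assets("staging.orders", DOWNSTREAM))
# → ['marts.revenue', 'marts.customer_ltv', 'dashboard.finance']
```

Before changing `staging.orders`, an engineer would know that two marts and one dashboard need checking.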
I think a second one is the ownership aspect of saying, “okay, who’s responsible for what?”, and also the overview of “what assets do I have?”
What I really, really love is the openness of the tool. You can ingest pretty much everything, although maybe there’s no real connector for it yet. But you can still ingest that stuff through the data model.
We build a lot of more or less crazy stuff, for example, KPI tracking. You can really easily do this individually, build it inside the data model, and showcase it to clients. So I think there’s a lot you can do with the data around this Data Discovery part and this Data Governance part. It’s a very flexible tool. It’s very open, and this is its main advantage. Compared with competitors, obviously, it’s also beautiful and very simple to use. But for me, this is the main game changer: these two use cases. Very cool.
Elizabeth Cohen: Going off that, what has been your personal favorite DataHub feature… If you can choose one?
Mike Linthe: Yeah, I’d say Lineage is my favorite […] we tried to build it on our own, and it’s really hard. That’s definitely something I appreciate: the effort and the smartness in how you built this. It’s also pretty flexible in the sense that you can connect whatever you want. I also like Data Profiling, although I probably did at least as good a job as you did in your product, but it’s really good regardless.
Maggie Hays: We always take contributions!
Mike Linthe: Exactly… let me say, we have something planned.
Data Profiling is a very helpful feature, I think, also for understanding the data. It’s really a game-changer from the user perspective, especially in Germany and Europe, where we have a lot of regulations, I think a lot more than in the US.
In terms of data protection, you have probably heard about GDPR and whatever else there is, and this is a super helpful feature for data scientists and data engineers to understand, “How does this data look?” And they can give me some samples that are non-critical, so I do not violate anything.
I really love it. And we personally use it a lot, also, on the client side, but also internally for some of the data we have. So it’s really good.
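As a rough illustration of the profiling idea described above, the sketch below summarizes each column with counts and only exposes raw values (minimum, maximum, samples) for columns that are not flagged as sensitive. It is not DataHub’s profiler; the `profile_column` helper, the sample table, and the choice of which column is sensitive are all assumptions.

```python
def profile_column(values: list, sensitive: bool = False, sample_size: int = 3) -> dict:
    """Summarize one column; withhold raw values for sensitive columns."""
    non_null = [v for v in values if v is not None]
    profile = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }
    # Raw values are only exposed when the column is not flagged as sensitive.
    if not sensitive:
        profile["min"] = min(non_null) if non_null else None
        profile["max"] = max(non_null) if non_null else None
        profile["samples"] = non_null[:sample_size]
    return profile


# Hypothetical table; which columns count as sensitive is an assumption.
table = {
    "order_total": [19.99, 5.00, None, 42.50],
    "customer_email": ["a@example.com", "b@example.com", None, "a@example.com"],
}
sensitive_columns = {"customer_email"}

profiles = {
    name: profile_column(values, sensitive=name in sensitive_columns)
    for name, values in table.items()
}
for name, stats in profiles.items():
    print(name, stats)
```

A data scientist still learns how complete and how varied `customer_email` is, without seeing a single address.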
Maggie Hays: Awesome. So it sounds like you like the whole product, that’s what I’m hearing.
Mike Linthe: For us, it’s really our one go-to tool.
We did a couple of implementation projects as an implementation partner in Germany now. And German companies usually are pretty conservative, I would say, and open-source is also not really a big thing in Germany yet. If you look at the application landscape, we have SAP as the main application provider here. And it’s very isolated, very locked in, so you can’t get out of it. So people are more or less, like, “let me have my standard infrastructure system with SAP and I’m fine.”
We have now added a couple of projects, and after bringing DataHub in, thanks to the flexibility of the product and the overall feeling of how you can work with it, we really convinced a couple of clients through the project itself. They’re like, “Wow, that’s amazing. I didn’t even know that was possible in an enterprise environment.” So it’s definitely a great tool, and I think it’s way ahead of what other open-source tools can do there.
It can definitely compete with the Enterprise Tools, too. We had a project where we benchmarked DataHub. So we implemented DataHub, and they hired another agency that implemented another tool, with the same use case, which was very fair, and the feedback was that the client actually liked DataHub more than the other tool.
Maggie Hays: Kind of thinking about the Community and your experience getting familiar with the tool and project, […] I’m curious if you met someone tomorrow who was joining the DataHub Community or starting out with DataHub, what advice would you give them to help them be successful?
Mike Linthe: So I think first of all, do not hesitate to ask any questions, there are no dumb questions. And second of all, people will help you in so many ways.
So, go to Slack, sign up, and start by introducing yourself. That’s also one of my favorite Slack channels (#introduce-yourself); I love seeing so many people from different sides of the world connecting in this community.
Ask questions and also have a look at YouTube, I think the YouTube channel is actually really helpful. There are so many good resources. So I think people should first go to Slack or to YouTube, and go to GitHub, have a look at the GitHub examples.
My favorite place is the bootstrap data file, because it gives a nice overview of all the assets you can put in. And for me, as a semi-technical person who can at least ingest JSON files, this is super helpful; my engineers are usually like, “Hey, let’s just build this in Python and make an API call,” and I’m like, “I don’t need it, I just copy-paste that stuff in.” So I’m sometimes quicker than them, I’d say. There’s a lot of very useful information on GitHub, Slack, and YouTube, and the documentation on the homepage is also nice. So I think these are the four main pillars of getting ready with DataHub.
Just try it, it’s super quick. It’s really awesome for people also that are not 100% technical. The whole community and the team — they made the product so easy to use that it’s just perfect.
Elizabeth Cohen: That’s so great. Looking ahead, for the next six months, year, and so on, what are you most excited to see happen with DataHub and within the Community?
Mike Linthe: So first of all, obviously, adoption, because I think the bigger the Community, the better for all of us. And you see it already. I mean, the speed of implementation is crazy, which is sometimes also a challenge for big companies: “By the way, in three weeks, there will be a new release, and it will have awesome features, so you should get it.” And they are like, “Usually we do half-yearly releases!” So it’s really quick.
I’m looking forward to fine-grained lineage, so column-based lineage. It’s really a great feature for getting more detailed information and for providing more value to different use cases, especially on the engineering side, but also on the side of impact analysis. So there’s a lot of value in that.
Maybe for the Community, I would hope to have more people from Europe. I think it’s also good when you have a lot of people in all the different time zones to help each other. For example, if I look at it now, we are pretty much the first ones really doing a lot of work with SAP, although SAP is the main system a lot of people have here. So I see that we need to get more people on board to be quicker, provide more value, and discuss use cases, things like that. That’s what I’m looking for. But I feel you guys are doing an awesome job extending the community and making it a good place to be around. So I’m confident that this will really work out.
Maggie Hays: I mean, just from a timing perspective, we have some of the team working on column-level lineage, specifically impact analysis, basically extending that impact analysis to the column level. So really, really exciting stuff coming up there.
Well, Mike, it’s just been such a pleasure to speak with you. Thanks for sharing your experience with us and your time. We really love having you in the Community with us.
Mike Linthe: Yeah, I’m also really grateful to be here. And thanks for the opportunity. Shout out to all the people in the Community, continue doing great work.
If someone needs any help, just contact me on Slack. I’m always happy to provide some insights on what we did and how we probably solved some of the things you’re struggling with. So just reach out and thanks a lot for having me.
What is Humans of DataHub?
Humans of DataHub is a series highlighting the wonderful people that are helping define how the DataHub Community collaborates in 2022.
What’s DataHub?
If you are new to DataHub, just beginning to understand what “metadata” and “modern data stack” mean, or you’ve just read these words for the first time (Welcome, friends! 🌈), let us take a moment to introduce ourselves and share a little history:
DataHub is an extensible metadata platform, enabling data discovery, data observability, and federated governance to tame the complexity of increasingly diverse data ecosystems. Originally built at LinkedIn, DataHub was open-sourced under the Apache 2.0 License in 2020. It now has a thriving community with over 4.5k (🚀) members and 270+ code contributors, and many companies are actively using DataHub in production.
We believe that data-driven organizations need a reimagined developer-friendly data catalog to tackle the diversity and scale of the modern data stack. Our goal is to provide the most reliable and trusted enterprise data graph to empower data teams with best-in-class search and discovery and enable continuous data quality based on DataOps practices. This allows central data teams to scale their effectiveness and companies to maximize the value they derive from data.
Want to join the DataHub Community? Visit https://datahubproject.io and say hello on Slack. 👋