Humans of DataHub

Humans of DataHub: Abhishek Sharma

We are over the moon to launch Humans of DataHub, a series highlighting the wonderful people who are helping define how the DataHub Community collaborates in 2022.

If you are new to DataHub, are just beginning to understand what "metadata" and "modern data stack" mean, or have just read these words for the first time (welcome aboard! 🚀), let us take a moment to introduce ourselves and share a little history:

DataHub is an extensible metadata platform, enabling data discovery, data observability, and federated governance to tame the complexity of increasingly diverse data ecosystems. Originally built at LinkedIn, DataHub was open-sourced under the Apache 2.0 License in 2020. It now has a thriving community with over 2.3k members and 100+ code contributors, and many companies are actively using DataHub in production.

We believe that data-driven organizations need a reimagined, developer-friendly data catalog to tackle the diversity and scale of the modern data stack. Our goal is to provide the most reliable and trusted enterprise data graph, empowering data teams with best-in-class search and discovery and enabling continuous data quality based on DataOps practices. This allows central data teams to scale their effectiveness and companies to maximize the value they derive from data.

For our inaugural Humans of DataHub feature, we are joined by Abhishek Sharma, Senior Software Engineer, Big Data DevOps, at Bharti Airtel.

What do you enjoy most about the DataHub Community?

The DataHub community is made up of experts from different domains who will help you with anything you ask. The pace at which the project is moving and its vision are great.

What has DataHub enabled within your organization?

Right now, people use DataHub as a catalog to discover datasets. We have integrated various data sources, like Hive, Oracle, Kafka, Mongo, and Postgres, and we're looking forward to integrating more.

What are you most excited to see happen with DataHub in 2022?

We want to use DataHub as a central metadata solution for the organization's various metadata needs. In 2022, we are looking to use DataHub for data quality, observability, and lineage.

What’s your favorite DataHub feature/use case?

The push-based async model, which lets users emit metadata (datasets, lineage, and other custom metadata) from the source and then visualize it in the DataHub UI.

Thank you, Abhishek, for speaking with the team and for all of your contributions to the DataHub community.

Want to learn more about DataHub and how to join our community? Say hello on Slack. 👋
