DataHub Community: Jan’23 Rundown
Hello, DataHub Enthusiasts!
The DataHub Community has jumped right into 2023 with awe-inspiring community-led contributions and exciting ways to make DataHub easier to deploy. Let’s dig right in!
The DataHub Community is buzzin’ with activity
January 2023 was such an exciting month for the Community. We welcomed 402 new Slack Members, merged 184 Pull Requests from 29 Contributors to the open-source project, and came together live for a Data Contracts Experts Panel & AMA with Shirshanka Das and Chad Sanderson.
I learned so much about Data Contracts from this conversation. If you’re looking for a better understanding of why to bother with Data Contracts or where they should fit into your data stack, this AMA is packed with helpful direction. Check it out here 👉
On the Slack side of things, the Community has been as vibrant as ever with over SIX THOUSAND (!!!) members and ~1k weekly active users talking all things metadata management and jumping in to help each other out.

If you’re just getting started with DataHub, join us in Slack! Our Community is full of incredibly kind and supportive people who are excited to help you get going 😊
Announcing: The DataHub Data Practitioners Guild
Earlier this month, I announced the Data Practitioners Guild, a space to celebrate incredible members of the DataHub community who have gone above and beyond to contribute to the open-source code, help others in their implementation journey, and share their DataHub stories far and wide.
This one is near & dear to me; I feel so fortunate to have teamed up with these folks in one way or another in 2022 and to spotlight their incredible work.
What’s to Come: Q1 2023 Roadmap
As I shared in our January Town Hall, we have A LOT of exciting plans for this quarter. Here are some of the highlights:
- Subscriptions & Slack Notifications — soon, you’ll be able to subscribe to changes to an entity in DataHub and receive automated notifications directly in Slack
- Search Relevance Improvements — improvements to pagination, search ranking, autocomplete, and more
- DataHub Action: Tag & Term Propagation — automatically propagate tags and terms between entities connected by lineage edges
- Fivetran <> DataHub Integration — ingest critical metadata about your Fivetran connections into DataHub
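To make the Tag & Term Propagation idea concrete, here is a minimal sketch of pushing tags downstream along lineage edges. It is plain Python with no DataHub client calls, and it is not how the DataHub Action is implemented: the `propagate_tags` function, the dataset names, and the `PII` tag are all hypothetical, and the lineage graph is assumed to be a simple dict of upstream-to-downstream edges.

```python
from collections import deque


def propagate_tags(lineage, tags):
    """Copy each entity's tags to everything downstream of it.

    lineage: dict mapping an entity to the list of its direct downstream entities
             (hypothetical in-memory stand-in for lineage edges).
    tags:    dict mapping an entity to the set of tags applied to it directly.
    Returns a new dict of entity -> set of direct plus inherited tags.
    """
    # Start from a copy so the caller's tag sets are never mutated.
    result = {entity: set(entity_tags) for entity, entity_tags in tags.items()}
    for source, source_tags in tags.items():
        seen = {source}  # guards against cycles in the lineage graph
        queue = deque([source])
        while queue:  # breadth-first walk over downstream edges
            current = queue.popleft()
            for downstream in lineage.get(current, []):
                if downstream in seen:
                    continue
                seen.add(downstream)
                result.setdefault(downstream, set()).update(source_tags)
                queue.append(downstream)
    return result


# Hypothetical three-step pipeline: raw -> staging -> marts.
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue"],
}
tags = {"raw.orders": {"PII"}}

propagated = propagate_tags(lineage, tags)
print(propagated["marts.revenue"])
```

In this sketch a tag applied once at the source reaches every dataset built from it, which is the behavior the roadmap item describes at a high level.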
We’ll also be conducting research across a few areas:
- Discovery: Improving the Browse Experience — identifying ways to tailor the browse experience to help end-users find high-impact resources with fewer clicks
- Discovery: Ingestion Error Reporting — improving structured error collection to facilitate debugging
- Discovery: URN Casing — understanding typical pain points with metadata ingestion due to inconsistent URN casing
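As a quick illustration of the URN casing pain point, here is a small sketch that flags URNs differing only by letter case. It assumes URNs are compared as case-sensitive strings, so two spellings of the same table would show up as two separate entities. The `find_case_collisions` helper and the sample Snowflake table names are hypothetical.

```python
from collections import defaultdict


def find_case_collisions(urns):
    """Group URNs that are identical except for letter case.

    The lowercased URN is used only as a grouping key here; it is not
    meant to be a valid or canonical URN.
    """
    groups = defaultdict(set)
    for urn in urns:
        groups[urn.lower()].add(urn)
    # Keep only the groups where more than one spelling was seen.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical URNs, e.g. emitted by two ingestion sources for the same table.
urns = [
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,Analytics.Public.Orders,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.customers,PROD)",
]

collisions = find_case_collisions(urns)
for variants in collisions.values():
    print("Possible duplicates:", variants)
```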
Check out this video to hear more details 👉
Community Case Study: How the Notion Team is automating metadata propagation in DataHub
We LOVE to hear from our Community Members about how they are incorporating DataHub into their data stack and automating processes around metadata management.
During DataHub’s January Town Hall, we heard from Ada Draginda, Staff Data Engineer at Notion. She shared clear examples of how she and her team use DataHub’s APIs and Python Emitter to programmatically propagate priority classifications between related dbt datasets.
Check out the 5-minute walkthrough below and try it yourself with their open-source code!
Community Contribution: Improvements to documentation editing in DataHub
Speaking of standout Community Members, we’ve been working closely with Amanda Ng and Harvey Li from Grab to shepherd their code contribution, which overhauls DataHub’s documentation functionality.
Our next release (targeted for the first week of February 2023) will support a “what you see is what you get” editing experience, making it much more intuitive to maintain documentation via the UI.
Chris Collins gave a great run-through of how we partnered closely with the Grab Team during the January Town Hall — watch it here 👉
Simplifying DataHub: Removing Schema Registry requirement
The Community asked, and we are delivering: soon, Confluent Schema Registry will no longer be a required component to deploy DataHub.
Want to learn how we made that happen? Check out Pedro Silva’s (Acryl Data) overview of that work from the January Town Hall 👉
Introducing: DataHub Lite
We’re always looking for ways to make DataHub easier and more accessible for Data Practitioners to use. With this in mind, we have begun experimenting with a new tool called DataHub Lite: a slimmed-down SDK & CLI for interacting with DataHub’s rich metadata graph without deploying all of the infrastructure components.
As Shirshanka mentioned during his demo of DataHub Lite at the January Town Hall, this is targeted toward Data Practitioners who “live and die by the command line” 🙋‍♀️ Count me in!
Watch Shirshanka & Harshal Sheth’s overview & demo from the January Town Hall here 👉
That’s it for this round of Community Updates — see y’all in Slack 😊