The DataHub Community is 🔥
👋 Hello, DataHub Enthusiasts!
My, oh my, it has been quite a while since our last DataHub Community & Project Update — 3 months, in fact, 😬😐😑. Let’s cut to the chase and get you up to speed on what the DataHub Community & OSS Project has been up to in Q2’2022.

Aaaaand… we’re back!
The DataHub Community is THRIVING
We continue to see incredible growth in the DataHub Slack Community; over the past 3 months, we’ve grown 46% to over 3,500 members, with an average of ~950 weekly active users. Folks are joining from all corners of the world to get up & running with DataHub, to contribute back to the project, and to talk about all things metadata management & modern data governance.

Community Members also continue to contribute consistently to the open-source project; we’ve welcomed forty (40!!) new contributors over the past few months alone.
Improvements to DataHub’s User Experience
The last few months have been jam-packed with improvements to navigating the DataHub UI that fall into a few different categories: Metadata Management, User & Permission Management, and General Usability Improvements.
Glossary Management via the UI
Beginning in v0.8.36, DataHub users can now create, edit, move, delete, and deprecate Glossary Terms from the DataHub UI. Additionally, you can link Glossary Terms to one another with the following relationships:
- Contains: link related Terms when one is a superset of another, e.g., Address contains Zip Code
- Inherits: link related Terms when one is a subtype of another, e.g., Email inherits PII
This expands on our YAML-based Glossary Term support, empowering end-users who may not be familiar or comfortable working with YAML. This has been the most-upvoted request in our UX feature request portal, and we’re thrilled to deliver it to the Community!
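To make the two relationship types concrete, here is a minimal Python sketch that models them as a tiny in-memory graph and resolves inheritance transitively. The `Glossary` class, its methods, and the extra "Work Email" term are hypothetical illustrations, not DataHub's data model or API. The point is simply that an Inherits link lets a Term pick up its parent's meaning (a Work Email is still PII), while Contains records a part/whole relationship.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class Glossary:
    """Hypothetical in-memory model of Glossary Term relationships (not DataHub's API)."""

    def __init__(self) -> None:
        self.inherits: DefaultDict[str, Set[str]] = defaultdict(set)  # term -> parent terms
        self.contains: DefaultDict[str, Set[str]] = defaultdict(set)  # term -> contained terms

    def add_inherits(self, child: str, parent: str) -> None:
        """Record that `child` is a subtype of `parent` (e.g. Email inherits PII)."""
        self.inherits[child].add(parent)

    def add_contains(self, whole: str, part: str) -> None:
        """Record that `whole` is a superset of `part` (e.g. Address contains Zip Code)."""
        self.contains[whole].add(part)

    def ancestors(self, term: str) -> Set[str]:
        """Return every term that `term` inherits from, directly or transitively."""
        seen: Set[str] = set()
        stack = [term]
        while stack:
            for parent in self.inherits[stack.pop()]:
                if parent not in seen:  # guards against cycles and repeat visits
                    seen.add(parent)
                    stack.append(parent)
        return seen


glossary = Glossary()
glossary.add_contains("Address", "Zip Code")
glossary.add_inherits("Email", "PII")
glossary.add_inherits("Work Email", "Email")  # hypothetical extra term

print(sorted(glossary.ancestors("Work Email")))  # ['Email', 'PII']
```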
Check out this demo from our May Town Hall:
When debugging data-related issues, it’s helpful to understand how a dataset’s schema has evolved over time — new columns added, old columns dropped, column types changed — all of these can have a massive ripple effect on downstream dependencies.
As of v0.8.34, DataHub now provides a “Blame View” of a Dataset’s Schema so users can quickly understand how a field has evolved over semantic schema versions*. Want to learn more? Watch the demo from April’s Town Hall below:
*You can find more info about how we compute versions here.
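As a rough illustration of what "semantic schema versions" can mean, here is a Python sketch that bumps a (major, minor, patch) version based on how a schema changed. The rules are an assumption borrowed from general semantic-versioning conventions (breaking changes bump major, additive changes bump minor), and `bump_version` plus the dict-based schema are hypothetical; DataHub's actual rules are described in the versioning docs linked above.

```python
from typing import Dict, Tuple

Schema = Dict[str, str]          # hypothetical: field path -> field type
Version = Tuple[int, int, int]   # (major, minor, patch)


def bump_version(version: Version, old: Schema, new: Schema) -> Version:
    """Return the next semantic version after a schema change.

    Assumed rules, for illustration only (not DataHub's documented algorithm):
      - any field removed or retyped -> major bump (may break downstream consumers)
      - any field added              -> minor bump (backwards compatible)
      - anything else                -> patch bump (e.g. a doc-only edit this map can't see)
    """
    major, minor, patch = version
    removed = old.keys() - new.keys()
    retyped = {f for f in old.keys() & new.keys() if old[f] != new[f]}
    added = new.keys() - old.keys()
    if removed or retyped:
        return (major + 1, 0, 0)
    if added:
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)


v1 = {"id": "long", "email": "string"}
v2 = {"id": "long", "email": "string", "zip_code": "string"}    # column added
v3 = {"id": "string", "email": "string", "zip_code": "string"}  # column retyped

print(bump_version((1, 0, 0), v1, v2))  # (1, 1, 0)
print(bump_version((1, 1, 0), v2, v3))  # (2, 0, 0)
```

A blame-style view can then attribute each field's current shape to the version in which it last changed.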
User/Permission Management
We’re continuing to make improvements to User & Permission management, ensuring that you can safely and effectively roll out DataHub to a wide set of stakeholders with ease. Here are highlights of what you can expect:
- Create and Invite Users to DataHub via the UI — Find this under Users & Groups > Invite DataHub users. Admins can also generate password reset links for their users (as of v0.8.38)
- Create & Revoke Access Tokens via the UI — Find this under Settings > Developer (as of v0.8.38)
- Create and assign View-based RBAC Policies (as of v0.8.32)
Usability Improvements
We are constantly rolling out incremental changes to improve the DataHub user experience. Here are some small — but very impactful! — changes we’ve made recently to improve usability & navigation:
- Added the ability to assign multiple Owners, Tags, and Terms to an entity in a single workflow (v0.8.36)
- Refreshed the ML Entity page to match the feel of all other entity types; improved ML lineage functionality (v0.8.33)
- Surfaced recent search terms when beginning the search flow (v0.8.32)
- Standardized the display of entity subtypes for dbt, Looker, Kafka, & more. Think: Kafka entities are displayed as “topics” instead of “datasets” (v0.8.32)
- Added recent searches to autocomplete (v0.8.30)
Improvements to DataHub’s Developer Experience
We’re continuing to make it easier and easier for developers to begin working with DataHub and to harness the power of the robust metadata model.
Introducing: The DataHub Actions Framework
The Actions Framework (available as of v0.8.34) makes responding to real-time changes in your Metadata Graph easy, enabling you to integrate DataHub into a broader events-based architecture seamlessly. Check out the repo here and watch John’s demo below:
John Joyce demos the new DataHub Action Framework
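The core idea behind an event-driven action is "when this kind of metadata change happens, do that." Here is a stdlib-only Python sketch of that routing pattern. It is not the Actions Framework's API (the real framework ships its own Action interface, event types, and configuration; see the repo); `ChangeEvent`, `ActionRouter`, and the field names are hypothetical and only loosely modeled on entity change events.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ChangeEvent:
    """Hypothetical, simplified metadata change event (not the framework's event class)."""
    entity_urn: str
    category: str   # e.g. "TAG"
    operation: str  # e.g. "ADD"
    modifier: str   # e.g. the URN of the tag that was added


Handler = Callable[[ChangeEvent], None]


class ActionRouter:
    """Dispatch each event to the handlers registered for its (category, operation)."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[str, str], List[Handler]] = {}

    def on(self, category: str, operation: str, handler: Handler) -> None:
        self._handlers.setdefault((category, operation), []).append(handler)

    def dispatch(self, event: ChangeEvent) -> int:
        """Run matching handlers; return how many ran (0 means the event was ignored)."""
        handlers = self._handlers.get((event.category, event.operation), [])
        for handler in handlers:
            handler(event)
        return len(handlers)


alerts: List[str] = []
router = ActionRouter()
router.on("TAG", "ADD", lambda e: alerts.append(f"{e.modifier} added to {e.entity_urn}"))

dataset = "urn:li:dataset:(urn:li:dataPlatform:hive,users,PROD)"
router.dispatch(ChangeEvent(dataset, "TAG", "ADD", "urn:li:tag:PII"))
router.dispatch(ChangeEvent(dataset, "TAG", "REMOVE", "urn:li:tag:PII"))  # no handler registered
print(alerts)
```

In a real deployment, the handler would post to Slack, open a ticket, or trigger a downstream job instead of appending to a list.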
Improved API Support & Ingestion Docs
Beginning in v0.8.34, we now support OpenAPI endpoints to post, get, and delete entities. Read all about it in the OpenAPI usage guide.
v0.8.36 introduced the new Revokable Token API to support a new type of Access Token that can be revoked & queried, allowing admins to easily delete tokens for operational & security reasons. Read all about it in the Access Token Management Usage Guide.
Last but not least, our Metadata Ingestion Source docs have a new look! The documentation is now code-generated, which keeps format and content consistent across sources.
Metadata Ingestion — New & Improved Integrations!
Community-led Integrations
As the DataHub Community continues to grow, so does the demand for supporting a myriad of metadata sources that one might find across data stacks. We are so fortunate to have Community Members who are generous with their time & talent by contributing connectors back to the open-source project for others to use. Here are the Community-led integrations we’ve seen over the past couple of months:
These sources are currently marked as “Testing”; we encourage you to try them out and provide feedback in the DataHub #ingestion Slack channel.
Integration Updates & Improvements
We are continually fine-tuning our existing Metadata Ingestion Sources to improve accuracy and usability. Here’s a subset of improvements we’ve rolled out over the past few months:
- Snowflake ingestion via OAuth is now supported (v0.8.38)
- AWS Glue — data profiling is now supported (v0.8.36)
- Faster S3 ingestion (v0.8.36)
- Incubating Apache Pulsar source (v0.8.34)
- Updated the Feast connector to support Feast v0.18 (v0.8.34)
- Improvements to handling BigQuery audit log SQL queries (v0.8.34)
- Airflow Improvements — capture execution runs from lineage backend (v0.8.33) — watch the demo here
- MS SQL ingestion captures table & column descriptions (v0.8.33)
- Trino platform support for Great Expectations (v0.8.33)
- Stateful ingestion for dbt supported (v0.8.32)
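One thing stateful ingestion makes possible is cleaning up stale metadata: if a dbt model existed on the last run but is gone now, its entity can be flagged for removal instead of lingering in the catalog. The Python sketch below shows that idea as a checkpointed set difference. `CheckpointStore`, `finish_run`, and `dbt_urn` are hypothetical helpers, and the in-memory JSON checkpoint stands in for wherever a real pipeline would persist its state.

```python
import json
from typing import Dict, Set


class CheckpointStore:
    """Hypothetical store: pipeline name -> URNs emitted by its last successful run."""

    def __init__(self) -> None:
        self._checkpoints: Dict[str, str] = {}  # serialized as JSON, as if persisted

    def load(self, pipeline: str) -> Set[str]:
        return set(json.loads(self._checkpoints.get(pipeline, "[]")))

    def commit(self, pipeline: str, urns: Set[str]) -> None:
        self._checkpoints[pipeline] = json.dumps(sorted(urns))


def finish_run(store: CheckpointStore, pipeline: str, emitted: Set[str]) -> Set[str]:
    """Return URNs seen on the last run but not this one (candidates for soft-deletion)."""
    stale = store.load(pipeline) - emitted
    store.commit(pipeline, emitted)
    return stale


def dbt_urn(model: str) -> str:
    return f"urn:li:dataset:(urn:li:dataPlatform:dbt,{model},PROD)"


store = CheckpointStore()
finish_run(store, "dbt_prod", {dbt_urn("analytics.orders"), dbt_urn("analytics.customers")})
stale = finish_run(store, "dbt_prod", {dbt_urn("analytics.orders")})  # customers model was dropped
print(stale)  # {'urn:li:dataset:(urn:li:dataPlatform:dbt,analytics.customers,PROD)'}
```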
Recent In-Person & Virtual Events
In addition to our Monthly Town Halls, Q2’2022 has been jam-packed with in-person and virtual events for the DataHub Community. Here’s what you might have missed:
Data Council: Data Practitioner’s Guide to Data Discovery
Shirshanka Das and I teamed up in Austin, TX in March to present how DataHub stitches together metadata from tools like dbt, Airflow, Spark, Looker, and many others to create delightful data discovery experiences at companies like LinkedIn, Expedia, Peloton, Saxo Bank, and Wolt.

Maggie & Shirshanka taking the stage at Data Council: Austin
Watch the talk here!
Metadata Day 2022: Governance as Code
Acryl Data and LinkedIn teamed up once again to host this year’s Metadata Day on May 17th & 18th, 2022. This year’s theme was Governance as Code.
The virtual event combined expert panel discussions, lightning talks, and for the first time, a Hackathon!


We had an all-star lineup of speakers at Metadata Day 2022!
Watch all sessions from Metadata Day 2022 here!
Airflow Summit: Building for data reliability with DataHub, Airflow, & Great Expectations
John Joyce and Tamás Németh presented an approach for proactively addressing data quality problems using orchestration based on a central metadata graph. They shared use cases highlighting how DataHub can enable proactive pipeline circuit-breaking by serving as the source of truth for both the technical and semantic health status of a pipeline’s data dependencies.
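Here is a minimal Python sketch of the circuit-breaking idea: before a task runs, look up the latest assertion results for its upstream datasets and skip the run if any of them are unhealthy. The function names, the short dataset names, and the hard-coded results dict are hypothetical; in the setup described in the talk, these results would come from DataHub's metadata graph (for example, Great Expectations validation runs) rather than a literal dict, and the skip would be wired into an Airflow task.

```python
from typing import Callable, Dict, List


def failing_upstreams(
    upstreams: List[str],
    latest_results: Dict[str, List[bool]],
    require_results: bool = False,
) -> List[str]:
    """Return the upstream datasets whose latest assertion results are not all passing.

    If `require_results` is True, an upstream with no results at all also counts as failing.
    """
    failing = []
    for urn in upstreams:
        results = latest_results.get(urn)
        if results is None:
            if require_results:
                failing.append(urn)
        elif not all(results):
            failing.append(urn)
    return failing


def run_with_circuit_breaker(
    task: Callable[[], str],
    upstreams: List[str],
    latest_results: Dict[str, List[bool]],
) -> str:
    """Run `task` only when every upstream is healthy; otherwise short-circuit."""
    failing = failing_upstreams(upstreams, latest_results, require_results=True)
    if failing:
        return "skipped: unhealthy upstreams " + ", ".join(failing)
    return task()


results = {"orders": [True, True], "customers": [True, False]}  # hypothetical assertion outcomes
print(run_with_circuit_breaker(lambda: "built revenue", ["orders"], results))               # built revenue
print(run_with_circuit_breaker(lambda: "built revenue", ["orders", "customers"], results))  # skipped: unhealthy upstreams customers
```

Treating "no results" as unhealthy (`require_results=True`) is a deliberately conservative choice: a pipeline would rather pause than build on data nobody has checked.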
hayaData & Data Day Texas: The Data Practitioner’s Guide to Metadata
Just last week, I headed to Tel Aviv, Israel to speak at hayaData while Shirshanka had a slightly shorter flight to Austin, TX to speak at Data Day Texas. Our talks had similar content, focused on practical steps & best practices for managing metadata across vast, disparate systems.

Hello from Tel Aviv!
We’ll share recordings as soon as they become available!
One Last Thing —
I caught up with new DataHub Community Member Chris Collins:
Maggie: What’s been your first impression of the DataHub Community from the past few months?
Chris: Well, I’m still new to the community and have only been around it for a few months, but the thing that stands out to me the most is simply how engaged and welcoming the community is. I was blown away by how many people attend Town Halls, answer each other’s questions on Slack, and simply enjoy talking about DataHub with each other. We really do have a great group of people working on an awesome problem together!
M: Ahh, I couldn’t agree more! Thinking about the DataHub project, is there a feature/project update you’re particularly excited about?
C: I personally am most excited about all the usability and UI improvements in the works. While DataHub solves a lot of complex problems, there are still things that can be improved on to make the UX feel cleaner, more polished, and an overall great experience. Am I biased as a more frontend-leaning engineer? You betcha.
M: The people want to know — what song have you been playing on repeat recently?
C: I’ve had a few different things in heavy rotation lately, but I’ve probably been listening to the album Sky Blue Sky by Wilco the most. So I’d have to pick Impossible Germany as my song!
That’s it for this round; see ya on the Internet 🙂