Dependency Impact Analysis, Data Validation Outcomes, and MORE!
👋 Hello, DataHub Enthusiasts!
I hope this finds you safe & healthy ❤️
February was a very busy month for the DataHub Community with stellar progress against our Q1 Roadmap. Let’s dig in!
NEW! Dependency Impact Analysis
Raise your hand if you’ve ever deployed a schema migration or breaking change, only to find out it broke downstream pipelines, reports, ML models, etc., that you had no idea even existed.
🙋‍♀️ I’ve done this more times than I care to admit 😅😅
Gone are the days of pushing a breaking change and hoping for the best! Beginning with v0.8.28, DataHub users can quickly view a given entity’s complete set of downstream dependencies, making it easier than ever to proactively identify the impact of schema migrations, data deprecation, and more.

The Impact Analysis Lineage view shows the complete set of downstream entities that a change to a given entity may impact. You can also search, filter, and export the list of entities to CSV to slice & dice to your heart’s content.
Test out Lineage Impact Analysis here!
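Under the hood, "complete set of downstream dependencies" boils down to a graph traversal. The sketch below is not DataHub's implementation or API. It assumes a hypothetical in-memory lineage graph (made-up entity names, not real DataHub URNs) and runs a breadth-first search to collect every transitively downstream entity and how many hops away it is, then writes the result as CSV, much like the export described above.

```python
import csv
import io
from collections import deque

# Hypothetical lineage graph: each entity maps to its direct downstream consumers.
# Names are illustrative placeholders, not real DataHub URNs.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "ml.churn_features"],
    "marts.revenue": ["dashboard.finance"],
    "ml.churn_features": ["ml.churn_model"],
    "dashboard.finance": [],
    "ml.churn_model": [],
}


def downstream_impact(graph, start):
    """Breadth-first search: return {entity: hops} for everything downstream of start."""
    hops = {}
    seen = {start}  # guards against cycles in the lineage graph
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                hops[child] = depth + 1
                queue.append((child, depth + 1))
    return hops


def to_csv(impacted):
    """Serialize impacted entities as CSV text, sorted by hop count, then name."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["entity", "hops"])
    for name, hop in sorted(impacted.items(), key=lambda kv: (kv[1], kv[0])):
        writer.writerow([name, hop])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(downstream_impact(LINEAGE, "staging.orders")))
```

Changing `staging.orders` here would flag the revenue mart and churn features one hop away, plus the finance dashboard and churn model two hops away. Those second-degree consumers are exactly the ones that are easy to forget.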
NEW! Display Data Validation Outcomes in the UI
Starting with v0.8.28, DataHub supports surfacing outcomes from Great Expectations validations in Dataset Entities! End-users can quickly view the complete history of validation outcomes to understand how trustworthy their data is. Stay tuned for future work to provide native support for other data testing suites; in the meantime, watch this demo by John Joyce.
Ongoing Improvements to the DataHub UI
UI Refresh of Users, Groups, Policies, and Tags
The User Detail Page has a new look! You can now quickly filter & search for entities owned by a User, update/edit the user profile, and see which Groups the User belongs to.

See it in action here.
We also overhauled the User Group Detail Page, allowing you to assign an email address, Slack Channel, and Group Owner, and to manage group members via the UI. You can also view, filter, and search across all data assets owned by the User Group.

Test it out here.
The Tag Details Page also has a new look! You can now edit the definition, assigned owners, and tag color via the UI.

Try it here.
We refreshed the Policies Page, allowing you to see DataHub policy details, associated DataHub Users/Groups, and policy status at a glance.

Test it out here.
Notable Metadata Model & Ingestion-Based Features
Track changes to entities using the Timeline API
We’re excited to roll out the new Timeline API, providing a unified timeline of changes to entities in the metadata graph to give a complete picture of how your metadata has evolved. We currently track changes to:
- Technical Schema (i.e., new/removed fields)
- Ownership
- Documentation
- Tags
- Glossary Terms
Coming SOON! We will be building out UI support to visualize the full timeline of changes; more to come!
Read the docs here.
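To make the idea of a "unified timeline of changes" concrete, here is a toy sketch. It is not the Timeline API and does not call it. It assumes hypothetical snapshots of one entity's metadata, where each category (schema fields, tags, glossary terms) is a set of values, and diffs consecutive snapshots into ADD/REMOVE events. The `ChangeEvent` class and the category labels are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical snapshot shape: category name -> set of values (e.g. schema field names).
Snapshot = Dict[str, Set[str]]


@dataclass(frozen=True)
class ChangeEvent:
    version: int    # index of the snapshot in which the change first appears
    category: str   # illustrative label such as "schema" or "tags"
    operation: str  # "ADD" or "REMOVE"
    value: str


def build_timeline(snapshots: List[Snapshot]) -> List[ChangeEvent]:
    """Diff each pair of consecutive snapshots into ordered change events."""
    events: List[ChangeEvent] = []
    for v in range(1, len(snapshots)):
        prev, curr = snapshots[v - 1], snapshots[v]
        for category in sorted(set(prev) | set(curr)):
            before = prev.get(category, set())
            after = curr.get(category, set())
            for value in sorted(after - before):
                events.append(ChangeEvent(v, category, "ADD", value))
            for value in sorted(before - after):
                events.append(ChangeEvent(v, category, "REMOVE", value))
    return events


SNAPSHOTS: List[Snapshot] = [
    {"schema": {"id", "email"}, "tags": {"PII"}},
    {"schema": {"id", "email", "signup_date"}, "tags": {"PII"}},
    {"schema": {"id", "signup_date"}, "tags": set(), "glossary_terms": {"Customer"}},
]

if __name__ == "__main__":
    for event in build_timeline(SNAPSHOTS):
        print(event)
```

For real change history, use the Timeline API itself as described in the docs. The sketch only shows why snapshot diffs are a natural fit for this kind of history.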
First Milestone: Fine-Grained Lineage available in the Metadata Model
The Metadata Model now supports Fine-Grained lineage (aka Column-Level lineage) for Datasets; see the documentation here for details, including how to add fine-grained lineage to a dataset or a data job.
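Column-level lineage answers questions that dataset-level lineage cannot, such as "which raw columns feed this metric?" The sketch below uses a hypothetical representation: a plain dict from each downstream `(dataset, column)` pair to the columns it is derived from. It is not DataHub's fine-grained lineage schema, which is covered in the docs linked above. The function walks upstream until it reaches columns with no recorded parents.

```python
from typing import Dict, Set, Tuple

Field = Tuple[str, str]  # (dataset, column); a hypothetical stand-in for a schema field reference

# Hypothetical fine-grained edges: downstream column -> upstream columns it is derived from.
FINE_GRAINED: Dict[Field, Set[Field]] = {
    ("marts.revenue", "total"): {("staging.orders", "amount"), ("staging.orders", "tax")},
    ("staging.orders", "amount"): {("raw.orders", "amount_cents")},
    ("staging.orders", "tax"): {("raw.orders", "tax_cents")},
}


def source_columns(field: Field, edges: Dict[Field, Set[Field]] = FINE_GRAINED) -> Set[Field]:
    """Return the root columns (no recorded upstreams) that `field` ultimately derives from."""
    sources: Set[Field] = set()
    seen: Set[Field] = set()
    stack = [field]
    while stack:
        current = stack.pop()
        if current in seen:  # skip columns already visited (also avoids looping on cycles)
            continue
        seen.add(current)
        parents = edges.get(current, set())
        if parents:
            stack.extend(parents)
        else:
            sources.add(current)
    return sources


if __name__ == "__main__":
    print(sorted(source_columns(("marts.revenue", "total"))))
```

A column with no recorded upstreams is treated as its own source. Dataset-level lineage alone would only tell you that `marts.revenue` depends on `raw.orders`, not which two columns matter.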
Define Dataset-to-Dataset lineage via YAML
As demonstrated in the February 2022 Town Hall, you can set Dataset-level lineage via YAML. This is great for teams with more bespoke lineage needs that cannot be auto-extracted by the current set of supported ingestion sources. Massive shoutout to Community Member Edward Vaisman for contributing this back to the project! You can watch his demo here:
Miscellaneous Metadata Ingestion Updates:
- Incubating: PowerBI Ingestion source, ClickHouse Ingestion source
- BigQuery Profiling: new configuration options to profile only the latest partition/shard, or to disable partition profiling altogether
- Tableau improvements: Workbooks are now modeled as “Containers”
- Kafka Stateful Ingestion — shoutout to @claudio-benfatto for building this out!
Notable Docs Updates
NEW! Tips for Searching within DataHub
Have you ever wondered how to make the most of searching within DataHub? Check out this doc put together by @xiphl.
Improvements to Metadata Model Docs
This is a huge win for the Community: we’re taking a big step toward providing auto-generated & curated docs for the Metadata Model. Take a look here.
Community Contributions
We had 47 people contribute to the DataHub Project across v0.8.27 & v0.8.28!
Congrats to our first-time contributors!
@Ankit-Keshari-Vituity @bskim45 @buggythepirate @cuong-pham @daha @eddyv @gmcoringa @guidoturtu @Huyueeer @jieqiu0630 @mmmeeedddsss @ne1r0n @ngamanda @pppsunil @satyamkrishna @stephenp-gr @tc350981 @vcs9 @wangqinghuan @zhaofengnian18 and @mohdsiddique
Shout-out to our repeat contributors!
@abiwill @aditya-radhakrishnan @anshbansal @arunvasudevan @claudio-benfatto @dexter-mh-lee @eburairu @gabe-lyons @hsheth2 @jeffmerrick @jjoyce0510 @kevinhu @ksrinath @maaaikoool @maggiehays @mayurinehate @MugdhaHardikar-GSLab @rslanka @RyanHolstien @sgomezvillamor @ShubhamThakre @swaroopjagadish @treff7es @vlavorini @xiphl @zhoxie-cisco
One Last Thing —
I caught up with DataHub Community Member John Joyce:
Maggie: Feb was a super busy month for DataHub — what are you most excited for the Community to start using?
John: I am PUMPED about DataHub’s expansion into Data Quality with the Great Expectations integration. I think automated, recurring quality checks will become an incredibly useful trust signal for both consumers & producers of data.
M: SAME. This is going to be huge for the DataHub Community. Unrelated — is there a song you’ve been playing on repeat recently?
J: Mercy Falls — CharlestheFirst