It’s HERE! Say Hello to Column-Level Lineage in DataHub
Hello, DataHub Enthusiasts!
The past month was jam-packed with big DataHub feature announcements and excellent community-led code and content contributions. Without further ado, let’s get you up to speed!
🤩 The DataHub Community is Unstoppable
September 2022 was a record-setting month for the DataHub Community across the board, where we…
- welcomed 377 new Slack Members in a single month (!!)
- merged 291 Pull Requests from 47 Contributors to the open-source DataHub Project
- connected live on Zoom with 185 September Town Hall Participants
I must say… it’s invigorating to see this Community of Data Practitioners continue to come together to set a new standard for how we tackle metadata management and data governance within the modern data stack.

DataHub Community at a Glance
We continue to see more and more engagement in our Monthly Town Halls, and we are always thrilled to welcome new contributors to the project!
Join us on Slack • RSVP to our Next Town Hall • Follow us on Twitter
It’s TIME! Column-Level Lineage in DataHub is Here
During the September 2022 DataHub Town Hall, we unveiled support for column-level lineage within the DataHub UI. This has been one of the most-requested features from Community Members, and we are so excited to have you all start working with it!
Starting with DataHub v0.9.0, you can visualize column-level dependencies within the lineage view. This is an incredibly powerful resource for tracing fine-grained inter-dependencies across datasets and reporting resources. In this first iteration, we support auto-extracting column-level lineage during Snowflake and Looker ingestion.
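To make "column-level dependencies" concrete, here is a minimal sketch of the idea in plain Python. It is not DataHub's data model or API: the column names, the `UPSTREAMS` mapping, and the `trace_upstream` helper are all hypothetical and only illustrate how a single dashboard field can be traced back through intermediate columns to its raw sources.

```python
# Hypothetical "dataset.column" identifiers; these are NOT DataHub URNs.
# Each key is a downstream column, each value lists its direct upstream columns.
UPSTREAMS = {
    "reporting.revenue_dashboard.total_revenue": ["analytics.orders_summary.revenue"],
    "analytics.orders_summary.revenue": ["raw.orders.amount", "raw.orders.discount"],
    "analytics.orders_summary.order_count": ["raw.orders.order_id"],
}


def trace_upstream(column, upstreams):
    """Return every column that `column` depends on, directly or transitively."""
    seen = set()
    stack = [column]
    while stack:
        current = stack.pop()
        for parent in upstreams.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


sources = trace_upstream("reporting.revenue_dashboard.total_revenue", UPSTREAMS)
print(sorted(sources))
```

Table-level lineage would only tell you that the dashboard depends on `orders_summary` and `orders`. The column-level view narrows that to the specific fields involved, which is what makes impact analysis more precise.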

Screenshot of Column Level Lineage in DataHub
Exciting, right?! Take it for a spin in the DataHub Demo, and don’t forget to watch Gabe Lyons and Chris Collins give the full run-down in the Town Hall recording!
Case Study: How Stripe uses DataHub to power observability within their Airflow-based ecosystem
During the September Town Hall, we heard from Divya Manohar (Software Engineer at Stripe) about how she and the Stripe Team have leveraged DataHub to surface critical Airflow pipeline execution metrics for thousands of hourly tasks and their associated datasets. By leveraging metadata already sent to DataHub from Airflow and customizing the DataHub UI, the Stripe Team built out custom historical reporting to monitor:
- Historical DataJob Timeliness Tracking to understand the reliability of a given pipeline over time
- Complex Pipeline Status Tracking, providing a high-level status and estimated land time for jobs composed of thousands of tasks
- Critical DataJobs Historical SLA Observability
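As a rough illustration of what a timeliness metric like this can boil down to, here is a minimal sketch in plain Python. It is not Stripe's implementation and does not use any DataHub or Airflow API: the `on_time_rate` helper, the sample run timestamps, and the one-hour SLA are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def on_time_rate(runs, sla):
    """Fraction of runs that finished within `sla` of their scheduled start.

    `runs` is a list of (scheduled_start, finished_at) datetime pairs.
    Returns None when there are no runs to evaluate.
    """
    if not runs:
        return None
    on_time = sum(1 for scheduled, finished in runs if finished - scheduled <= sla)
    return on_time / len(runs)


# Hypothetical hourly runs of a single job: (scheduled start, actual finish).
runs = [
    (datetime(2022, 9, 1, 0), datetime(2022, 9, 1, 0, 40)),  # 40 min, on time
    (datetime(2022, 9, 1, 1), datetime(2022, 9, 1, 2, 15)),  # 75 min, late
    (datetime(2022, 9, 1, 2), datetime(2022, 9, 1, 2, 55)),  # 55 min, on time
    (datetime(2022, 9, 1, 3), datetime(2022, 9, 1, 3, 30)),  # 30 min, on time
]

rate = on_time_rate(runs, sla=timedelta(hours=1))
print(f"On-time rate: {rate:.0%}")
```

Computing the same ratio over a rolling window (per day or per week, say) would give the kind of reliability-over-time view described in the list above.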

Example of Stripe’s DataJob Timeliness Tracking in DataHub
I highly recommend watching Divya’s presentation; she and the Stripe Team are building next-level resources in DataHub to navigate the complexities of their data stack — prepare to be highly impressed!
Sneak Peek: Automated PII Classification
Data Governance Practitioners are keenly aware of how important it is to accurately catalog and classify data assets with the appropriate PII category; historically, this has been a laborious, manual effort.
During the September Town Hall, we shared a sneak peek of upcoming functionality in DataHub to automatically apply PII classifications to datasets. The goal is to drastically reduce the amount of manual tagging required and to bolster the coverage of compliance categorization within your data warehouse. Check out the demo to learn more!
Metadata Ingestion Improvements, Galore!
The DataHub Community is hard at work ensuring our existing Ingestion Sources are performant and extract as much valuable metadata as possible. Here are some highlights from v0.8.44, v0.8.45, and v0.9.0:
- Snowflake: Improved Snowflake connector is now stable & supports column-level lineage extraction (old version is renamed snowflake-legacy)
- BigQuery: bigquery-beta is improving rapidly (structs, unified usage)
- LookML: automatically clones your Git repo, supported in UI!
- Looker: ingestion requires much less memory
- dbt: extracts column-level meta mappings
- Tableau: extracts chart usage
- Presto on Hive: supports stateful ingestion, extracts table descriptions + views
- Core: checkpoint state compression, delete + rollback support for timeseries aspects
285 people have contributed to DataHub to date
During September, we merged 291 pull requests from 47 contributors, 16 of whom contributed for the first time (names in bold):
@aditya-radhakrishnan @aezomz @amanda-her @Ankit-Keshari-Vituity @anshbansal @atul-chegg @BogdanAntoniu78 @chriscollins3456 @codesorcery @daha @danilopeixoto @de-kwanyoung-son @divyamanohar-stripe @firasomrane @gabe-lyons @GyuhoonK @hemanthkotaprolu @hieunt-itfoss @hsheth2 @jeffmerrick @jinlintt @jjoyce0510 @justinas-marozas @ksrinath @liyuhui666 @maaaikoool @maggiehays @Masterchen09 @mayurinehate @mkamalas @ms32035 @MugdhaHardikar-GSLab @ngamanda @pedro93 @pghazanfari @remisalmon @rslanka @RyanHolstien @shirshanka @skrydal @szalai1 @topleft @TonyOuyangGit @treff7es @upendrao @mohdsiddique and @ltxlouis
We are endlessly grateful for the members of this Community — we wouldn’t be here without you!
One Last Thing —
I’m thrilled to welcome Paul Logan to my team at Acryl Data as Developer Relations Lead. He brings a wealth of dev rel experience, and I am so excited to team up with him to take the DataHub Community to the next level.
I caught up with him this week:
Maggie: We’re so excited to have you on board as the first DataHub Dev Rel Lead! What has been most surprising to you in your first month on the team?
Paul: The most surprising thing to me is the real depth of knowledge everyone on the team has; it’s incredible to be working with people with such expertise!
M: I looove to hear it! What’s a song you’ve been playing on repeat recently?
P: “Hangar” by 8455
https://open.spotify.com/embed/track/1nXiUKuAu4mHte6Gt2HRdJ?utm_source=generator
That’s it for this round; see ya on the Internet 🙂
Connect with DataHub
Join us on Slack • Sign up for our Newsletter • Follow us on Twitter