Extracting Column-Level Lineage from SQL

Data people really care about data lineage, particularly lineage extracted from SQL queries.

We looked at a bunch of open-source automated SQL lineage tools and found that many shared the same fundamental problem: they had no knowledge of the underlying table schemas, and therefore couldn’t generate accurate column-level lineage.

A metadata management platform and data catalog like DataHub already has APIs for retrieving the schema of any table in your data stack. So, we built a SQL lineage parser that’s schema-aware and can use DataHub’s APIs to generate accurate column-level lineage from SQL queries across a wide array of dialects.
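To see why schema awareness matters, here is a toy sketch. It is not DataHub's parser and does no real SQL parsing. It assumes the SELECT list and the FROM/JOIN tables have already been extracted (no aliases, no expressions), and it uses a hard-coded dict as a stand-in for a catalog schema lookup. The function name `resolve_column_lineage` and the data layout are hypothetical. The point it illustrates: without a schema, `*` cannot be expanded and unqualified columns in a multi-table query cannot be attributed to a source table.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for schemas fetched from a catalog: table -> column names.
Schema = Dict[str, List[str]]
# Each entry: (output column name, list of upstream "table.column" sources).
# An empty source list means the column could not be resolved.
Lineage = List[Tuple[str, List[str]]]


def resolve_column_lineage(select_items: List[str], tables: List[str],
                           schema: Optional[Schema] = None) -> Lineage:
    """Toy column-level lineage for a pre-tokenized SELECT (hypothetical helper)."""
    lineage: Lineage = []
    for item in select_items:
        if item == "*":
            if schema is None:
                # Without a schema we cannot know which columns "*" expands to.
                lineage.append(("*", []))
            else:
                # With a schema, expand "*" to every column of every table.
                for table in tables:
                    for col in schema.get(table, []):
                        lineage.append((col, [f"{table}.{col}"]))
        elif "." in item:
            # Qualified reference: the source table is explicit in the SQL text.
            col = item.split(".", 1)[1]
            lineage.append((col, [item]))
        elif schema is None:
            # Unqualified, no schema: only safe to attribute when there is one table.
            sources = [f"{tables[0]}.{item}"] if len(tables) == 1 else []
            lineage.append((item, sources))
        else:
            # Unqualified, with a schema: find the tables that actually have the column.
            # More than one candidate means the query itself is ambiguous.
            candidates = [f"{t}.{item}" for t in tables if item in schema.get(t, [])]
            lineage.append((item, candidates))
    return lineage


if __name__ == "__main__":
    schema = {"orders": ["order_id", "customer_id", "total"],
              "customers": ["customer_id", "name"]}
    items = ["order_id", "name", "customers.customer_id"]
    tables = ["orders", "customers"]
    print(resolve_column_lineage(items, tables))
    # [('order_id', []), ('name', []), ('customer_id', ['customers.customer_id'])]
    print(resolve_column_lineage(items, tables, schema))
    # [('order_id', ['orders.order_id']), ('name', ['customers.name']), ('customer_id', ['customers.customer_id'])]
```

Only the qualified column survives the schema-unaware pass; once the schema is supplied, every output column maps to a concrete upstream column. A production parser also has to handle aliases, expressions, subqueries, CTEs, and dialect differences, none of which this sketch attempts.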

Read the full article, posted on DataHubProject.io.
