Release update! Lineage Vis Update, dbt meta, Data Freshness Indicator, & new Java Library

🥂 Happy 2022, DataHub Enthusiasts!

We’ve started off the year with high-impact improvements to user and developer experience; let’s get you caught up on what you may have missed in recent releases.

Lineage Visualization Update: Show Full Entity Names

We know that sometimes entity names can get very looong, making it tough to interpret the lineage visualization. Starting with v0.8.22, you can now toggle between showing the full or truncated entity titles in the lineage vis:


See it in action here!

Automatically map detail from dbt meta to DataHub Datasets

dbt supports capturing critical model-specific metadata using the meta configuration, allowing authors to specify owners, model status, tags, and more. As of v0.8.22, our dbt source now supports actions to map dbt meta values to DataHub Datasets.

For example, if a dbt model has a meta config has_pii: true, we can define an action that checks whether the property is set to true and, if so, adds a tag (let’s say, PII) to the Dataset in DataHub.

We currently support the following actions to extract values from dbt meta and apply them to DataHub Datasets:

  • add_tag — add a Tag to the Dataset
  • add_term — add a Business Glossary Term to the Dataset
  • add_owner — add an Owner of the Dataset

Here’s an example of how we can map values from the dbt meta config to a Dataset:

Example of how details from dbt meta are mapped to a Dataset’s Tags, Terms, and Owners in DataHub
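To make the idea concrete, here is a minimal Python sketch of the matching logic described above. It is an illustration under stated assumptions, not DataHub's implementation: the META_MAPPING dictionary, the matches and plan_actions helpers, and the business_owner and domain meta keys are all hypothetical, and the rule shape (match, operation, config) only loosely mirrors what a mapping configuration might look like. Check the dbt source docs for the actual option names.

```python
import re
from typing import Any

# Hypothetical mapping rules: one entry per dbt meta key we care about.
# The key names and structure here are assumptions for illustration only.
META_MAPPING = {
    "has_pii": {
        "match": True,
        "operation": "add_tag",
        "config": {"tag": "PII"},
    },
    "business_owner": {
        "match": ".*",
        "operation": "add_owner",
        "config": {"owner_type": "user"},
    },
    "domain": {
        "match": "Finance",
        "operation": "add_term",
        "config": {"term": "Finance"},
    },
}


def matches(rule_match: Any, value: Any) -> bool:
    """String rules are treated as regexes; other rules need the same type and value."""
    if isinstance(rule_match, str):
        return isinstance(value, str) and re.fullmatch(rule_match, value) is not None
    return type(rule_match) is type(value) and rule_match == value


def plan_actions(meta: dict, mapping: dict) -> list:
    """Return (operation, target) pairs for every rule whose meta value matches."""
    actions = []
    for key, rule in mapping.items():
        if key not in meta or not matches(rule["match"], meta[key]):
            continue
        op = rule["operation"]
        if op == "add_tag":
            actions.append((op, rule["config"]["tag"]))
        elif op == "add_term":
            actions.append((op, rule["config"]["term"]))
        elif op == "add_owner":
            # In this sketch the owner's identity comes from the meta value itself.
            actions.append((op, meta[key]))
    return actions


# A dbt model whose meta block flags PII and names an owner.
model_meta = {"has_pii": True, "business_owner": "jdoe", "domain": "Marketing"}
actions = plan_actions(model_meta, META_MAPPING)
print(actions)  # [('add_tag', 'PII'), ('add_owner', 'jdoe')]
```

The domain rule does not fire here because "Marketing" does not match "Finance", so no Glossary Term is planned for this model.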


Read the docs here!

🆕 Data Freshness Indicator

DataHub users can now easily see how recently a Dataset was updated using the Last Updated timestamp in the Stats details of a Dataset.

This freshness indicator, coupled with recent query activity, top users, and table & column stats, helps end-users make informed decisions about which datasets are relevant and trustworthy.


Last Updated is available as of v0.8.22 for Snowflake, BigQuery, and Redshift datasets and can be disabled by setting include_operational_stats: false in the source configuration.

🆕 Introducing: Java REST Emitter

As our Community continues to grow rapidly, we are working hard to make it easier & easier for folks to get up and running with DataHub. With this in mind, we released a Java REST emitter library in v0.8.22 to programmatically generate metadata events from Java-based clients.

The io.acryl:datahub-client Java package offers REST emitter APIs that make it easy to emit metadata from your JVM-based systems. For example, the Spark Lineage integration uses the Java emitter to emit metadata events from Spark jobs.

Incubating Metadata Sources and Features

As of v0.8.20, we are incubating the following:

Metabase — currently in beta, this plugin extracts Charts, Dashboards, and associated metadata. So far we have tested it on PostgreSQL and H2 databases and are looking for community members to help test out functionality!

Removing Stale Metadata from the UI — using Stateful Ingestion, DataHub can soft-delete Tables and Views from SQL sources so they will not be surfaced in the DataHub UI.

Have feedback to share about our Metabase connector or handling stale metadata? Tell us all about it in our #ingestion Slack channel!

Community Contributions

Congrats to our first-time contributors!

@MikeSchlosser16 @pramodbiligiri @aditya-radhakrishnan @abiwill @gfalcone @iasoon @lvicentesanchez @grumbler @MugdhaHardikar-GSLab @jawadqu @nsbala-tw @merqurio @hyunminch @sudotty @cccs-eric @xiphl

Big thanks to our repeat contributors!

@treff7es @dexter-mh-lee @RyanHolstien @rslanka @dannylee8 @anshbansal @kevinhu @sgomezvillamor @mayurinehate @varunbharill @gabe-lyons @EnricoMi @hsheth2 @jjoyce0510
