DataHub Now Integrates with Google Cloud Lakehouse Iceberg REST Catalog

If you’re running an Iceberg lakehouse on Google Cloud, you can now catalog your Iceberg tables in DataHub through Google Cloud Lakehouse’s standard Iceberg REST API.

Why are so many organizations moving to Iceberg?

Apache Iceberg has become the dominant open table format because it solves the fundamental problem of turning a pile of Parquet files in object storage into something that behaves like a real database table. The Iceberg catalog is the authoritative pointer to the current table state, enabling ACID transactions and concurrent reads and writes from different engines without corruption or delays. 

Google Cloud Lakehouse metastore is Google’s fully managed, serverless metadata service for Iceberg tables on Google Cloud, enabling complete interoperability between services. For example, a data scientist can create an Iceberg table through Spark, and that table is immediately queryable from BigQuery or any compatible query engine.

Our integration

DataHub already had a generic Iceberg connector that works with any catalog implementing the Iceberg REST spec. What’s new is support for Google’s Application Default Credentials (ADC) with explicit OAuth scopes. Rather than bolting Google Cloud-specific auth onto the generic Iceberg connector, we built first-class Google Cloud Lakehouse support as its own configuration path.
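As a rough illustration of what Application Default Credentials with explicit OAuth scopes involve, the sketch below uses the google-auth Python library to resolve ADC, refresh the credentials, and build the bearer header a REST catalog request would carry. This is not DataHub’s connector code: the cloud-platform scope and both helper names are assumptions chosen for illustration.

```python
"""Sketch: resolve Application Default Credentials with explicit scopes.

Not DataHub's implementation. The scope below and both helper names are
assumptions made for illustration.
"""
from typing import Dict, Sequence

# Assumed scope; use whichever scopes your catalog documentation requires.
DEFAULT_SCOPES = ("https://www.googleapis.com/auth/cloud-platform",)


def bearer_header(token: str) -> Dict[str, str]:
    """Build the Authorization header for an OAuth 2.0 access token."""
    if not token:
        raise ValueError("empty access token")
    return {"Authorization": f"Bearer {token}"}


def adc_auth_header(scopes: Sequence[str] = DEFAULT_SCOPES) -> Dict[str, str]:
    """Resolve ADC, request the given scopes, and return a bearer header."""
    # Imported lazily so this module loads even without google-auth installed.
    import google.auth
    from google.auth.transport.requests import Request

    credentials, _project = google.auth.default(scopes=list(scopes))
    credentials.refresh(Request())  # fetches a fresh access token
    return bearer_header(credentials.token)
```

Calling adc_auth_header() needs the google-auth and requests packages plus working credentials in the environment, for example a service account key referenced by GOOGLE_APPLICATION_CREDENTIALS.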

The connector extracts metadata from every Iceberg table registered in your Google Cloud Lakehouse catalog. Because Google Cloud Lakehouse is the single catalog layer, you don’t end up with duplicate entries from different engines seeing the same table through different metadata paths.

Getting started

The updated Iceberg connector with embedded Google Cloud Lakehouse support is available now in DataHub v0.14.1. See the connector documentation for full details.

Before running ingestion, you need three things: the Google Cloud Lakehouse API enabled on your Google Cloud project, a Google Cloud Lakehouse catalog that points at your GCS bucket or BigQuery, and a service account with the correct permissions.
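Once those prerequisites are in place, an ingestion run could look something like the sketch below, which assembles a recipe for the Iceberg source with a single REST catalog and hands it to DataHub’s Python ingestion pipeline. Treat it as a starting point under stated assumptions: the helper names and every value passed in are placeholders, and the Google-specific authentication options are deliberately left out because their exact keys should come from the connector documentation.

```python
"""Sketch: assemble and run an Iceberg ingestion recipe from Python.

All argument values are placeholders supplied by the caller. Google-specific
auth options are intentionally omitted; take those from the connector docs.
"""
from typing import Any, Dict


def build_recipe(catalog_name: str, rest_uri: str, warehouse: str,
                 datahub_server: str) -> Dict[str, Any]:
    """Return a recipe dict for the Iceberg source with one REST catalog."""
    required = {
        "catalog_name": catalog_name,
        "rest_uri": rest_uri,
        "warehouse": warehouse,
        "datahub_server": datahub_server,
    }
    for label, value in required.items():
        if not value:
            raise ValueError(f"{label} must be set")
    return {
        "source": {
            "type": "iceberg",
            "config": {
                "catalog": {
                    # Properties under the catalog name go to the REST catalog client.
                    catalog_name: {
                        "type": "rest",
                        "uri": rest_uri,
                        "warehouse": warehouse,
                    }
                }
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": datahub_server}},
    }


def run_ingestion(recipe: Dict[str, Any]) -> None:
    """Run the recipe with DataHub's ingestion framework."""
    # Imported lazily; requires the DataHub Python package with Iceberg support.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()  # raise if the run reported failures
```

Keeping recipe construction separate from execution makes it easy to inspect or version the recipe before anything talks to the catalog.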

To learn more, book some time with our sales team or join the conversation in DataHub’s Slack community.

For details on how we work with Google Cloud, visit our Google Cloud partners page.