DataHub Now Integrates with Google BigLake Iceberg REST Catalog

If you’re running an Iceberg lakehouse on GCP, you can now catalog all of its tables in DataHub through BigLake’s standard Iceberg REST API.

Why are so many organizations moving to Iceberg?

Apache Iceberg has become the dominant open table format because it solves a fundamental problem: turning a pile of Parquet files in object storage into something that behaves like a real database table. The Iceberg catalog is the authoritative pointer to each table’s current state. That single pointer is what enables ACID transactions and lets different engines read and write the same table concurrently without corrupting it.
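To make the “authoritative pointer” idea concrete, here is a toy Python model. It is not Iceberg’s actual implementation, and `ToyCatalog` and `CommitConflict` are hypothetical names. The catalog keeps one pointer per table to its current metadata file, and a writer commits by atomically swapping that pointer only if it still matches the version the writer started from (optimistic concurrency).

```python
import threading
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when another writer moved the table pointer first."""


@dataclass
class ToyCatalog:
    """Toy in-memory model of an Iceberg catalog (illustration only).

    Each table maps to the location of its current metadata file.
    """
    pointers: dict = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def current(self, table: str) -> str:
        # Readers always start from whatever the catalog points to now.
        return self.pointers[table]

    def commit(self, table: str, expected: str, new: str) -> None:
        # Compare-and-swap: succeed only if nobody committed since we read.
        with self._lock:
            found = self.pointers.get(table)
            if found != expected:
                raise CommitConflict(f"{table}: expected {expected}, found {found}")
            self.pointers[table] = new


# Two writers both start from v1; the second one to commit is stale.
catalog = ToyCatalog({"sales.orders": "v1.metadata.json"})
base = catalog.current("sales.orders")
catalog.commit("sales.orders", base, "v2.metadata.json")  # writer A wins
try:
    catalog.commit("sales.orders", base, "v2b.metadata.json")  # writer B loses
except CommitConflict:
    # Writer B re-reads the new state, reapplies its change, and retries.
    base = catalog.current("sales.orders")
    catalog.commit("sales.orders", base, "v3.metadata.json")

print(catalog.current("sales.orders"))
```

Real catalogs implement the same check with their own durable storage. Because a commit either swaps the whole pointer or fails, readers only ever see a complete table version.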

BigLake metastore is Google’s fully managed, serverless metadata service for Iceberg tables on GCP, giving different services and engines a shared view of the same tables. For example, a data scientist can create an Iceberg table through Spark, and that table is immediately queryable from BigQuery or any other compatible query engine.

Our integration

DataHub already had a generic Iceberg connector that works with any catalog implementing the Iceberg REST spec. What we’ve added is support for Google’s Application Default Credentials (ADC) with explicit OAuth scopes. Rather than hard-code GCP-specific auth into the generic connector’s logic, we built first-class BigLake support as a dedicated configuration path within it.
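For readers who want to see what “ADC with explicit OAuth scopes” looks like in practice, the sketch below uses the `google-auth` library’s `google.auth.default(scopes=...)` to resolve credentials and mint an access token, then packages it as PyIceberg-style REST catalog properties. This is not DataHub’s code: `fetch_adc_token` and `biglake_rest_properties` are hypothetical helpers, and the BigLake endpoint URL and the `x-goog-user-project` header are assumptions based on Google’s public examples, so check them against the current BigLake documentation.

```python
from typing import Dict, Optional, Sequence, Tuple

# Broad scope used by most Google Cloud APIs (assumed sufficient for BigLake).
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"

# Assumption: BigLake's Iceberg REST endpoint as shown in Google's examples.
BIGLAKE_REST_URI = "https://biglake.googleapis.com/iceberg/v1beta/restcatalog"


def fetch_adc_token(
    scopes: Sequence[str] = (CLOUD_PLATFORM_SCOPE,),
) -> Tuple[str, Optional[str]]:
    """Resolve ADC with explicit scopes; return (access_token, project_id).

    Needs google-auth (plus requests) and ADC configured, e.g. through
    GOOGLE_APPLICATION_CREDENTIALS or `gcloud auth application-default login`.
    """
    # Imported lazily so the pure helper below works without google-auth.
    import google.auth
    from google.auth.transport.requests import Request

    credentials, project_id = google.auth.default(scopes=list(scopes))
    credentials.refresh(Request())  # mint a short-lived OAuth access token
    return credentials.token, project_id


def biglake_rest_properties(project_id: str, warehouse: str, token: str) -> Dict[str, str]:
    """Hypothetical helper: PyIceberg REST catalog properties for BigLake."""
    return {
        "uri": BIGLAKE_REST_URI,
        "warehouse": warehouse,  # e.g. a gs:// bucket
        "token": token,  # sent as an OAuth bearer token
        # Assumed header naming the project to bill and authorize against.
        "header.x-goog-user-project": project_id,
    }
```

With `pyiceberg` installed and ADC configured, `token, project = fetch_adc_token()` followed by `load_catalog("biglake", **biglake_rest_properties(project, "gs://your-bucket", token))` would open the catalog directly. ADC may not resolve a project ID, in which case pass it explicitly, and a token minted this way typically expires after about an hour.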

The connector extracts metadata from every Iceberg table registered in your BigLake catalog. Because BigLake is the single catalog layer, you don’t end up with duplicate entries when different engines see the same table through different metadata paths.
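The reason a single catalog avoids duplicates can be sketched as follows: every table has exactly one identifier in the catalog, so if each DataHub dataset URN is derived from that identifier, Spark and BigQuery looking at the same table both resolve to the same URN. The walker below relies only on two methods PyIceberg’s `Catalog` provides, `list_namespaces()` and `list_tables(namespace)`. `catalog_urns` and the `FakeCatalog` stand-in are hypothetical, nested namespaces are not walked, and the connector’s actual naming may differ.

```python
from typing import List, Protocol, Tuple

Identifier = Tuple[str, ...]


class CatalogLike(Protocol):
    """The two PyIceberg Catalog methods this sketch relies on."""
    def list_namespaces(self) -> List[Identifier]: ...
    def list_tables(self, namespace: Identifier) -> List[Identifier]: ...


def dataset_urn(identifier: Identifier, env: str = "PROD") -> str:
    # DataHub dataset URN shape: urn:li:dataset:(<platform urn>,<name>,<env>)
    name = ".".join(identifier)
    return f"urn:li:dataset:(urn:li:dataPlatform:iceberg,{name},{env})"


def catalog_urns(catalog: CatalogLike) -> List[str]:
    """Visit each top-level namespace and table once; one URN per table."""
    urns = []
    for namespace in catalog.list_namespaces():
        for table in catalog.list_tables(namespace):
            urns.append(dataset_urn(table))
    return urns


class FakeCatalog:
    """Offline stand-in for a real catalog."""
    def __init__(self, tables):
        self._tables = tables  # namespace -> list of table identifiers

    def list_namespaces(self):
        return list(self._tables)

    def list_tables(self, namespace):
        return self._tables[namespace]


demo = FakeCatalog({("sales",): [("sales", "orders")], ("ml",): [("ml", "features")]})
urns = catalog_urns(demo)
for urn in urns:
    print(urn)
```

No deduplication step is needed: the catalog already lists each table once, so the URN set is one-to-one with the tables.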

Getting started

The updated Iceberg connector with embedded BigLake support is available now in DataHub v0.14.1. Check out the full documentation for details.

Before running ingestion, you need the BigLake API enabled on your GCP project, a BigLake catalog created and pointed at your GCS bucket or BigQuery, and a service account with the correct permissions. 
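Most of these prerequisites can only be confirmed against GCP itself, but typos in the values you give the connector are easy to catch locally. Here is a minimal, hypothetical offline preflight check (`preflight` is not part of DataHub) for the project ID and warehouse location, based on GCP’s documented naming rules; the bucket rule is simplified, and the `bq://` prefix for BigQuery-backed catalogs is an assumption. It cannot check API enablement, catalog existence, or service account permissions.

```python
import re
from typing import List

# GCP project IDs: 6-30 chars of lowercase letters, digits, and hyphens;
# must start with a letter and must not end with a hyphen.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")

# Cloud Storage bucket names (simplified): 3-63 chars of lowercase letters,
# digits, hyphens, underscores, and dots, starting and ending with a letter or digit.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def preflight(project_id: str, warehouse: str) -> List[str]:
    """Return a list of problems found offline; an empty list means none found."""
    problems = []
    if not PROJECT_ID_RE.fullmatch(project_id):
        problems.append(f"invalid GCP project ID: {project_id!r}")
    if warehouse.startswith("gs://"):
        bucket = warehouse[len("gs://"):].split("/", 1)[0]
        if not BUCKET_RE.fullmatch(bucket):
            problems.append(f"invalid GCS bucket name: {bucket!r}")
    elif not warehouse.startswith("bq://"):
        # Assumption: BigQuery-backed catalogs use a bq:// warehouse prefix.
        problems.append(f"warehouse should start with gs:// or bq://: {warehouse!r}")
    return problems


print(preflight("my-project-123", "gs://my-lakehouse-bucket"))
```

Running a check like this before ingestion turns a malformed value into a readable message instead of a failed API call.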

To learn more, book some time with our sales team or join the conversation in DataHub’s Slack community.

For details on how we work with Google Cloud, visit our Google Cloud partners page.
