DataHub and ClickHouse
Built for speed, better together
In my job heading up Partnerships at DataHub, the dashboard I look at most often is the one tracking the most common sources of metadata that our open source community members are bringing into DataHub. The top ten or so are pretty stable – you can probably guess what they are. But over the past year, ClickHouse adoption in the DataHub ecosystem has accelerated faster than that of any other data source.
We took this as an unmistakable signal and invested in improving our ClickHouse connector. We wanted DataHub users to be able to catalog and govern their ClickHouse assets alongside everything else in their stack, without compromise.
A few months later, we’re excited to have been invited to be a launch partner for House Mates, ClickHouse’s new partner program. House Mates is designed to help our customers stand up their analytics stacks and their AI and agentic applications faster by offering pre-built integrations that actually work, built by teams that understand the ClickHouse ecosystem. This partnership is our commitment to staying current with ClickHouse as it evolves, and we’re looking forward to co-innovating more deeply with the ClickHouse team.
Why ClickHouse and DataHub fit
The most natural partnerships share a foundation. For ClickHouse and DataHub, that foundation is open source.
ClickHouse began as an open-source project and has grown into one of the most widely deployed real-time analytical databases in the world, with 40k+ GitHub stars and adoption at Tesla, Lyft, Capital One, and thousands of other large organizations. DataHub has the same origin: open-source from day one, with thousands of active community deployments alongside our managed Cloud product. Both projects made the same bet: build in the open, earn trust through transparency, and let the community push the technology forward.
Beyond philosophy, our mutual users often share the same technical needs. We’re helping them assemble a stack built for speed and scale. ClickHouse is built for speed: columnar storage, vectorized query execution, and compression designed to handle billions of rows without breaking a sweat. DataHub is built for scale: architected from day one for machine-scale operations that can support the most complex multi-platform, multi-cloud data estates. As a result, DataHub users register 4.7 billion metadata records each month!
What our shared customers are building
There’s a pattern we keep seeing in many of our most forward-thinking customers’ data stacks. ClickHouse is used when teams need sub-second query performance at scale for customer-facing dashboards, real-time reporting, and other high-concurrency applications. Those same customers are increasingly building AI agents and new workflows on top of that data, using context from DataHub: metadata, lineage, and semantic information. Here, AI agents and human data teams need to answer complex data questions that require knowledge of multiple systems, and to flag data quality issues before they impact downstream applications.
For example, a leading device intelligence platform needed to store and aggregate billions of raw identification events at scale. They invested in ClickHouse for its speed and scale, and as their data estate grew across teams and platforms, they turned to DataHub to bring visibility and ownership to an increasingly sprawling ecosystem. With DataHub, their data engineers can finally trace lineage from raw tables, making it easier to proactively address data quality issues before they impact downstream applications.
Get started
The DataHub ClickHouse connector is available now for both open-source and Cloud deployments. You can find the full documentation at docs.datahub.com/docs/generated/ingestion/sources/clickhouse, including setup guides, configuration options, and details on the metadata that DataHub extracts.
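For a sense of what getting started can look like, here is a minimal sketch of an ingestion recipe built and run from Python. It assumes the DataHub CLI package is installed with the ClickHouse plugin (for example, `pip install 'acryl-datahub[clickhouse]'`) and that a DataHub instance is reachable over REST. The host, port, credentials, and server URL are placeholders, and the helper names `build_clickhouse_recipe` and `run_ingestion` are hypothetical. Treat the connector documentation as the reference for the available options.

```python
def build_clickhouse_recipe(host_port, username, password, datahub_server):
    """Return a DataHub ingestion recipe as a plain dict.

    All argument values are placeholders for your own environment.
    """
    return {
        "source": {
            "type": "clickhouse",
            "config": {
                "host_port": host_port,
                "username": username,
                "password": password,
            },
        },
        "sink": {
            # Sends the extracted metadata to a DataHub instance over REST.
            "type": "datahub-rest",
            "config": {"server": datahub_server},
        },
    }


def run_ingestion(recipe):
    """Run the recipe with DataHub's ingestion pipeline."""
    # Imported lazily so the recipe can be built without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


# Placeholder values for a local setup; replace them with your own.
recipe = build_clickhouse_recipe(
    host_port="localhost:9000",
    username="default",
    password="",
    datahub_server="http://localhost:8080",
)
```

Calling `run_ingestion(recipe)` would then start the ingestion run. The same recipe can also be written as a YAML file and run with the `datahub ingest` CLI command.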
To learn more, book some time with our sales team or join the conversation in DataHub’s Slack community.