Accelerating Connector Development with AI Skills

February 2026 Town Hall Highlights

By: Lakshay Nasa | March 12, 2026

Building custom metadata connectors manually is time-intensive and difficult to scale.

At our February Town Hall, we explored how teams are accelerating connector development with DataHub Skills, an AI-assisted development toolkit. We also discussed how DataHub is evolving as the context layer for modern data systems.

The session featured a live demo of DataHub Skills generating connectors in just a few hours, a look at Foursquare’s geospatial platform using DataHub as its discovery engine, and a forward-looking roadmap across the AI, Discover, Observe, and Govern pillars.

This recap highlights the key demos, architectural insights, and roadmap updates from the session.

Accelerate connector development with DataHub Skills

Modern data stacks evolve quickly, introducing new ingestion tools faster than connectors can be built. When a system is not supported by an ingestion connector, lineage gaps appear.

For example, if your team uses dlt, an open-source data loading library, to move data from Postgres to Snowflake without a supported connector, DataHub cannot observe that synchronization layer in its lineage graph. As a result, downstream dependencies remain hidden.

Diagram: Data flows from Postgres through dlt to Snowflake, but dlt is not ingested into DataHub. The missing ingestion connector breaks lineage visibility between Postgres and Snowflake and hides downstream impact.

Without full lineage visibility, a developer might rename a column in Postgres without realizing that it feeds Snowflake tables and dbt models.
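To make the gap concrete, here is a minimal sketch of downstream impact analysis on a toy lineage graph. The asset names and the `downstream` helper are hypothetical, and the graph is a plain dictionary rather than DataHub's metadata model. It only illustrates why a single missing edge hides every dependency behind it.

```python
from collections import deque


def downstream(graph, start):
    """Return every node reachable from `start` by following lineage edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical asset names. Without a dlt connector, the Postgres -> Snowflake
# edge is never ingested, so the Postgres table looks isolated.
without_dlt = {
    "snowflake.orders": ["dbt.orders_daily"],
}

# With dlt ingested, the synchronization edge joins the two halves of the graph.
with_dlt = {
    "postgres.orders": ["snowflake.orders"],
    "snowflake.orders": ["dbt.orders_daily"],
}

print(sorted(downstream(without_dlt, "postgres.orders")))
print(sorted(downstream(with_dlt, "postgres.orders")))
```

In the first graph the traversal from the Postgres table finds nothing, which is exactly what a developer would see before renaming a column. In the second, the Snowflake table and the dbt model both appear.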

The issue is not just broken dashboards. It is missing context across the data pipeline.

Historically, building a new ingestion source required significant manual effort. Developers had to:

Research the target system’s APIs
Understand the DataHub metadata model
Scaffold code from scattered examples
Wait through multiple review cycles

That approach does not scale with the pace of modern tooling. To address this challenge, we introduced DataHub Skills, an AI-assisted development toolkit designed to reduce connector development friction. These skills, available on skills.sh, integrate with AI coding assistants such as Claude Code and Gemini to automate research, planning, and quality checks.

“DataHub is only as powerful as the metadata that you bring into it. As new tools emerge, lineage gaps grow. These AI skills help teams keep up.”

— Maggie Hays, Founding Product Manager, DataHub

How do DataHub Skills work together?

Connector development typically follows three stages: understanding standards, planning the architecture, and validating the implementation.

DataHub Skills formalize each stage into a reusable workflow.

Skill	Core function	Practical benefit
load_standards	Applies the same coding and testing standards used by DataHub maintainers during review.	Aligns connectors with production requirements before implementation begins.
datahub-connector-planning	Analyzes the target system, maps entities to the DataHub metadata model, and generates a structured development plan.	Reduces manual documentation review and architectural analysis.
datahub-connector-pr-review	Validates structure, test coverage, and conventions before a pull request is opened.	Reduces back-and-forth review cycles prior to submission.

Instead of relying on informal guidance or reverse-engineering existing connectors, developers start with explicit standards, produce a structured plan, and validate changes before opening a pull request.

This shifts the workflow from trial and error to guided implementation.

Lineage in practice: Generating a dlt connector

In the live demo, DataHub Founding Product Manager Maggie Hays walked through how to use DataHub Skills to generate a connector for dlt.

This segment focuses on the planning workflow and the artifacts the skill generated.

DataHub Skills demo planning an AI-assisted dlt connector

How does connector ingestion change the development workflow?

Before ingestion, the Postgres dataset showed no downstream lineage. The graph appeared isolated.

Lineage update after dlt ingestion

After ingestion, the graph showed:

A one-hop lineage view connecting Postgres to Snowflake
dbt models visible downstream
Column-level lineage exposing field-level dependencies

If a developer renames the “white rating” column, DataHub immediately surfaces affected downstream entities.

The improvement is not just faster connector development. DataHub also makes downstream dependencies visible across the full pipeline.

Together, these DataHub Skills shift connector development from an ad hoc process to a structured, review-ready workflow. For teams expanding their data stack, this reduces friction while improving metadata completeness.

Availability and community contribution

The DataHub GitHub repository and skills.sh host the DataHub Skills toolkit.

To make things even more exciting, we’ll send a DataHub swag box to:

The first 10 community members who successfully merge a pull request using these tools
The first 10 contributors who help review connectors using the skills

Community participation expands connector coverage and strengthens lineage visibility across the ecosystem.

Foursquare spotlight: Democratizing geospatial intelligence

Geospatial data is notoriously difficult to unify across sources. Datasets often use incompatible projections and inconsistent schemas, and they require specialized tooling.

Foursquare aggregates and indexes more than 50 public datasets into a universal H3 grid system known as the Spatial H3 Hub.

To make these indexed datasets discoverable and securely accessible, Foursquare relies on DataHub as the discovery engine and governed context layer behind the H3 Hub.

“DataHub basically acts as the discovery engine for the H3 Hub.”

— Shashir Ambasta, Tech Lead, Geospatial Intelligence Platform, Foursquare

Through its Iceberg REST Catalog, DataHub provides:

A unified metadata layer for H3-indexed datasets
A discovery and catalog layer for datasets managed as Apache Iceberg tables
Policy-based access control for secure dataset access
A GraphQL API that powers dataset discovery for internal applications

Instead of querying storage systems directly, users and services interact with DataHub to discover, filter, and access datasets across the entire H3 Hub.
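As a rough illustration of discovery through the GraphQL API, the sketch below builds (but does not send) a dataset search request. It assumes DataHub's `searchAcrossEntities` query and the `/api/graphql` endpoint. The host, token, search text, and the `build_search_request` helper are placeholders, and Foursquare's actual queries were not shown in the session.

```python
import json
import urllib.request

# GraphQL document for a cross-entity search, restricted to datasets via variables.
SEARCH_QUERY = """
query search($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    total
    searchResults {
      entity { urn type }
    }
  }
}
"""


def build_search_request(base_url, token, text, count=10):
    """Build a POST request for a dataset search; the caller decides when to send it."""
    payload = {
        "query": SEARCH_QUERY,
        "variables": {
            "input": {"types": ["DATASET"], "query": text, "start": 0, "count": count}
        },
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder host, token, and search text.
request = build_search_request("http://localhost:8080", "<token>", "h3 population")
print(request.full_url)
# Sending is omitted so the sketch runs without a server:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Keeping request construction separate from sending makes the query easy to inspect and reuse from an internal application.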

This metadata foundation also powers Foursquare’s Spatial Agent, which retrieves relevant datasets through DataHub and generates SQL queries and visualizations in response to natural-language questions.

DataHub serves as the connective layer that enables dataset discovery, governance, and programmatic access across Foursquare’s geospatial platform.

Stay tuned for a dedicated deep dive into Foursquare’s architecture in an upcoming post.

DataHub product roadmap 2026: AI, Discover, Observe, and Govern

The DataHub roadmap focuses on four pillars: AI, Discover, Observe, and Govern.

James Mayfield, VP of Product at DataHub, framed the roadmap as a shift from documenting the enterprise data landscape to actively operating it. It moves DataHub from a human-centric metadata map to a context platform that both humans and AI agents navigate.

The roadmap is structured across three horizons: now (0–3 months), next (3–6 months), and later (6+ months).

DataHub product roadmap across the AI, Discover, Observe, and Govern pillars, showing Now (0–3 months), Next (3–6 months), and Later (6+ months) capabilities.

AI and context: From human productivity to agent-driven workflows

DataHub’s AI vision centers on enterprise context.

Teams use DataHub to find trustworthy data, understand lineage, define data quality checks, and monitor compliance policies.

DataHub’s AI roadmap and vision for agent-driven workflows.

“Tomorrow, the big bet that we’re making is that both humans and agents will be able to leverage DataHub.”

— James Mayfield, VP of Product, DataHub

Now: Making enterprise context accessible to humans and AI tools

Today, DataHub exposes structured enterprise context to both users and external systems. It acts as a programmable context layer built on the metadata graph.

DataHub provides structured enterprise context through three mechanisms:

Ask DataHub Plugins: MCP-based integrations that extend Ask DataHub to retrieve context and trigger actions across external systems such as GitHub, dbt, and Snowflake.
Agent Context Kit: A toolkit for building AI agents that access DataHub’s metadata graph and enterprise context from platforms like LangChain, Snowflake Intelligence, or CrewAI.
DataHub MCP Server: A server that provides secure, programmatic access to DataHub’s metadata graph through the Model Context Protocol (MCP), enabling AI tools and agents to retrieve structured enterprise context.

This enables AI systems to retrieve structured enterprise metadata across tables, lineage, dashboards, and documentation.
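For readers unfamiliar with MCP, the sketch below shows the general shape of the JSON-RPC 2.0 messages an MCP client sends to list and call tools. The `tools/list` and `tools/call` methods come from the MCP specification. The tool name `search` and its arguments are placeholders, since the tools exposed by the DataHub MCP Server are not covered in this recap, and `mcp_request` is a hypothetical helper.

```python
import json
from itertools import count

# JSON-RPC requires an id per request so responses can be matched to calls.
_ids = count(1)


def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request message of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = mcp_request("tools/list")

# Call one tool. The name and arguments here are hypothetical placeholders.
call_tool = mcp_request(
    "tools/call",
    {"name": "search", "arguments": {"query": "orders"}},
)

print(json.dumps(call_tool, indent=2))
```

In practice an MCP client library handles this framing and the transport, so agents rarely build these messages by hand.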

Next: Enterprise-ready agent infrastructure

The next phase focuses on production readiness, secure access, and deeper cross-tool workflows.

Custom Ask DataHub agents
Enterprise-ready MCP servers
OAuth-based user authentication
Broader MCP-compatible tool support

This stage strengthens governance and security while enabling richer cross-system AI workflows.

Later: Native agentic workflows inside DataHub

Over time, the roadmap moves from exposing context to orchestrating workflows directly within DataHub.

Future capabilities include:

Native agentic workflows inside DataHub
Multi-step AI-driven data workflows
Human-in-the-loop governance automation

This shift moves DataHub from documenting the data landscape to actively operating it through governed, AI-driven workflows grounded in enterprise context.

Discover: From trusted data search to full supply chain visibility

DataHub enables teams to discover and understand trustworthy data across the enterprise.

Today, DataHub enables teams to:

Together, these capabilities form the ground truth for understanding how data moves through the enterprise.

Lineage, usage signals, and governance metadata allow teams to assess impact, trace dependencies, and make informed decisions.

The next phase expands discovery beyond human workflows. Both humans and agents will need to navigate the full enterprise data supply chain, from upstream operational tables to downstream dashboards, metrics, and models.

Now: Expanding ecosystem coverage and semantic interoperability

Strong discovery depends on the breadth of ingestion. The more systems connected, the more complete the lineage graph.

DataHub is expanding integration coverage across:

Microsoft Fabric ecosystem, including Data Factory and OneLake
Continued support and expansion across Azure services
Connectors for Dataplex, Flink, Monte Carlo, and Informatica

Next: Metrics catalog and Open Semantic Interchange (OSI)

A major focus within discovery is the evolution of the Metrics Catalog and participation in Snowflake’s Open Semantic Interchange (OSI).

The goal is to:

Define canonical metrics and semantic models
Support two-way metadata synchronization
Connect to metric definitions across Snowflake, dbt, Tableau, and other tools
Enrich metrics with semantic descriptions and AI-friendly context

By standardizing semantic metadata, DataHub enables both humans and agents to reason about metrics, dimensions, time rollups, and definitions consistently across platforms.

This strengthens discovery at the semantic layer and not just at the dataset level.

Later: Cost-aware discovery powered by lineage

DataHub is making discovery operational.

With a complete end-to-end lineage view, DataHub can identify:

Compute resources that are not consumed downstream
Storage that is rarely queried
Transformation pipelines generating low-value outputs

The roadmap includes surfacing cost-saving opportunities directly from lineage insights, helping teams take action on assets that consume resources without delivering value.
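One way to picture this is a check that combines lineage with usage signals. The sketch below is a toy version under stated assumptions: the asset names, query counts, and the `unused_assets` helper are all hypothetical, and it does not reflect how DataHub will implement the feature.

```python
def unused_assets(lineage, query_counts, max_queries=0):
    """Return assets with no downstream consumers and at most `max_queries` queries."""
    assets = (
        set(lineage)
        | {child for children in lineage.values() for child in children}
        | set(query_counts)
    )
    return sorted(
        asset
        for asset in assets
        if not lineage.get(asset) and query_counts.get(asset, 0) <= max_queries
    )


# Hypothetical lineage edges (upstream -> downstream) and recent query counts.
lineage = {
    "raw.events": ["stg.events"],
    "stg.events": ["mart.sessions", "mart.legacy_funnel"],
}
query_counts = {"mart.sessions": 420, "mart.legacy_funnel": 0}

print(unused_assets(lineage, query_counts))
```

Here only `mart.legacy_funnel` is flagged: it has no downstream consumers and no recent queries, while `mart.sessions` is a leaf that is still actively queried.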

Observe: From monitoring to AI-powered resolution

Observability at DataHub is evolving from real-time monitoring toward AI-assisted resolution and prevention.

Today, teams rely on DataHub to detect data issues and manage incidents through human-driven workflows.

Over time, observability becomes more adaptive, automatically prioritizing critical datasets, assisting with root cause analysis, and preventing degraded data from reaching production systems or AI agents.

Now: Strengthening anomaly detection with smart assertions

The immediate focus is improving detection across the data landscape.

DataHub Cloud is enhancing:

These capabilities help teams spot unexpected changes before they impact production systems.

Detection remains the foundation of effective observability.
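As a generic illustration of what detecting an unexpected change can mean, the sketch below flags a daily row count that deviates sharply from recent history using a simple z-score. This is a textbook technique chosen for brevity, not a description of how DataHub's smart assertions work. The `is_anomalous` helper, the sample counts, and the threshold are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # Not enough data to estimate normal variation.
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # Any change from a perfectly constant history stands out.
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for one table.
daily_row_counts = [1000, 1020, 990, 1010, 1005]

print(is_anomalous(daily_row_counts, 1015))
print(is_anomalous(daily_row_counts, 400))
```

A count of 1015 sits within normal variation, while a drop to 400 is flagged.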

Later: AI-assisted resolution and health intelligence

Once detection is mature, the focus shifts toward resolution and prevention.

Incident management uplifts

DataHub is enhancing resolution workflows through:

Integrations with Jira, PagerDuty, and other incident systems
Lineage-driven impact analysis to identify upstream and downstream dependencies
AI-driven root cause analysis with proposed remediation
Customizable downstream notifications and soft banners on impacted assets

The goal is to reduce mean time to resolution (MTTR) by connecting anomalies directly to impact and ownership.

Data health trends and AI-generated insights

Beyond incident response, observability becomes proactive.

Upcoming enhancements include:

Interactive data health dashboards
Trend analysis across assertions and incidents
AI-generated insights surfaced directly within health views
Drill-down capabilities to investigate patterns over time

This enables organizations to move from reactive monitoring to continuous improvement of data quality.

“We want to leverage AI to put monitors on the most important datasets at your organization.”

— James Mayfield, VP of Product, DataHub

Over time, observability moves beyond alerting humans. It automatically identifies critical assets, learns normal patterns, and prevents degraded data from reaching downstream consumers or AI agents.

Observability shifts from detecting incidents to actively protecting the data supply chain.

Govern: From shift-left controls to AI-assisted governance

Governance is a critical pillar of the DataHub platform.

It is a central part of how many prospects, customers, and open-source community members engage with DataHub, particularly around ownership, policy enforcement, and compliance.

Now: Operationalizing shift-left governance

Today, DataHub provides structured governance capabilities across the enterprise data ecosystem, including:

Together, these capabilities create a strong governance foundation.

DataHub embeds policies, definitions, and compliance signals directly into the metadata lifecycle.

This shift-left approach improves audit readiness and strengthens compliance posture over time.

Later: Reducing governance toil with AI assistance

The next phase focuses on reducing the cost and effort of maintaining strong governance at scale.

Planned capabilities include:

AI-proposed data descriptions
AI-suggested classifications and tagging
Automated identification of critical data elements for regulatory review
Agentic workflows to coordinate complex, multi-persona governance processes

This includes supporting audit preparation and compliance with regulatory frameworks such as GDPR and SOX.

Rather than replacing human oversight, these capabilities aim to reduce repetitive manual effort while preserving accountability and control.

Governance evolves from policy management to intelligent automation, helping organizations maintain compliance while scaling their data ecosystem.

Join the DataHub community shaping the future of data context

DataHub is built for long-term collaboration with the community. The community drives our roadmap through feedback, pull requests, and the real-world production challenges you share during these town halls.

Three ways to get started today:

Watch the full town hall recording to see our AI ingestion demo and Foursquare’s showcase in action.
Join the DataHub Slack community to connect with over 14,000 data practitioners and contribute to new open-source connectors in the #contribute-code channel.
Explore DataHub Skills on skills.sh and start building custom integrations for your data stack.

DataHub’s direction is shaped by the practitioners who use it every day. If you’re building with metadata, AI, and governed data systems, your input matters.
