​Context to Action: May 2026 Town Hall Highlights

The context layer is no longer the new idea. It is the foundation teams are building production AI on.

That shift was the through-line of the May town hall. A year ago, the conversation was whether a catalog could feed an agent at all. This month, the teams on stage were past that question and into the harder one: how do you keep context trustworthy, discoverable, and current once hundreds of agents depend on it?

This month brought iFood putting the DataHub Analytics Agent to work as it consolidates toward core agents, Grab’s context store, dltHub’s Claude Code workflow, and the DataHub Context Platform we launched the same day. Five stories, one thread — context to action.

From the community: commitments, meetups, and summit season

A few things worth sharing before we get into the sessions:

  • What the survey told us – The survey we flagged last month is in, and you asked for more real-world walkthroughs and peer-led sessions. So we’re bringing community show-and-tells where members demo their own setups, a Slack map to help new joiners find their way, a faster path to a human when the AI assistant falls short, and sharper docs search.
  • SVAI meetup – We co-hosted “The Missing Data Layer for ML” in Silicon Valley with dltHub and LanceDB — founders, practitioners, and ML builders in the room.
  • Summit season — We had a great run at Snowflake Summit, and Databricks Summit is next (June 15–18, San Francisco, booth #622).
DataHub co-hosted “The Missing Data Layer for ML” in Silicon Valley with dltHub and LanceDB, bringing together founders, practitioners, and ML builders.

iFood: from 8,000 personal agents to focused core agents

Alexandre Miyazaki, Data Governance at iFood, walked through how one of the largest companies in Latin America is consolidating its agent strategy.

iFood runs at scale — 180 million orders a month, 65 million active users, 8,000 employees, around 2,000 of them touching data infrastructure daily. In April 2025 the company set a target of one agent per employee, roughly 8,000 agents, and hit it. By April 2026 the posture shifted: with AI clearly adopted, the focus moved from many narrow agents to fewer core agents solving the company’s common problems, backed by dedicated teams. The team is now in a tooling phase, evaluating what fits.

iFood's AI journey timeline: April 2025 sets a one-agent-per-employee target of ~8,000 agents, March 2026 completes its DataHub deployment, April 2026 shifts focus to core agents, May 2026 begins the tooling phase
iFood’s path from a one-agent-per-employee target to a focused set of core agents, with DataHub deployed by the data governance team along the way.

The analytics problem — letting people ask questions of iFood’s data in natural language — is one of those priority problems, and Alexandre saw a strong fit in the DataHub Analytics Agent.

The analytics agent fits like a glove for what we need — natural language questions answered from our own data.”  

— Alexandre Miyazaki, Data Governance, iFood

There was one catch: iFood runs its LLMs internally and couldn’t point the agent at an external reasoning provider. So Alexandre built the fix and contributed it upstream — a feature to configure custom LLM endpoints as the reasoning engine for the DataHub Analytics Agent, now merged into the open-source project. Early proof-of-concept tests validated the connection to iFood’s internal platform, and the team is exploring how it could extend to more users from there.

iFood is taking the deliberate path, and Alexandre was clear about the groundwork that makes agents trustworthy. As the team brings more metadata into DataHub, they’re being intentional about how it’s modeled, and setting a real governance bar: agents should only reach data that carries proper classification and ownership, meets a minimum quality threshold so answers aren’t built on bad data, and stays within each user’s existing access permissions.

 Alexandre Miyazaki on the configuration blocker with iFood’s internally deployed LLMs, and the custom LLM endpoints feature he built and contributed upstream to solve it.

Grab: from catalog to agentic context engine

The deepest production story came from Grab. George Hurley, Product Manager on the data engineering platforms team, and Aezo Teo, Software Engineer, walked through how their internal DataHub deployment — called Hubble — went from a UI-first catalog to a system built to serve agents. 

The scale sets up the problem. Grab runs more than 117 petabytes in the data lake, takes on around 300 terabytes a day, and tracks roughly 240,000 tables. Hubble catalogs around 20,000 assets — tables, metrics, ML models, features, dashboards, and, more recently, context documents.

The telling number is the usage shift. Grab released a Hubble MCP in August 2025, and today about 30% more people reach Hubble through the MCP than through the UI. For a tool that started entirely UI-based, that’s a fast reorientation, and it’s the reason the rest of the talk is about serving agents rather than humans clicking around.

Bar chart comparing how people use Hubble today, with MCP usage about 30% higher than UI usage.
Within a year of shipping an MCP, more of Grab’s Hubble usage runs through agents than through the UI.

“We have about 30% more users on Hubble through the MCP than through the UI.”  

— George Hurley, Product Manager, Grab

Grab framed the journey as three steps: 

  1. Catalog the data signals
  2. Certify the important ones
  3.  Evolve the catalog into a context store. 

The first two are foundational. Certification, which Aezo covered, is backed by datahub-actions and moves an asset through states — Uncertified, Certified, CertifiedPlus, and Revoked — with the level decided by traversing upstream lineage, so an asset only reaches CertifiedPlus when its upstreams meet the bar too. The interesting work is step three.

What breaks when context is everywhere?

Once teams could write context and use it with the Hubble MCP, context showed up everywhere — and that became the problem. Someone encountering a context file created by a person they’d never met had no way to know whether to trust it or where it applied. Quality varied widely. Some teams documented well, others barely. There was no reliable way to tell good context from stale.

A mental model: three constructs

Grab’s first move was to define where different kinds of context should live:

  • Data context is the freeform, mostly markdown material that explains how to do a piece of analysis — this is what lands in Hubble as documents.
  • Skills are the generalized, reusable analysis abilities, managed in a separate Grab-wide system.
  • Data entity metadata is everything tied to a specific asset: table, metric, and attribute documentation, lineage, and column descriptions — the output of the certification process.

The point of the model is routing. Describing one table is entity metadata. Describing how several tables and metrics work together for a part of the business is data context. Capturing how to do a type of analysis in general is a skill. At consumption time an agent draws on all three, but giving humans a clear place to put each kind is what makes the corpus manageable.

Diagram of Grab's three context constructs — Data Context, Skills, and Data Entity Metadata — each with a cue for when it applies, converging into agent consumption.
Grab’s routing model: which kind of context goes where, and how all three feed an agent at query time.

Three ways to get context in

Because teams had managed context so differently, Grab offered three onboarding paths rather than forcing one. A GitLab sync takes structured documents and syncs them to Hubble through a CI pipeline. The Hubble UI allows copy-paste, and it’s by far the least used. The most important is MCP writeback: because users were already in the Hubble MCP doing their analysis, adding a write tool to that same MCP let them promote a good context doc in place, without switching tools.

“If onboarding had a high barrier, we’d never reach the scale where every team has context.”  

— George Hurley, Product Manager, Grab

That’s why MCP writeback mattered most: it met people where they already were, instead of asking them to go somewhere new to document what they’d just figured out.

Aezo Teo demos a context document syncing from GitLab into Hubble and a Cursor agent retrieving it through the MCP.

How does semantic search work in Hubble?

On DataHub 1.4.0, Grab moved Hubble from keyword matching to semantic search, generating embeddings across three areas: datasets, metrics, and context documents. On ingestion, Hubble fetches entities through GraphQL, partitions and chunks the text, generates embeddings (via LiteLLM with providers like Cohere, Bedrock, or OpenAI), and persists them through DataHub’s GMS into an OpenSearch semantic index. A query follows the reverse path: a GraphQL semanticSearch call hands the natural-language query to GMS, which converts it to an embedding, matches it against that index, and returns ranked results.

To make the search trustworthy, Grab validates against a curated golden dataset mapped to real agent use cases, measuring MRR, Recall@20, and P95 latency — latency matters because agents need fast answers. Next milestones: relevance scores so clients can set thresholds and filter low-confidence results, and returning the matching chunks and raw text for re-ranking, which improves accuracy and gives agents more to work with.

Architecture diagram of Hubble's semantic search: metadata sources fetched via GraphQL, chunked, embedded, persisted through DataHub GMS into an OpenSearch semantic index, with a query path where GMS embeds the query and returns ranked results.
Hubble’s semantic search pipeline on DataHub 1.4.0 — embeddings generated across datasets, metrics, and context documents, stored in OpenSearch and queried through GMS.

Results, and the lesson that lands

By the time of the talk, Hubble held more than 3,000 context files, paired with a skills governance framework and an evals framework owned per business vertical. Each vertical is accountable for the accuracy of its own context and any agent built on top of it.

The learning George closed on was about agents versus raw tools. A prebuilt agent, he argued, is like a bike: easy to jump on, gets you where you’re going, does one thing. Exposing raw MCP tools is different — people build their own bike, a different bike, a scooter, things you didn’t anticipate.

“The most exciting solutions came when we exposed our raw MCP tools to users.”  

— George Hurley, Product Manager, Grab

For a platform team, that’s the case for investing in high-quality tooling rather than only shipping finished agents. It ties back to the GrabX launch event, where many of the AI experiences Grab demoed lean on MCPs like Hubble underneath.

DataHub Context Platform launch

Shirshanka Das, Co-founder and CTO of DataHub, announced the launch of the DataHub Cloud Context Platform — the same context-layer pattern these teams have been building by hand, now packaged into the product.

The premise is the one running through every session: a talk-to-data agent fails not because the model is weak but because it doesn’t know what the business means. The Context Platform addresses that in three stages.

  1. Context Intelligence mines the raw signals you already have — query logs, semantic models, and usage and popularity signals from the warehouse and BI tools — and turns them into a bootstrapped semantic layer. It runs continuously, filtering for high-signal queries (multi-table joins, interactive sessions, high-frequency use) and inferring the metric definitions and query patterns behind them. The output is a set of context documents Shirshanka calls semantic anchors.
  2. Context Hub is the review layer. The bootstrapped context isn’t trusted blindly — data experts come in to approve, reject, or enrich it, and crucially, to test the impact of a change on Text-to-SQL results before it ships. Those evals run on DataHub Cloud, so an expert can see whether a proposed change is safe to make.
  3. Maintaining context closes the loop. Because most agent usage happens outside DataHub, the platform intercepts those calls: when a question is ambiguous — say “what is the current revenue?” with MRR, NRR, and ARR all in play — the agent asks the user to pick the right definition rather than guessing, and that choice is captured back as an annotation on the context document. The context graph keeps improving without anyone remembering to maintain it.
DataHub Cloud architecture diagram showing data sources feeding a context store and context layer, exposed to AI tools through MCP, Context Kit, APIs, Skills, and Ask DataHub.
The architecture DataHub is building toward — a context store and context layer that both agents and humans draw on.

The launch landed with a customer proof point. Miro built a three-layer context stack on DataHub, sitting between the query and the warehouse, and measured the difference on its analyst agent.

 “Starting with Snowflake metadata alone, our analytics agent answered about half of our benchmark questions correctly. After layering in DataHub as our context platform, we nearly doubled accuracy.”  

— Ronald Angel, Product Manager, Data Platform, Miro

Snowflake metadata alone couldn’t bridge the semantic gap; bringing context together through DataHub closed it, taking Snowflake Cortex accuracy from close to 50% to around 90%. The open-source community saw a parallel story with Pinterest, covered in the April town hall.

The architecture, the semantic-anchor format, and the full reasoning are in the launch blog and the Context Platform page, so this recap points there rather than restating them.

Shirshanka Das on how DataHub keeps context current by capturing user choices as agents run, rather than waiting for manual updates.

Connector update

Maggie Hays, Founding Product Manager at DataHub, ran through the open-source connector work, framed around a simple point: an agent is only as good as the metadata behind it, so breadth and depth of ingestion both matter.

On the breadth side, DataHub shipped a run of new connectors: Airbyte, Matillion, dltHub, and Informatica on the pipeline and ETL side, ThoughtSpot for the BI layer, and a community-contributed Timescale connector. In development are Monte Carlo — which will pull data quality signals into DataHub and auto-raise and resolve incidents, much like the existing dbt test handling — plus SAP Datasphere, AWS QuickSight, and streaming connectors for Firehose and Kinesis.

Logos of DataHub's new Q2 2026 connectors — Airbyte, Matillion, dltHub, Informatica, ThoughtSpot, Timescale — plus in-development connectors Monte Carlo, SAP Datasphere, AWS QuickSight, AWS Firehose, and AWS Kinesis.
New connectors shipped in Q2 2026, with more in development across data quality, SAP, BI, and streaming.

The depth work spanned more than 250 ingestion PRs across April and May, which Maggie grouped into four themes:

  • Great Expectations is no longer a hard dependency. SQL profiling now defaults to SQLAlchemy across all SQL connectors, with the same coverage and a documented path back to GX for teams that want it.
  • Semantic-layer support expanded. Building on dbt Semantic Models and Snowflake Semantic Views, DataHub now ingests Unity Catalog Metric Views and Sigma Data Models, so the definitions in your warehouse, BI tool, and agent stay in sync.
  • Lineage coverage got wider. New support for Snowflake dynamic tables (with or without DDL access), Sigma stitched back to source warehouses, Glue and Iceberg, Athena mapping to the catalog instead of raw S3 paths, and PowerBI via Microsoft’s official M-Query parser.
  • Ingestion got faster at scale. BigQuery policy-tag extraction is now batched, the SAP HANA connector got a production-ready overhaul with column-level and stored-procedure lineage, and Teradata runs that took hours now finish in minutes by skipping unchanged tables

The v1.6.0 upgrade guide has the full detail.

dltHub: agentic data engineering in Python

Elvis Kahoro, Developer Experience and Ecosystem Lead at dltHub, closed the sessions with a live demo of how dlt and DataHub work together from Claude Code.

dltHub is an open-source Python SDK for moving data — source to pipeline to destination, where the source can be a REST API, a SQL database, a CSV, or a dataframe, and switching the destination from DuckDB to Snowflake or BigQuery is roughly one line plus credentials. The new dltHub connector means a dlt pipeline can register its metadata and lineage back into DataHub. Both tools ship MCP servers and skills, so the whole workflow can run inside Claude Code.

The demo built a pipeline pulling commits from the DataHub GitHub repo into DuckDB, validated the data locally in a Marimo notebook using IBIS to compile Python into SQL, then ran a DataHub ingestion pipeline to register the result. From there, Elvis queried the catalog through the DataHub MCP without leaving Claude Code.

Diagram showing Claude Code writing a dlt pipeline that loads GitHub data into DuckDB, cataloged by DataHub, with context feeding back to the agent via MCP.
The loop Elvis demoed: an agent writes the pipeline, dlt moves the data, and DataHub provides the context that feeds back to the agent.

The payoff was less about either tool alone than about the two composing. Querying the catalog through the MCP, Elvis could ask what was in the BigQuery catalog or what columns a pipeline had without leaving Claude Code.

“Having the context powered by DataHub is really nice, because we can guarantee the questions people ask get answers that are correct. ”  

— Elvis Kahoro, Developer Experience and Ecosystem Lead, dltHub

His broader point was about open source. Because dlt, DataHub, IBIS, Arrow, and Marimo are all open, a feature that needs to exist across two products doesn’t require filing a ticket and waiting — you open the PR yourself, in partnership with the other team.

Elvis Kahoro queries the DataHub catalog through the MCP inside Claude Code, without switching tools.

Five stories, one thread

The teams this month weren’t debating whether context belongs at the center of production AI. They were showing what it looks like once it is. iFood contributed a custom LLM endpoints feature upstream so the DataHub Analytics Agent could run on its internal models. Grab turned a catalog into a context store that more people now reach through an MCP than the UI. dltHub showed the whole loop running from Claude Code. And we packaged the pattern into the Context Platform.

The thread is the one we opened with: context isn’t the new idea anymore. It’s the substrate.

A few ways to go deeper: