Continuous Context: Why Your AI Documentation Is Already Lying to You

TL;DR

  • Documentation has always decayed, but AI agents can’t compensate for stale docs the way humans can. They confidently operate on whatever the doc says, which means static documentation in an agentic system produces confidently wrong outputs instead of obvious failures.
  • Continuous context is the maintenance discipline within context management: the practice of treating the context AI agents consume as a living system that’s actively kept accurate as the world changes. It spans three distinct types (declared, derived, and observed), each with its own origin and its own management cadence.
  • Keeping context fresh is an infrastructure problem, not a discipline problem. Manual processes don’t scale, just as they didn’t scale for documentation or data quality. The path forward is a context graph with change detection across the full stack, staleness detection for human-authored content, freshness SLAs per context type, and agent-native interfaces.

Documentation has always decayed. For most of the history of software, that was annoying but survivable. A senior engineer could read a stale architecture doc, pattern-match against their experience, and fill in the gaps. We built entire engineering cultures around the assumption that humans would compensate for bad information.

That assumption no longer holds. AI agents are now reading the documents we wrote for ourselves, and they don’t compensate. They confidently operate on whatever the doc says (or doesn’t say, because the doc is missing), even when the doc is months out of date and quietly wrong.

This is the context management problem nobody is talking about. Everyone is investing in writing better context for agents. Almost nobody is investing in maintaining it. Static context fails the moment the world it describes moves on, which in a modern data stack is constantly. Continuous context is the discipline of keeping that material accurate as the world changes, and it’s what closes the gap between writing context for agents and being able to trust what the agents do with it.

Why documentation decay is a physics problem, not a people problem

Here’s an uncomfortable truth every engineer knows but rarely says out loud: documentation is stale the moment you write it.

This isn’t a people problem. It’s a physics problem. The world that a document describes is changing continuously. Schemas evolve. Ownership transfers. SLAs get renegotiated. Columns get deprecated. New tables appear. Business logic shifts. A document is a snapshot. Reality is a movie.

We’ve lived with this forever. When docs were for humans, staleness was tolerable: readers pattern-matched against their own experience and filled in the gaps. Humans are remarkably good at compensating for bad information.

AI agents are not.

When an agent reads a context document that says user_id is in the accounts table, and it was actually migrated to identity.profiles three weeks ago, the agent doesn’t figure it out. It confidently operates on wrong information. The same failure happens with semantic drift. If the finance team quietly redefined “active customer” to exclude trials, every agent generating revenue reports from the old definition is producing numbers that are technically correct but business-wrong. No schema changed. No pipeline broke. But the context was lying.

In an agentic world where systems are making decisions, generating pipelines, and modifying data, confidently wrong is worse than ignorant.

The current advice is some version of: “Just tell the agent to update the docs.” Or: “Make it part of your workflow, after every PR, update the context files.”

This is equivalent to telling engineers to write unit tests by relying on their discipline alone, before CI/CD existed. We know how that story ends. Manual processes don’t survive contact with competing priorities, deadlines, team turnover, or the sheer volume of changes in a modern data stack.

Even more sophisticated approaches like git hooks, PR checks, and linting for doc freshness only cover what happens in code repositories. They miss the majority of context that matters: metadata changes, pipeline behavior, data quality shifts, access control updates, business logic encoded in BI tools, and changes in upstream systems you don’t control. They also miss the fact that context isn’t living in one place to begin with.

The architecture diagram is in a Slack thread, the business rules are in Confluence, the data dictionary is in a shared drive nobody updates, and the undocumented knowledge is in three engineers’ heads. Even a perfectly disciplined manual process can’t keep all of that aligned.

We’ve been here before: the data quality parallel

If this pattern feels familiar, it should.

Five years ago, data quality was a manual process. Someone would notice bad data, file a ticket, and an engineer would investigate. The tooling assumed humans were in the loop on every check. Then Great Expectations, dbt tests, Monte Carlo, and DataHub’s own Observe module emerged, and suddenly data quality became observable, measurable, and automated.

Nobody seriously argues anymore that data quality should be a discipline problem. It’s an infrastructure problem, and we built the infrastructure.

We’re at the same inflection point for context quality.

Right now, context quality is manual. Someone notices an agent behaving badly, traces it back to stale context, and manually updates a doc. Then the cycle repeats next week with a different agent and a different stale doc. This doesn’t scale, and it has never scaled in any analogous discipline. What we need is the equivalent of data observability for context: automated detection of context drift, impact analysis for context changes, freshness monitoring, and self-healing pipelines that keep context current.

The reason this is worth saying out loud is that the data quality story tells us how this ends. It ends with infrastructure. The teams that figured out data quality first got a compounding advantage. The teams that figure out continuous context first will get the same.

What continuous context actually means

Quick definition: What is continuous context?

Continuous context is the practice of treating the context AI agents consume as a living system that’s actively maintained against the changing state of the world it describes, rather than as static documentation written once and forgotten. It combines computation, curation, and inference to keep declared, derived, and observed context fresh enough for agents to act on with confidence.

It’s worth being precise about how this relates to the other term in the air right now: context engineering.

Context engineering, as Anthropic and others have framed it, is the discipline of curating what lands in the model’s attention budget at inference time. Which tokens make it into the window. How tools are described. How retrieval is structured. How few-shot examples are chosen. It’s a real and important discipline, and if you’re building agents, you should read Anthropic’s piece on it.

Continuous context sits one layer beneath. It’s not only about what goes into the window at inference time. It’s about whether the source material that feeds that curation is still true. You can engineer the most elegant context window in the world, but if the underlying schema description was written eight months ago and the table has been restructured twice since, you’re engineering with corrupted inputs.

The two disciplines are halves of the same problem. Context engineering optimizes the curation step. Continuous context maintains the substrate that step draws from. Neither is complete without the other.

If you’ve been following the broader argument we’ve been making about context management, continuous context is its temporal dimension. Context management is the organization-wide infrastructure that delivers trusted context to every agent and team. Continuous context is the maintenance discipline that keeps the material inside that infrastructure accurate as the world changes. One is the substrate, the other is the practice that keeps the substrate honest. (For the full breakdown of how context management and context engineering relate as separate layers, see context engineering vs context management.)

Here’s what the distinction looks like in practice across five dimensions:

| Dimension | Static documentation | Continuous context |
| --- | --- | --- |
| Update model | Written once, updated when someone remembers | Maintained continuously through computation, curation, and inference |
| Source of truth | A document somewhere | The live state of the systems the context describes |
| Change detection | A human notices something is wrong | The system detects drift and flags or re-derives |
| What it serves | Whoever opens the doc | Agents querying for the specific context they need, with freshness and provenance attached |
| Failure mode | Stale doc nobody catches until something breaks | Drift surfaced as a signal before the agent acts on it |

Not all context is created equal: the three types

Before we can talk about how to keep context fresh, we have to acknowledge that context isn’t monolithic. Different kinds of context come from different places, change at different rates, and require different management strategies. A system that treats all context the same will fail at all of them.

There are three types worth distinguishing.

1. Declared context

Quick definition: What is declared context?

Declared context is institutional knowledge that humans know and write down: business rules, domain definitions, organizational conventions, domain ontologies, and strategic intent. It changes slowly but is impossible to compute from technical metadata alone.

This is the stuff that lives in someone’s head until they write it down. “We soft-delete customer data, never hard-delete.” “When we say revenue, we mean only Contracted Annual Recurring Revenue (CARR).” “The finance team considers Q1 to start in February.” “This company builds cars.”

Declared context is slow-changing, but it does change. And when it changes, nobody updates the doc. The management challenge here isn’t speed. It’s awareness. The system needs to know this context exists, know when to question it, and periodically surface it for human re-validation.

The right mental model for declared context isn’t a materialized view. It’s a contract that needs periodic renewal. Someone wrote it. Someone has to keep agreeing it’s still true.

2. Derived context

Quick definition: What is derived context?

Derived context is summarized from fast-changing technical state: schemas, lineage, freshness, quality scores, access patterns, and incident histories. It can and should be computed automatically from underlying signals rather than maintained by humans.

Derived context is what you get when you summarize the live state of your data systems. The current schema of a table. Its lineage. Its freshness. Its quality score. Who owns it. Who queries it. When it last broke.

The management challenge here is computational. Detect changes in the underlying signals, re-derive the summary, serve the updated version. Humans shouldn’t be in this loop at all. If a column is added to a table, no human should have to go update a doc somewhere to reflect that. The system should know.

This is where the materialized view metaphor lands hardest. A doc describing a table’s schema is, ideally, a rendered view over the actual schema. When the schema changes, the view re-renders. The document is the surface. The data is the source.
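To make the materialized-view metaphor concrete, here is a minimal sketch (the function and field names are hypothetical, not any particular platform's API): the derived context for a table is just a function of its live schema, re-rendered whenever change detection sees a schema event.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DerivedTableContext:
    table: str
    columns: dict[str, str]   # column name -> type, copied from the live schema
    derived_at: datetime      # freshness stamp served alongside the summary


def render_table_context(table: str, live_schema: dict[str, str]) -> DerivedTableContext:
    """Re-render the derived context for a table from its live schema.

    Called whenever change detection sees a schema event, so the "doc"
    is a view over the source rather than a hand-maintained copy.
    """
    return DerivedTableContext(
        table=table,
        columns=dict(live_schema),
        derived_at=datetime.now(timezone.utc),
    )
```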

3. Observed context

Quick definition: What is observed context?

Observed context is what nobody wrote down but the system can infer from actual behavior: which tables are queried together, which columns are joined despite no documented relationship, which dashboards break when a specific pipeline is late.

Observed context is the most underrated of the three. It’s what emerges from how people actually use the system, regardless of what the docs say.

Two tables that have no documented relationship but get joined in 80% of the queries that touch either of them: that’s a relationship. A dashboard that breaks every time a specific upstream pipeline runs late: that’s a dependency. A column that’s technically in the schema but hasn’t been read by any query in six months: that’s a deprecation in waiting.

None of this is in any document anyone wrote. All of it is observable from logs. And all of it is exactly the kind of context an agent navigating unfamiliar territory needs most.
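As a sketch of the "joined in 80% of queries" example, assuming query logs have already been parsed into the set of tables each query touches (a simplification), observed relationships can be inferred with nothing more than co-occurrence counting:

```python
from collections import Counter
from itertools import combinations


def infer_relationships(query_logs: list[set[str]], min_ratio: float = 0.8) -> list[tuple[str, str]]:
    """Infer undocumented relationships from which tables are queried together.

    query_logs holds one set of table names per query, e.g. {"orders", "customers"}.
    A pair counts as a relationship when it appears together in at least
    min_ratio of the queries that touch either table.
    """
    pair_counts: Counter = Counter()
    table_counts: Counter = Counter()
    for tables in query_logs:
        table_counts.update(tables)
        pair_counts.update(tuple(sorted(pair)) for pair in combinations(tables, 2))

    inferred = []
    for (a, b), together in pair_counts.items():
        touching_either = table_counts[a] + table_counts[b] - together
        if touching_either and together / touching_either >= min_ratio:
            inferred.append((a, b))
    return inferred
```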

Why all three have to coexist

The key insight is that these three types of context need to live in the same system, with different management cadences and different strategies.

  • A context system that only handles derived context will miss the institutional knowledge that makes agents genuinely useful.
  • A system that only handles declared context is just a wiki with a reminder system.
  • A system that only handles observed context is a behavioral analytics tool with no opinions about ground truth.

You need the full spectrum.

What a continuous context system actually looks like

If you take this idea seriously, what would you build? Six things.

1. A context graph, not a document store

Context isn’t flat files. It’s a graph of entities (datasets, pipelines, owners, policies, models, features, business definitions) with relationships and attributes that change at different rates. Some things change hourly: freshness, row counts. Some change weekly: ownership, schemas. Some change rarely but critically: what “revenue” means in this company, which regulations apply to which data.

The system needs to model this heterogeneity and know the management strategy for each node. A context graph isn’t a nice-to-have visualization on top of a document store. It’s the actual primitive. The reason is that maintenance is fundamentally a graph problem. When something changes, the system has to know what depends on it, and that’s a question only a graph can answer.
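A minimal sketch of that primitive, with hypothetical names: nodes carry their context type and expected rate of change, and the graph can answer the one question maintenance keeps asking, namely what depends on the thing that just changed.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContextType(Enum):
    DECLARED = "declared"   # human-authored, re-validated rather than recomputed
    DERIVED = "derived"     # computed from live technical state
    OBSERVED = "observed"   # inferred from actual usage


@dataclass
class ContextNode:
    id: str                       # e.g. "dataset:orders" or "definition:revenue"
    context_type: ContextType
    expected_change_rate: str     # "hourly", "weekly", "rarely" -- drives the maintenance strategy


@dataclass
class ContextGraph:
    nodes: dict[str, ContextNode] = field(default_factory=dict)
    depends_on: dict[str, set[str]] = field(default_factory=dict)  # node id -> upstream node ids

    def dependents_of(self, node_id: str) -> set[str]:
        """The maintenance question a flat document store can't answer:
        when this node changes, which context needs to be re-checked?"""
        return {nid for nid, upstream in self.depends_on.items() if node_id in upstream}
```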

2. Change detection across the full stack

You can’t maintain what you can’t observe. The system needs to detect changes not just in code repos but across the entire data and AI ecosystem: schema registries, orchestrators, quality monitors, access control systems, BI tools, feature stores, vector databases, model registries.

Every change is a potential trigger for context invalidation. Including, critically, invalidation of declared context that a human wrote months ago but that a schema migration just made quietly wrong. The system has to be wired into enough of the stack to notice.
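The wiring varies by system (schema registry webhooks, orchestrator events, query log scans), but the invalidation step is the same everywhere. A sketch, assuming a dependency map has already been built from the context graph:

```python
from collections import deque


def invalidated_by(change_source: str, dependents: dict[str, set[str]]) -> set[str]:
    """Walk the dependency map and return every piece of context made suspect
    by a change event (e.g. a schema migration on "dataset:accounts"),
    whether it was derived automatically or declared by a human months ago.

    dependents: node id -> ids of context nodes that depend on it.
    """
    suspect: set[str] = set()
    queue = deque([change_source])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, set()):
            if dependent not in suspect:
                suspect.add(dependent)
                queue.append(dependent)
    return suspect
```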

3. Intelligent summarization, not just aggregation

Raw metadata isn’t context. An agent doesn’t need to know that 47 columns were added to a table last quarter. It needs to know that the table was significantly restructured, that the revenue column now uses a different calculation methodology, and that downstream dashboards haven’t been updated yet.

This requires summarization that understands what matters for the context consumer, which is increasingly an AI agent with a specific task. The summarization layer is where most metadata platforms stop being useful for agents. Dumping every attribute into the window is the same failure mode as dumping every document. The window fills up. The signal drops.
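In practice this layer will often lean on an LLM, but even a rule-based sketch shows the shape of the transformation: raw events in, a handful of judgments out. The event format below is hypothetical.

```python
def summarize_changes(table: str, events: list[dict]) -> list[str]:
    """Collapse raw metadata events into the few judgments an agent needs.

    Events here are hypothetical records like {"kind": "column_added", "column": "arr_usd"}
    or {"kind": "logic_changed", "column": "revenue"}. The output is the point:
    a handful of claims, not a changelog.
    """
    added = [e["column"] for e in events if e["kind"] == "column_added"]
    relogic = [e["column"] for e in events if e["kind"] == "logic_changed"]

    summary: list[str] = []
    if len(added) > 10:
        summary.append(f"{table} was significantly restructured ({len(added)} columns added).")
    elif added:
        summary.append(f"{table} gained columns: {', '.join(added)}.")
    for column in relogic:
        summary.append(f"{table}.{column} now uses a different calculation; "
                       f"context and dashboards that reference it should be re-checked.")
    return summary
```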

4. Staleness detection for human-authored context

This is the hardest and most valuable capability in the whole system.

When someone wrote “this table contains all active customer subscriptions” a year ago, the system should be able to compare that declaration against current reality. Does the table still look like a subscriptions table? Has its schema drifted? Has its usage pattern changed? Are there now two tables that could plausibly match that description? When the answer to any of those questions is “yes,” the system should flag it for human review, not silently let the gap widen.

This is what turns declared context from write-and-forget into write-and-monitor. It’s also where most of the long-tail value in continuous context lives, because declared context is where the most expensive failures happen.
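A simplified sketch of the idea, using a schema fingerprint and a re-validation age as the only drift signals (a real system would also look at usage patterns and competing candidate tables):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Declaration:
    claim: str                  # "this table contains all active customer subscriptions"
    subject: str                # e.g. "dataset:billing.subscriptions"
    schema_fingerprint: str     # hash of the subject's schema when the claim was last validated
    last_validated: datetime


def needs_revalidation(decl: Declaration, current_fingerprint: str,
                       max_age: timedelta = timedelta(days=365)) -> bool:
    """Flag a human-authored claim for review instead of letting it silently drift.

    A declaration becomes suspect when the thing it describes has changed shape
    since it was last validated, or when the validation itself is overdue.
    """
    drifted = decl.schema_fingerprint != current_fingerprint
    overdue = datetime.now(timezone.utc) - decl.last_validated > max_age
    return drifted or overdue
```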

5. Freshness SLAs per context type

Different context has different staleness tolerances. A data dictionary can be a day old. An incident status needs to be current within minutes. A schema description should update within an hour of a migration. A business definition might only need annual re-validation, but the system should know when it was last validated and flag it when it’s overdue.

The system should let you define freshness SLAs per context type and alert when they’re violated. This is the analogue of data freshness SLAs in data observability, and it’s just as load-bearing.
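A sketch of what such SLAs might look like as configuration; the specific durations are illustrative, not recommendations:

```python
from datetime import datetime, timedelta

# Illustrative SLAs only; the right values depend on the organization.
FRESHNESS_SLAS: dict[str, timedelta] = {
    "incident_status":     timedelta(minutes=5),
    "schema_description":  timedelta(hours=1),
    "data_dictionary":     timedelta(days=1),
    "business_definition": timedelta(days=365),   # re-validated by a human, not recomputed
}


def sla_violations(last_refreshed: dict[str, datetime], now: datetime) -> list[str]:
    """Return the context types whose freshness SLA is currently violated."""
    return [
        context_type for context_type, sla in FRESHNESS_SLAS.items()
        if context_type in last_refreshed and now - last_refreshed[context_type] > sla
    ]
```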

6. Agent-native interfaces

The output isn’t a wiki page. It’s context available through MCP or other interfaces that agents can query for exactly the context they need, at the granularity they need, with confidence scores that reflect both freshness and provenance. Was this derived from live metadata five minutes ago, or declared by a human eight months ago and never re-validated?

An agent should be able to ask “give me everything I need to know about the orders table to write a correct query” and get back a structured, current, trustworthy response that blends all three types of context, with provenance and freshness attached to each piece. That response is the unit of work the system exists to produce.
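The exact transport (MCP or otherwise) matters less than the shape of the response. A hypothetical sketch of that shape, with each fact carrying its type, provenance, and freshness:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ContextFact:
    claim: str           # e.g. "orders.revenue_usd is computed using the CARR definition"
    context_type: str    # "declared", "derived", or "observed"
    source: str          # a named human, a schema registry, query logs, ...
    as_of: datetime      # when it was last derived or re-validated
    confidence: float    # reflects both freshness and provenance


def get_context(entity: str, task: str) -> list[ContextFact]:
    """Hypothetical agent-facing call, e.g. exposed as an MCP tool:
    "give me everything I need to know about `entity` to do `task` correctly".

    The return shape is the point: every fact carries its type, provenance,
    and freshness, so the agent can decide what to trust.
    """
    raise NotImplementedError("served by the context system, not hand-written docs")
```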

Why the intelligence layer is the hard part

You might be reading this and thinking it sounds like a metadata platform with a summarization layer on top. In some sense, it is. But the hard part isn’t the plumbing. It’s the intelligence layer that bridges all three types of context and reasons about cross-type impact.

Consider what happens when a schema changes. The system doesn’t just need to update a schema description. It needs to reason about impact. What derived context is now stale? Which declared context, written by a human months ago, might now be contradicted by reality? Which observed usage patterns suggest this change will break things nobody documented? Should the system proactively notify agents that are mid-task with now-stale context?

Or consider the subtler case: a business team quietly redefines what counts as an “active customer.” No schema changes. No pipeline breaks. But every piece of context that references active customers is now wrong, and the system needs to understand that a change in declared business context can invalidate derived technical context downstream.
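A small sketch of that cross-type propagation, assuming derived context records which business terms it references:

```python
def impacted_by_definition_change(term: str, term_references: dict[str, set[str]]) -> set[str]:
    """Cross-type impact: a change to a declared business definition (e.g. "active_customer")
    invalidates every piece of derived or observed context whose summary references that term,
    even though no schema changed and no pipeline broke.

    term_references: context id -> business terms that context references.
    """
    return {context_id for context_id, terms in term_references.items() if term in terms}
```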

This is context management in the truest sense. Not just storing and serving context, but actively maintaining its accuracy, coherence, and fitness for purpose across all the different ways context originates and evolves. The plumbing is tractable. The intelligence layer is the work.

The flywheel

We’re entering a world where AI agents are autonomous participants in data operations. They discover datasets, build pipelines, resolve incidents, and handle complex tasks across the full stack. The quality of their work is bounded by the quality of their context.

If we get continuous context right, we unlock a world where agents can be genuinely autonomous, because they can trust the information they’re operating on. If we get it wrong, we’ll spend the next decade debugging agent failures that all trace back to the same root cause: the docs were lying.

The companies that figure this out first won’t just have better AI agents. They’ll have a compounding advantage, because good context makes agents smarter, smarter agents generate better metadata, and better metadata produces better context. It’s a flywheel, but only if the context stays fresh.

The age of write-it-and-forget-it documentation is over. Context isn’t a document. It’s a living system. And it’s time we built the infrastructure to treat it that way.

FAQs

How does continuous context relate to context engineering?

Context engineering is the discipline of curating what goes into a model’s attention budget at inference time: which tokens make it into the window, how tools are described, how retrieval is structured. Continuous context sits one layer beneath. It’s about whether the source material that feeds that curation is still accurate. The two are halves of the same problem. Context engineering optimizes the curation step, and continuous context maintains the substrate that step draws from.

Why does documentation go stale, and why is that worse for AI agents than for humans?

Because the world the documentation describes is changing continuously. Schemas evolve, ownership transfers, business definitions shift, columns get deprecated, and pipelines get refactored. A document is a snapshot in time, but reality is a movie. Humans tolerate this kind of decay because they pattern-match and fill in gaps from experience. AI agents don’t. They confidently operate on whatever the doc says, which means stale documentation in an agentic system produces confidently wrong outputs rather than obvious failures.

What’s the difference between declared, derived, and observed context?

Declared context is what humans know and write down: business rules, domain definitions, and organizational conventions. Derived context is summarized from fast-changing technical state like schemas, lineage, and quality scores, and can be computed automatically. Observed context is what nobody wrote down but the system can infer from actual behavior, like which tables are queried together. All three types need to coexist in the same system with different management cadences, because each has its own origin and its own staleness pattern.

What is a context graph?

A context graph is a structured representation of the entities an AI agent might need to reason about (datasets, pipelines, owners, policies, models, features, business definitions) along with the relationships between them. It’s the substrate continuous context runs on, and it’s what lets a system serve the relevant context for a specific agent task instead of dumping everything into the window. The reason a graph is the right primitive (rather than a document store) is that maintenance is fundamentally a dependency problem. When something changes, the system has to know what depends on it, and only a graph can answer that question efficiently.

Can’t teams just keep context up to date manually?

No, at least not at any meaningful scale. Manual processes haven’t survived contact with competing priorities and team turnover in any analogous discipline, and there’s no reason to expect this one to be different. Even more sophisticated manual approaches like git hooks, PR checks, and doc-freshness linting only cover what happens in code repositories. They miss the majority of context that matters: metadata changes, pipeline behavior, access control updates, and business logic encoded outside of code. Continuous context requires automated change detection across the full stack.

How does continuous context relate to data observability?

Continuous context is the same kind of inflection point for context quality that data observability was for data quality five years ago. Data quality used to be a manual process: someone notices bad data, files a ticket, an engineer investigates. Then tools like Great Expectations, dbt tests, Monte Carlo, and DataHub Observe made data quality observable, measurable, and automated. Context quality is currently in the manual phase, and the path forward looks structurally similar: automated drift detection, impact analysis, freshness monitoring, and self-healing pipelines that keep context current.

What is context pollution, and how does continuous context reduce it?

Context pollution happens when stale, irrelevant, or contradictory information ends up in an agent’s context window and degrades the quality of its output. The usual framing treats this as a curation problem (filter the inputs better), but most of the worst pollution comes from context that was accurate when it was written and has since gone stale. Continuous context attacks pollution at the source: by detecting drift, flagging stale declarations, and serving freshness signals alongside the context itself, it reduces the chance that out-of-date material gets pulled into a window in the first place.