Context Engineering — Designing Information Flow for AI Agents

TL;DR: Context engineering structures how tools present information to AI agents. Unlike prompt engineering (which guides model behavior), context engineering designs tool responses so agents can reason across multiple calls — giving them “peripheral vision” to navigate complex information spaces.

What It Is

Context engineering is:

“Structuring tool responses and information flow to give agents the right data in the right format for effective reasoning.”

It recognizes that agentic systems are persistent — they make multiple sequential tool calls to build understanding. This is fundamentally different from humans making single queries.

Context Engineering vs. Prompt Engineering

| Prompt Engineering | Context Engineering |
| --- | --- |
| Guides model behavior | Shapes information flow |
| Optimizes instructions | Optimizes tool responses |
| Single-interaction focus | Multi-turn agent focus |
| "How should the model act?" | "What should the model see?" |

Key insight: Tool responses ARE prompt engineering. The XML structure, metadata, and system instructions in tool outputs directly influence how agents think about subsequent searches.

The Four-Level Framework

Level 1: Minimal Chunks (Baseline RAG)

Raw text without metadata. Agent sees content but has no context about where it came from or what else exists.

Problem: Agent can’t strategically explore because it has no map.

Level 2: Source Metadata

Adds document source, page numbers, and clustering signals.

Enables: Strategic full-page loading. If multiple chunks come from one document, agent can load the whole thing.

Example: “3 chunks from document X” → agent uses load_pages() instead of searching again.
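The clustering signal above can be sketched in a few lines. This is an illustrative helper, not a fixed API: `cluster_hint` and the chunk dictionary shape are assumptions, while `load_pages()` is the tool named above.

```python
from collections import Counter

def cluster_hint(chunks, threshold=3):
    """Count chunks per source document; flag sources that dominate
    the result set as candidates for loading in full."""
    counts = Counter(c["source"] for c in chunks)
    return [src for src, n in counts.items() if n >= threshold]

chunks = [
    {"source": "X.pdf", "text": "..."},
    {"source": "X.pdf", "text": "..."},
    {"source": "X.pdf", "text": "..."},
    {"source": "Y.pdf", "text": "..."},
]
hints = cluster_hint(chunks)  # "X.pdf" dominates: candidate for load_pages()
```

The hint can be surfaced to the agent directly in the tool response, e.g. as a `<guidance>` element.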

Level 3: Multi-Modal Optimization

Formats different content types appropriately:

  • Tables → Markdown or HTML
  • Images → Include OCR text
  • Structured data → Preserve structure

Why: Agents reason differently about tabular vs. prose data.
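A minimal sketch of type-aware formatting, assuming a hypothetical `format_chunk` helper and chunk fields (`type`, `columns`, `rows`, `ocr_text`) that your pipeline would define:

```python
def format_chunk(chunk):
    """Render a chunk according to its content type so the agent
    sees structure, not flattened prose."""
    kind = chunk.get("type", "text")
    if kind == "table":
        # Keep rows as Markdown so column alignment survives.
        header = "| " + " | ".join(chunk["columns"]) + " |"
        sep = "| " + " | ".join("---" for _ in chunk["columns"]) + " |"
        rows = ["| " + " | ".join(map(str, r)) + " |" for r in chunk["rows"]]
        return "\n".join([header, sep, *rows])
    if kind == "image":
        # Surface OCR text instead of dropping the image silently.
        return f"[image: {chunk.get('alt', '')}]\n{chunk.get('ocr_text', '')}"
    return chunk["text"]
```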

Level 4: Faceted Landscape

Returns complete metadata distributions revealing patterns across all dimensions.

The breakthrough: Shows what the agent didn’t retrieve.

Faceted Search Explained

Facets expose metadata aggregations alongside results:

| Facet Type | What It Shows |
| --- | --- |
| Source clustering | Which documents contain multiple relevant chunks |
| Category distributions | Patterns across document types, teams, projects |
| Coverage mapping | How many results exist beyond top-k ranking |

Example: Search returns 3 “Done” tickets, but facets show 5 “Open” tickets exist. The agent now knows there’s hidden relevant information filtered out by similarity ranking.
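The ticket example can be reproduced with a toy search that returns top-k hits plus facets over every match. This is a sketch under assumed names (`search_with_facets`, a `status` field, precomputed `ranked_ids`), not a specific library's API:

```python
from collections import Counter

def search_with_facets(corpus, ranked_ids, k=3):
    """Return the top-k hits plus status facets computed over ALL
    matches, so the agent can see what ranking filtered out."""
    hits = [corpus[i] for i in ranked_ids[:k]]
    facets = Counter(corpus[i]["status"] for i in ranked_ids)
    return {"results": hits, "facets": dict(facets)}

tickets = (
    [{"id": i, "status": "Done"} for i in range(3)]
    + [{"id": i, "status": "Open"} for i in range(3, 8)]
)
resp = search_with_facets(tickets, ranked_ids=list(range(8)), k=3)
# Top-3 results are all "Done"; facets expose the 5 "Open" tickets
# that similarity ranking hid.
```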

Tool Response Design Principles

1. Signal-to-Noise Prioritization

Return contextually relevant information, not comprehensive technical details. Agents don’t need everything — they need what matters for the next decision.

2. Peripheral Vision Architecture

Provide metadata about the broader information space beyond top-k results. Let agents see the shape of what they haven’t retrieved.

3. Response-as-Instruction

Structure outputs so metadata aggregations guide subsequent tool calls.

Example system instruction:

“High facet counts for sources with few returned chunks indicate valuable information filtered by similarity ranking — investigate with source filters.”

4. Strategic Filtering

Include parameters aligned with facet dimensions discovered in responses. If facets show categories, tools should accept category filters.
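One way to sketch this alignment: every optional filter parameter mirrors a facet dimension the response exposes. The `search` signature and corpus fields below are illustrative assumptions:

```python
def search(query, corpus, category=None, status=None):
    """Toy keyword search whose optional filters mirror the facet
    dimensions (category, status) that responses surface."""
    hits = [d for d in corpus if query.lower() in d["text"].lower()]
    if category is not None:
        hits = [d for d in hits if d["category"] == category]
    if status is not None:
        hits = [d for d in hits if d["status"] == status]
    return hits

corpus = [
    {"text": "Q4 revenue summary", "category": "Financial", "status": "Done"},
    {"text": "Q4 ops review", "category": "Operations", "status": "Open"},
]
ops_hits = search("q4", corpus, category="Operations")
```

When a facet reveals an unexplored dimension, the agent can immediately act on it with the matching filter.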

Implementation Example

Before (Level 1):

Result: "The Q4 revenue was $2.3M..."
Result: "Marketing budget increased by..."

After (Level 4):

<search_results total="47" shown="5">
  <facets>
    <source name="Q4-Report.pdf" chunks="12" />
    <source name="Board-Deck.pptx" chunks="8" />
    <category name="Financial" count="23" />
    <category name="Marketing" count="15" />
    <category name="Operations" count="9" />
  </facets>
  <results>
    <result source="Q4-Report.pdf" page="4">
      The Q4 revenue was $2.3M...
    </result>
    ...
  </results>
  <guidance>
    Multiple chunks from Q4-Report.pdf suggest loading full document.
    High count in "Operations" category (9) not shown in top results.
  </guidance>
</search_results>

Now the agent knows:

  • There’s a concentrated source worth loading fully
  • There’s a category dimension to explore
  • The search didn’t show everything relevant
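A response in this shape can be assembled with the standard library. The sketch below covers the `facets` and `results` elements; `faceted_response` is a hypothetical helper, and `<guidance>` generation is omitted for brevity:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def faceted_response(chunks, shown=5):
    """Build a Level 4 XML tool response: facet aggregations over
    all chunks, plus the top-N chunk bodies."""
    root = ET.Element("search_results",
                      total=str(len(chunks)), shown=str(min(shown, len(chunks))))
    facets = ET.SubElement(root, "facets")
    for src, n in Counter(c["source"] for c in chunks).most_common():
        ET.SubElement(facets, "source", name=src, chunks=str(n))
    results = ET.SubElement(root, "results")
    for c in chunks[:shown]:
        r = ET.SubElement(results, "result",
                          source=c["source"], page=str(c["page"]))
        r.text = c["text"]
    return ET.tostring(root, encoding="unicode")

chunks = [{"source": "Q4-Report.pdf", "page": i, "text": "t"} for i in range(3)]
xml_str = faceted_response(chunks, shown=2)
```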

Success Metrics

Teams implementing context engineering report:

| Metric | Improvement |
| --- | --- |
| Clarification questions | 90% reduction |
| Expert escalations | 75% reduction |
| 504 errors | 95% reduction |
| Resolution times | 4x improvement |

Practical Implementation

Quick Wins (Level 2)

  • Wrap search results in XML with source tracking
  • Include page numbers and document paths
  • Add clustering detection: “Multiple chunks from same source = use load_pages()”
  • Add system instructions that teach agents facet-driven refinement

Token Efficiency

  • Support response format parameters (“concise” vs “detailed”)
  • Implement pagination with sensible defaults
  • Truncate verbose responses based on agent reasoning context
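The three token-efficiency tactics above can be combined in one rendering step. A minimal sketch, assuming a hypothetical `render` helper and character-based limits (a real implementation would count tokens):

```python
def render(texts, style="concise", page=0, page_size=5):
    """Paginate results and cap per-result length; 'detailed'
    relaxes the cap when the agent explicitly asks for more."""
    limit = 120 if style == "concise" else 2000
    window = texts[page * page_size:(page + 1) * page_size]
    # Mark truncation visibly so the agent knows more text exists.
    return [t if len(t) <= limit else t[:limit] + "…" for t in window]
```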

Architecture Note

You don’t need to rebuild infrastructure. Most improvements are XML structuring and metadata addition with minimal backend changes.

Key Insight

Agents with a sufficient tool-call budget demonstrate systematic persistence: they refine searches iteratively based on facet patterns.

This reverses traditional RAG’s “stuff everything relevant upfront” approach. Instead, design tools that reveal the information landscape so agents can systematically explore dimensions.

Key Takeaways

  • Context engineering shapes what agents see, not how they act
  • Tool responses ARE prompt engineering
  • Four levels: Chunks → Metadata → Multi-modal → Faceted landscape
  • Facets give agents “peripheral vision” of unseen information
  • 4x resolution time improvement with proper implementation
