Claude Managed Agents — Anthropic's Agent Infrastructure
Claude Managed Agents
TL;DR: Claude Managed Agents is Anthropic’s ready-made infrastructure for running AI agents. Instead of building tool orchestration, sandboxes, and error handling yourself, you describe the agent (model, prompt, tools) and Anthropic runs it in their managed cloud. May 2026 update: Dreaming (between-session memory consolidation, research preview), Outcomes (rubric-based success, now public beta), multi-agent orchestration (now public beta), Cowork enterprise features (role-based access controls + group spend limits), 10 finance agent templates, 20+ legal MCP connectors, agent view in Claude Code for multi-session CLI management.
What Problem It Solves
Building agents on the Messages API requires months of engineering:
- Tool orchestration logic
- Context management
- Secure sandboxes
- Error handling and recovery
- Secret storage
- Long-running session management
Managed Agents provides all of this out of the box.
Two key benefits:
- Scaffolding stays current: Any custom scaffolding bakes in assumptions about what Claude can’t do. These assumptions become outdated with every model update. Managed Agents updates automatically.
- Long-running tasks: Anthropic expects future Claude versions to work for days, weeks, or months. This requires fault-tolerant, secure, scalable infrastructure that’s hard to build yourself.
Anthropic’s task-horizon thesis
The product is positioned around an explicit thesis: “task horizons are growing exponentially — on the METR benchmark, Claude already exceeds 10 human-hours of work.” The strategic bet is that future Claude versions will run sessions of days, weeks, or months on the most complex tasks. Managed Agents is the infrastructure designed to make this viable without forcing every customer to rebuild the same fault-tolerance, recovery, and state-management plumbing.
Implication for evaluating the product: the question isn’t “do I need an agent for a 10-minute task today?” — most teams don’t. The question is: if you commit to agentic work, do you want to maintain the multi-day session infrastructure yourself when the task durations grow? See questions/managed-agents-break-even for the cost framing.
Four Key Concepts
| Concept | What It Is |
|---|---|
| Agent | Versioned configuration: model, system prompt, tools, MCP servers. Create once, reference by ID. |
| Environment | Container template: sandbox type, network rules, pre-installed packages. |
| Session | Running instance of agent inside environment. Stores conversation, filesystem, status. Can run for hours. |
| Events | Message exchange via Server-Sent Events (SSE). You send user messages, agent streams responses and tool calls. |
How They Connect
    Agent (configuration)
      ↓
    Environment (container template)
      ↓
    Session (running instance)
      ↓
    Events (message stream)

Quick Start (10 Minutes)
Step 1: Install
    # CLI (macOS)
    brew install anthropics/tap/ant

    # Python SDK
    pip install anthropic

Step 2: Create Agent
    from anthropic import Anthropic

    client = Anthropic()

    agent = client.beta.agents.create(
        name="Coding Assistant",
        model="claude-sonnet-4-6",
        system="You are a helpful coding assistant.",
        tools=[{"type": "agent_toolset_20260401"}],
    )

agent_toolset_20260401 includes all built-in tools: bash, file operations, web search.
Step 3: Create Environment
    environment = client.beta.environments.create(
        name="dev-env",
        config={
            "type": "cloud",
            "networking": {"type": "unrestricted"},
            "packages": {"pip": ["pandas", "numpy"]},  # optional
        },
    )

Step 4: Start Session and Send Task
    session = client.beta.sessions.create(
        agent=agent.id,
        environment_id=environment.id,
    )

    with client.beta.sessions.events.stream(session.id) as stream:
        client.beta.sessions.events.send(
            session.id,
            events=[{
                "type": "user.message",
                "content": [{"type": "text", "text": "Create a Python script..."}],
            }],
        )

        for event in stream:
            match event.type:
                case "agent.message":
                    for block in event.content:
                        print(block.text, end="")
                case "agent.tool_use":
                    print(f"\n[Tool: {event.name}]")
                case "session.status_idle":
                    print("\n\nDone.")
                    break

What happens under the hood:
- Container deploys from environment template
- Claude decides which tools to use
- Tool calls execute inside container
- Results stream to you in real-time
- session.status_idle signals that the task is done
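The dispatch logic in the event loop can be exercised without a live session. The sketch below replays a hand-written list of events through the same three branches. The dictionary shape (`type`, `text`, `name`) is an assumption made for illustration and is not the SDK's actual event object.

```python
from typing import Iterable


def render_events(events: Iterable[dict]) -> str:
    """Turn a stream of session events into a printable transcript.

    The event dicts are hypothetical stand-ins for SDK event objects.
    """
    parts: list[str] = []
    for event in events:
        kind = event["type"]
        if kind == "agent.message":
            parts.append(event["text"])
        elif kind == "agent.tool_use":
            parts.append(f"\n[Tool: {event['name']}]\n")
        elif kind == "session.status_idle":
            # Idle means the agent has finished the current task.
            parts.append("\nDone.")
            break
        # Unknown event types are skipped so new ones do not break the loop.
    return "".join(parts)


sample = [
    {"type": "agent.message", "text": "Writing the script."},
    {"type": "agent.tool_use", "name": "write"},
    {"type": "agent.message", "text": "Saved to main.py."},
    {"type": "session.status_idle"},
    {"type": "agent.message", "text": "never reached"},
]
transcript = render_events(sample)
print(transcript)
```

Stopping at the idle event matters: anything after it belongs to a later turn, so the loop exits rather than draining the stream.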
Built-in Tools
| Tool | Description |
|---|---|
| bash | Execute shell commands in container |
| read | Read files |
| write | Write files |
| edit | Replace strings in files |
| glob | Find files by pattern |
| grep | Search text by regex |
| web_fetch | Download content by URL |
| web_search | Internet search |
Tool Configuration
Disable specific tools:
    {
      "type": "agent_toolset_20260401",
      "configs": [
        {"name": "web_fetch", "enabled": false},
        {"name": "web_search", "enabled": false}
      ]
    }

Enable only specific tools:

    {
      "type": "agent_toolset_20260401",
      "default_config": {"enabled": false},
      "configs": [
        {"name": "bash", "enabled": true},
        {"name": "read", "enabled": true}
      ]
    }

Custom Tools
Define your own tools with structured input schemas:
    agent = client.beta.agents.create(
        name="Weather Agent",
        model="claude-sonnet-4-6",
        tools=[
            {"type": "agent_toolset_20260401"},
            {
                "type": "custom",
                "name": "get_weather",
                "description": "Get current weather for a location",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"}
                    },
                    "required": ["location"],
                },
            },
        ],
    )

Best practices for custom tools:
- Write detailed descriptions (3-4 sentences): what it does, when to use, limitations
- Combine related operations with an action parameter
- Use namespaces in names (db_query, storage_read)
- Return only essential info: stable identifiers, not internal references
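As an illustration of these practices, the sketch below defines one namespaced tool that folds related operations behind an `action` parameter, plus a local handler that returns only stable identifiers. The tool name `storage_file`, its action set, and the handler are hypothetical. Only the custom tool shape (`type`, `name`, `description`, `input_schema`) comes from the weather example.

```python
STORAGE_TOOL = {
    "type": "custom",
    "name": "storage_file",  # namespaced: the "storage_" prefix groups related tools
    "description": (
        "Read, write, or delete a text file in the project's storage area. "
        "Use it for small text files the user refers to by path. "
        "It cannot list directories or handle binary data."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["read", "write", "delete"]},
            "path": {"type": "string", "description": "File path"},
            "content": {"type": "string", "description": "Text for 'write'"},
        },
        "required": ["action", "path"],
    },
}

_FILES: dict[str, str] = {}  # in-memory stand-in for real storage


def handle_storage_file(args: dict) -> dict:
    """Run one tool call and return only what the agent needs."""
    action, path = args["action"], args["path"]
    if action == "write":
        _FILES[path] = args.get("content", "")
        return {"path": path, "chars": len(_FILES[path])}
    if action == "read":
        if path not in _FILES:
            return {"error": f"no such file: {path}"}
        return {"path": path, "content": _FILES[path]}
    if action == "delete":
        return {"path": path, "deleted": _FILES.pop(path, None) is not None}
    return {"error": f"unknown action: {action}"}


print(handle_storage_file({"action": "write", "path": "notes.txt", "content": "hi"}))
print(handle_storage_file({"action": "read", "path": "notes.txt"}))
```

One tool with three actions keeps the tool list short, and the three-sentence description covers what it does, when to use it, and what it cannot do.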
Permission System
Two modes for tool execution:
| Mode | Behavior | Use Case |
|---|---|---|
| always_allow | Tools execute automatically | Trusted internal agents |
| always_ask | Session pauses for approval | User-facing agents |
Modes can combine: file reading automatic, bash commands need approval.
This is arguably more production-ready than most open-source frameworks (LangGraph, CrewAI, AutoGen), which the source describes as lacking per-tool permissions out of the box.
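Combining the two modes amounts to a per-tool lookup with a default. The sketch below models that decision locally. The `POLICY` mapping and the `needs_approval` function are illustrative assumptions and do not reflect how the Managed Agents API expresses permissions.

```python
ALWAYS_ALLOW = "always_allow"
ALWAYS_ASK = "always_ask"

# Hypothetical policy: reads run automatically, shell commands need approval.
POLICY = {
    "read": ALWAYS_ALLOW,
    "glob": ALWAYS_ALLOW,
    "grep": ALWAYS_ALLOW,
    "bash": ALWAYS_ASK,
}
DEFAULT_MODE = ALWAYS_ASK  # tools missing from the policy fail safe: pause and ask


def needs_approval(tool_name: str) -> bool:
    """Return True when the session should pause before running the tool."""
    return POLICY.get(tool_name, DEFAULT_MODE) == ALWAYS_ASK


for tool in ("read", "bash", "web_fetch"):
    verdict = "pause for approval" if needs_approval(tool) else "run automatically"
    print(f"{tool}: {verdict}")
```

Defaulting unlisted tools to `always_ask` is a design choice in this sketch: a newly added tool then requires an explicit decision before it can run unattended.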
Usage Patterns
Event-triggered
External service triggers agent. Bug detected → agent writes patch and opens PR. Example: Sentry integration
Scheduled
Agent runs on schedule. Daily digests: GitHub activity, team tasks, X (Twitter) summary.
Fire-and-forget
Human sets task via Slack → gets result: table, presentation, app. Example: Asana AI Teammates
Long-horizon
Tasks running for hours. Research projects, large-scale code migrations, deep analysis.
CLI for setup, SDK for runtime
Agent templates stored as YAML in git. CLI applies them in deploy pipeline. SDK manages sessions at runtime.
Outcomes (Public Beta as of May 2026)
Outcomes turn sessions from conversations into goal-oriented work:
    # RUBRIC is a string of success criteria defined elsewhere.
    client.beta.sessions.events.send(
        session_id=session.id,
        events=[{
            "type": "user.define_outcome",
            "description": "Build a DCF model for Costco in .xlsx",
            "rubric": {"type": "text", "content": RUBRIC},
            "max_iterations": 5,  # default 3, max 20
        }],
    )

A separate grader evaluates whether the criteria are met. The agent iterates until the grader is satisfied or the iteration limit is reached.
Good rubric criteria:
- ✅ “CSV contains price column with numeric values”
- ❌ “Data looks good”
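A rubric criterion is most useful when a grader could check it mechanically. The sketch below implements the passing example as a plain function and wraps it in a bounded loop that mirrors `max_iterations`. The `attempts` list stands in for successive agent outputs, and none of this is the platform's actual grader.

```python
import csv
import io


def price_column_is_numeric(csv_text: str) -> bool:
    """Criterion: the CSV has a 'price' column and every value parses as a number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "price" not in reader.fieldnames:
        return False
    rows = list(reader)
    if not rows:
        return False
    for row in rows:
        try:
            float(row["price"])
        except (TypeError, ValueError):
            return False
    return True


def iterate_until_met(attempts, criterion, max_iterations=3):
    """Return the 1-based iteration that satisfied the criterion, or None."""
    for i, output in enumerate(attempts[:max_iterations], start=1):
        if criterion(output):
            return i
    return None


attempts = [
    "item,cost\nmilk,2.50\n",    # wrong column name
    "item,price\nmilk,cheap\n",  # column present, value not numeric
    "item,price\nmilk,2.50\n",   # satisfies the criterion
]
print(iterate_until_met(attempts, price_column_is_numeric))  # → 3
```

By contrast, "Data looks good" cannot be turned into a function like this, which is why it makes a poor rubric line.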
Multi-Agent Orchestration (Public Beta as of May 2026)
One coordinator can delegate to other agents:
    orchestrator = client.beta.agents.create(
        name="Engineering Lead",
        model="claude-sonnet-4-6",
        system="Delegate code review to reviewer, tests to test agent.",
        tools=[{"type": "agent_toolset_20260401"}],
        callable_agents=[
            {"type": "agent", "id": reviewer_agent.id, "version": reviewer_agent.version},
            {"type": "agent", "id": test_writer_agent.id, "version": test_writer_agent.version},
        ],
    )

Use cases:
- Code review (separate agent with read-only tools)
- Test generation (writes tests, doesn’t touch production code)
- Research (agent with web tools collects information)
Limitation: Only one level of delegation is supported, so an agent called by the coordinator cannot delegate further.
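A pre-flight check along the following lines could catch a configuration that exceeds the single permitted level of delegation before it is deployed. The `AgentSpec` dataclass is a hypothetical local model, not an SDK type, and how the platform itself reports such a configuration is not covered by the source.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Local description of an agent and the agents it may call."""
    name: str
    callable_agents: list["AgentSpec"] = field(default_factory=list)


def delegation_violations(coordinator: AgentSpec) -> list[str]:
    """List callee names that try to delegate further (a second level)."""
    return [
        callee.name
        for callee in coordinator.callable_agents
        if callee.callable_agents
    ]


reviewer = AgentSpec("reviewer")
test_writer = AgentSpec("test-writer")
lead = AgentSpec("engineering-lead", [reviewer, test_writer])
print(delegation_violations(lead))  # → []

# A coordinator above the lead would need two levels of delegation.
director = AgentSpec("director", [lead])
print(delegation_violations(director))  # → ['engineering-lead']
```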
Dreaming — between-session memory consolidation (Research Preview, May 2026)
The most architecturally interesting feature shipped in the May 2026 wave. Dreaming extends Claude’s memory capabilities by reviewing past sessions to find patterns and help agents self-improve. The mechanism: the agent runs a “dreaming” pass over recent sessions, identifies patterns (recurring user preferences, frequently-corrected mistakes, successful task templates), and consolidates them into its memory.
Two operating modes:
- Automatic — Dreaming updates memory without per-update review. Faster; appropriate for low-stakes / high-volume agents.
- Review-before-landing — Each Dreaming-proposed memory update is surfaced for human approval before persistence. Slower; appropriate for high-stakes agents.
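The difference between the two modes comes down to where a proposed update lands. The toy model below makes that concrete. Every name in it is hypothetical and it has no relation to the actual Dreaming implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy memory with an optional human review gate."""
    review_before_landing: bool
    entries: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)

    def propose(self, update: str) -> None:
        """Called by the consolidation pass for each pattern it finds."""
        if self.review_before_landing:
            self.pending.append(update)  # wait for a human decision
        else:
            self.entries.append(update)  # automatic mode: persist at once

    def review(self, approve) -> None:
        """Apply a human decision function to every pending update."""
        for update in self.pending:
            if approve(update):
                self.entries.append(update)
        self.pending.clear()


auto = MemoryStore(review_before_landing=False)
auto.propose("User prefers metric units")

gated = MemoryStore(review_before_landing=True)
gated.propose("User prefers metric units")
gated.propose("Always skip tests")  # a wrong pattern the reviewer should reject
gated.review(lambda update: "skip tests" not in update)

print(auto.entries)   # ['User prefers metric units']
print(gated.entries)  # ['User prefers metric units']
```

In automatic mode the "Always skip tests" pattern would have been persisted immediately, which is the risk the review gate exists to catch.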
Why this matters: Dreaming is the production realization of the semantic + procedural memory layers described in glossary/agentic-memory. The glossary entry (created May 15, 2026) described memory as a 4-layer architecture (working / episodic / semantic / procedural) that has to be engineered on top of the model. Dreaming is a piece of that engineering at the platform level — the consolidation pass that turns episodic memory (raw session history) into procedural memory (reusable patterns).
Connection to the wiki cluster: Dreaming shipped 8 days after the agentic-memory glossary entry. Anthropic’s mechanism aligns directly with the academic distinction between episodic and procedural memory consolidation that the entry borrows from cognitive science. The cluster prediction (memory must be engineered, not built-in) is being validated at the production-platform layer.
Practical impact: for strategist-pattern-style deployments where the agent operates across many sessions over weeks or months, Dreaming removes a substantial chunk of the manual memory-curation work the practitioner would otherwise own. The trade-off (per the glossary/agentic-memory honest-limits section) is that Dreaming introduces a new failure mode: a wrong pattern consolidated into memory gets retrieved and reused with confidence. Review-before-landing mode guards against this at the cost of speed; automatic mode accepts the risk in exchange for throughput.
Cowork enterprise features (April–May 2026)
Anthropic added enterprise-tier controls to Claude Cowork (the file-based persistent-memory pattern that pairs with Skills):
- Role-based access controls (RBAC). Admins on Claude Enterprise can organize users into groups and assign custom roles defining which Claude capabilities members can use. Cowork can be turned on for specific teams.
- Group spend limits. Per-team budgets set from the admin console — predictable costs at organization scale.
- Agent view in Claude Code. A new way to manage multiple agent sessions from one CLI view: start agents, send them to the background, peek at status and last responses, jump back into sessions only when input is needed.
These are operational primitives for organizations deploying Claude across many teams. The RBAC + spend-limit combination is what makes Claude Enterprise-grade rather than a heavy-power-user tool.
Vertical agent templates (April–May 2026)
Anthropic shipped two waves of vertical-specific agent templates:
- 10 finance agent templates — pitchbook generation, KYC screening, earnings review, month-end close, and other CFO/IB-team-relevant workflows.
- 20+ legal MCP connectors + 12 practice-area plugins — covering legal research, contracts, discovery, matter management, and legal aid. Law-firm-specific deployment surface.
The strategic move: Anthropic is verticalizing the agent surface. Generic agents are the foundation; vertical templates are the on-ramp for specific industry deployments. Expect similar template waves for healthcare, retail, and engineering over 2026.
For glossary/agent-engineering practitioners: the templates accelerate time-to-first-deployment but lock decisions about agent architecture into Anthropic’s defaults. Treat templates as starting points to customize, not finished products to ship as-is.
Architecture
Anthropic designed three independent components:
| Component | Role |
|---|---|
| “Brain” | Claude and scaffolding (agent loop, tool selection) |
| “Hands” | Sandboxes and tools that execute actions |
| “Session” | Event journal |
Each component can fail or be replaced independently. Built-in optimizations: prompt caching, context compression, automatic recovery.
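One way to picture why an event journal lets components fail independently: if every state change is appended to a log, a replacement process can rebuild its view by replaying that log. The sketch below is a generic append-only journal meant to illustrate the idea. It is not Anthropic's implementation, and the event shapes (`file.write`, `agent.message`) are assumptions.

```python
import json


class Journal:
    """Append-only event log serialized as JSON lines."""

    def __init__(self, lines=None):
        self._lines = list(lines or [])

    def append(self, event: dict) -> None:
        self._lines.append(json.dumps(event, sort_keys=True))

    def replay(self):
        for line in self._lines:
            yield json.loads(line)

    def dump(self) -> list[str]:
        return list(self._lines)


def rebuild_state(journal: Journal) -> dict:
    """Fold the journal into session state: files written and message count."""
    state = {"files": {}, "messages": 0}
    for event in journal.replay():
        if event["type"] == "file.write":
            state["files"][event["path"]] = event["content"]
        elif event["type"] == "agent.message":
            state["messages"] += 1
    return state


journal = Journal()
journal.append({"type": "agent.message", "text": "Starting."})
journal.append({"type": "file.write", "path": "main.py", "content": "print('v1')"})
journal.append({"type": "file.write", "path": "main.py", "content": "print('v2')"})

# Simulate a crash: only the persisted lines survive, and a new process replays them.
recovered = rebuild_state(Journal(journal.dump()))
print(recovered)
```

Because state is derived from the log rather than held in the worker, swapping out the worker loses nothing that was already journaled.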
Pricing
- Standard Claude API token rates
- Plus $0.08 per hour of active session
A 10-minute coding session costs about 1.3 cents in session runtime ($0.08 × 10/60 hours), before token charges.
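The runtime portion of the bill follows directly from the hourly rate. The small calculation below uses the $0.08 per session-hour figure from the pricing list. Token charges depend on model and usage, so they are passed in rather than assumed.

```python
SESSION_RATE_PER_HOUR = 0.08  # USD, from the pricing list


def session_cost(active_minutes: float, token_cost_usd: float = 0.0) -> float:
    """Total cost in USD: runtime fee plus separately computed token charges."""
    runtime = SESSION_RATE_PER_HOUR * active_minutes / 60
    return runtime + token_cost_usd


print(f"10 minutes: ${session_cost(10):.4f}")           # $0.0133
print(f"8 hours:    ${session_cost(8 * 60):.2f}")       # $0.64
print(f"30 days:    ${session_cost(30 * 24 * 60):.2f}") # $57.60
```

Even a session that stays active for a full month costs under $60 in runtime, so for most workloads token usage is likely to dominate the total.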
Who’s Using It
| Company | Use Case |
|---|---|
| Notion | Agents for parallel task execution |
| Rakuten | Corporate agents per department (launched in <1 week each) |
| Asana | AI Teammates working alongside humans |
| Sentry | Debugger finds bug → agent writes patch → opens PR |
| Vibecode | Default infrastructure |
Limits
| Operation | Limit |
|---|---|
| Resource creation (agents, sessions, environments) | 60 requests/min |
| Reads (get, list, streaming) | 600 requests/min |
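Throttling on the client side is one way to stay under these limits. A sliding-window limiter is a common pattern for this, and the sketch below takes an injectable clock so it can be tested without sleeping. It is a generic technique, not something the SDK is known to provide, and the source does not say how the API responds when a limit is exceeded.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps: deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits within the limit."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False


# Fake clock for a deterministic demonstration.
fake_now = [0.0]
creates = SlidingWindowLimiter(limit=60, clock=lambda: fake_now[0])

accepted = sum(creates.try_acquire() for _ in range(61))
print(accepted)  # 60: the 61st call in the same minute is refused

fake_now[0] = 60.0  # one minute later the window has cleared
print(creates.try_acquire())  # True
```

A second instance with `limit=600` would cover the read limit. Callers that get `False` back can sleep briefly and retry.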
Access
- Status: Open beta
- Header required: managed-agents-2026-04-01 (SDK sets automatically)
- Outcomes and multi-agent orchestration: public beta as of May 2026
- Memory features, including Dreaming: research preview (request access separately)
When to Use Managed Agents vs. Messages API
| Choose Managed Agents | Choose Messages API |
|---|---|
| Long-running tasks (hours) | Simple chat completions |
| Need code execution sandbox | Full control over orchestration |
| Quick launch priority | Custom scaffolding requirements |
| Async background work | Real-time low-latency needs |
Key Takeaways
- Ready-made infrastructure — no Docker, orchestration code, or tool execution to build
- Four concepts: Agent (config) → Environment (container) → Session (instance) → Events (stream)
- Built-in tools for common operations + custom tool support
- Per-tool permissions for production safety
- Outcomes turn conversations into goal-oriented work
- Multi-agent coordination for complex workflows
- $0.08/hour + token costs
Related
- glossary/ai-agent — What agents are
- glossary/agent-outcomes — Goal-oriented agent work (Outcomes feature)
- automation/ai-agent-organization — 12 techniques for reliable agents
- automation/multi-agent-patterns — Dispatcher + deep worker patterns
- comparisons/managed-agents-vs-diy — When to use Managed vs. build your own
- questions/managed-agents-break-even — Cost analysis exploration (now incorporates jagged-frontier reliability + task-horizon thesis + advisor-strategy economics)
- glossary/advisor-strategy — Pair a cheap executor with an Opus advisor. Stacks with the managed-agents toolset for cost optimization
- glossary/jagged-frontier — Reliability framing: agent success is task-dependent in ways the frontier predicts
- glossary/agent-adoption-frictions — Why technically capable agents still don’t get used. The Goldilocks-autonomy finding bears directly on Managed-Agents UX (propose-and-approve > pure execution)
- glossary/agent-engineering — Karpathy’s framing of the discipline; Managed Agents is the managed-infrastructure approach to acquiring it
- glossary/tool-use — The basic primitive Managed Agents provides infrastructure for; tools are how agents reach outside the model
- glossary/guardrails — Sandboxing, permission scoping, and budget limits are guardrail categories Managed Agents handles natively
- glossary/agentic-memory — Managed Agents provides memory primitives; the architecture decisions (what to store, how to retrieve) remain the practitioner’s. Dreaming (May 2026) is the platform-level consolidation pass that turns episodic memory into procedural memory
- glossary/prompt-caching: Anthropic’s cache_control mechanics underlie Managed Agents caching. The February 2026 default TTL change (60-min → 5-min) affected Managed-Agent workloads disproportionately, inflating costs 30-60% for many production deployments without code changes
- glossary/agent-payment-protocols: The infrastructure layer for agent-to-agent commerce (AP2, x402, UCP, Visa TAP). Managed Agents currently uses Anthropic’s own infrastructure; payment-protocol support is one of the differentiators when comparing managed-platform agent runtimes (AWS Bedrock AgentCore Payments has native x402)
- glossary/llm — The underlying technology
Sources
- Anthropic Managed Agents documentation (April 2026)
- Telegram @prompt_design