Claude Managed Agents — Anthropic's Agent Infrastructure

TL;DR: Claude Managed Agents is Anthropic’s ready-made infrastructure for running AI agents. Instead of building tool orchestration, sandboxes, and error handling yourself, you describe the agent (model, prompt, tools) and Anthropic runs it in its managed cloud.

May 2026 update:

  • Dreaming (between-session memory consolidation, research preview)
  • Outcomes (rubric-based success criteria, now public beta)
  • Multi-agent orchestration (now public beta)
  • Cowork enterprise features (role-based access controls and group spend limits)
  • 10 finance agent templates
  • 20+ legal MCP connectors
  • Agent view in Claude Code for multi-session CLI management

What Problem It Solves

Building production agents directly on the Messages API can take months of engineering, because you have to build all of the following yourself:

  • Tool orchestration logic
  • Context management
  • Secure sandboxes
  • Error handling and recovery
  • Secret storage
  • Long-running session management

Managed Agents provides all of this out of the box.

Two key benefits:

  1. Scaffolding stays current — Any custom scaffolding bakes in assumptions about what Claude can’t do. These assumptions become outdated with every model update. Managed Agents updates automatically.

  2. Long-running tasks — Anthropic expects future Claude versions to work for days, weeks, or months. This requires fault-tolerant, secure, scalable infrastructure that’s hard to build yourself.

Anthropic’s task-horizon thesis

The product is positioned around an explicit thesis: “task horizons are growing exponentially — on the METR benchmark, Claude already exceeds 10 human-hours of work.” The strategic bet is that future Claude versions will run sessions of days, weeks, or months on the most complex tasks. Managed Agents is the infrastructure designed to make this viable without forcing every customer to rebuild the same fault-tolerance, recovery, and state-management plumbing.

Implication for evaluating the product: the question isn’t “do I need an agent for a 10-minute task today?” — most teams don’t. The question is: if you commit to agentic work, do you want to maintain the multi-day session infrastructure yourself when the task durations grow? See questions/managed-agents-break-even for the cost framing.

Four Key Concepts

Concept | What It Is
Agent | Versioned configuration: model, system prompt, tools, MCP servers. Create once, reference by ID.
Environment | Container template: sandbox type, network rules, pre-installed packages.
Session | Running instance of an agent inside an environment. Stores conversation, filesystem, status. Can run for hours.
Events | Message exchange via Server-Sent Events (SSE). You send user messages; the agent streams responses and tool calls.

How They Connect

Agent (configuration) → Environment (container template) → Session (running instance) → Events (message stream)

Quick Start (10 Minutes)

Step 1: Install

# CLI (macOS)
brew install anthropics/tap/ant
# Python SDK
pip install anthropic

Step 2: Create Agent

from anthropic import Anthropic

client = Anthropic()

agent = client.beta.agents.create(
    name="Coding Assistant",
    model="claude-sonnet-4-6",
    system="You are a helpful coding assistant.",
    tools=[{"type": "agent_toolset_20260401"}],
)

agent_toolset_20260401 includes all built-in tools: bash, file operations, web search.

Step 3: Create Environment

environment = client.beta.environments.create(
    name="dev-env",
    config={
        "type": "cloud",
        "networking": {"type": "unrestricted"},
        "packages": {"pip": ["pandas", "numpy"]},  # optional
    },
)

Step 4: Start Session and Send Task

session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
)

# The stream is opened before the task is sent.
with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(
        session.id,
        events=[{
            "type": "user.message",
            "content": [{"type": "text", "text": "Create a Python script..."}],
        }],
    )
    for event in stream:
        match event.type:  # match statements need Python 3.10+
            case "agent.message":
                # Print the agent's text as it arrives.
                for block in event.content:
                    print(block.text, end="")
            case "agent.tool_use":
                print(f"\n[Tool: {event.name}]")
            case "session.status_idle":
                # The agent has finished the task.
                print("\n\nDone.")
                break

What happens under the hood:

  1. Container deploys from environment template
  2. Claude decides which tools to use
  3. Tool calls execute inside container
  4. Results stream to you in real-time
  5. session.status_idle when task is done
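
The event flow above can be exercised without a live session. The following is a minimal offline sketch, not SDK code: it assumes events arrive as plain dicts that mirror the three event types used in the Quick Start, and `collect_run` and `fake_stream` are hypothetical names introduced only for illustration.

```python
from typing import Iterable


def collect_run(events: Iterable[dict]) -> dict:
    """Fold a sequence of session events into final text and tools used."""
    text_parts: list[str] = []
    tools_used: list[str] = []
    finished = False
    for event in events:
        kind = event["type"]
        if kind == "agent.message":
            # Keep only text blocks; other block types are ignored here.
            text_parts.extend(
                block["text"]
                for block in event["content"]
                if block.get("type") == "text"
            )
        elif kind == "agent.tool_use":
            tools_used.append(event["name"])
        elif kind == "session.status_idle":
            # Idle marks the end of the task, so stop reading.
            finished = True
            break
    return {"text": "".join(text_parts), "tools": tools_used, "finished": finished}


# Hand-written events standing in for a real stream (assumed shape).
fake_stream = [
    {"type": "agent.tool_use", "name": "write"},
    {"type": "agent.tool_use", "name": "bash"},
    {"type": "agent.message", "content": [{"type": "text", "text": "Script created."}]},
    {"type": "session.status_idle"},
    {"type": "agent.message", "content": [{"type": "text", "text": "ignored"}]},
]
result = collect_run(fake_stream)
```

Separating the fold from the transport keeps the handling logic testable; with the real SDK the same branches would read attributes such as `event.type` instead of dict keys.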

Built-in Tools

Tool | Description
bash | Execute shell commands in container
read | Read files
write | Write files
edit | Replace strings in files
glob | Find files by pattern
grep | Search text by regex
web_fetch | Download content by URL
web_search | Internet search

Tool Configuration

Disable specific tools:

{
  "type": "agent_toolset_20260401",
  "configs": [
    {"name": "web_fetch", "enabled": false},
    {"name": "web_search", "enabled": false}
  ]
}

Enable only specific tools:

{
  "type": "agent_toolset_20260401",
  "default_config": {"enabled": false},
  "configs": [
    {"name": "bash", "enabled": true},
    {"name": "read", "enabled": true}
  ]
}
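
Both configuration shapes can be generated from one small helper, which avoids hand-editing JSON and catches misspelled tool names early. This is a sketch under the assumption that the two shapes shown above are the only ones needed; `toolset` and `BUILT_IN_TOOLS` are hypothetical helper names, not part of the SDK.

```python
BUILT_IN_TOOLS = (
    "bash", "read", "write", "edit", "glob", "grep", "web_fetch", "web_search",
)


def toolset(*, only=None, disable=None, toolset_type="agent_toolset_20260401"):
    """Build a toolset config dict in one of the two shapes shown above."""
    if only is not None and disable is not None:
        raise ValueError("pass either 'only' or 'disable', not both")
    names = only if only is not None else (disable or [])
    unknown = sorted(set(names) - set(BUILT_IN_TOOLS))
    if unknown:
        raise ValueError(f"unknown built-in tools: {unknown}")

    config = {"type": toolset_type}
    if only is not None:
        # Allow-list: everything off by default, then enable the named tools.
        config["default_config"] = {"enabled": False}
        config["configs"] = [{"name": name, "enabled": True} for name in only]
    elif disable:
        # Deny-list: everything on by default, then disable the named tools.
        config["configs"] = [{"name": name, "enabled": False} for name in disable]
    return config


no_web = toolset(disable=["web_fetch", "web_search"])
shell_and_read = toolset(only=["bash", "read"])
```

The resulting dict would be passed as one entry of the `tools` list when creating an agent.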

Custom Tools

Define your own tools with structured input schemas:

agent = client.beta.agents.create(
    name="Weather Agent",
    model="claude-sonnet-4-6",
    tools=[
        {"type": "agent_toolset_20260401"},
        {
            "type": "custom",
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    ],
)

Best practices for custom tools:

  • Write detailed descriptions (3-4 sentences): what it does, when to use, limitations
  • Combine related operations with an action parameter
  • Use namespaces in names (db_query, storage_read)
  • Return only essential info — stable identifiers, not internal references
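
As an illustration of these practices, here is a sketch of one namespaced tool that folds three related operations behind an `action` parameter, plus a local handler that returns only stable identifiers and a status. The tool name `storage_object`, the in-memory store, and the handler are hypothetical; how tool results are sent back to a session is not covered by this sketch.

```python
import json

# Tool definition in the same shape as the custom tool example above.
STORAGE_TOOL = {
    "type": "custom",
    "name": "storage_object",
    "description": (
        "Read, write, or delete a small text object in project storage. "
        "Use it for notes and intermediate results that must survive between steps. "
        "Keys are plain strings and values are text only. "
        "It is not suitable for binary data or large files."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["read", "write", "delete"]},
            "key": {"type": "string", "description": "Stable object identifier"},
            "value": {"type": "string", "description": "Required when action is 'write'"},
        },
        "required": ["action", "key"],
    },
}

_store: dict[str, str] = {}  # stand-in for a real storage backend


def handle_storage_object(tool_input: dict) -> str:
    """Run one storage action and return a compact JSON result."""
    action, key = tool_input["action"], tool_input["key"]
    if action == "write":
        _store[key] = tool_input["value"]
        return json.dumps({"key": key, "status": "written"})
    if action == "read":
        if key not in _store:
            return json.dumps({"key": key, "status": "not_found"})
        return json.dumps({"key": key, "status": "ok", "value": _store[key]})
    if action == "delete":
        existed = _store.pop(key, None) is not None
        return json.dumps({"key": key, "status": "deleted" if existed else "not_found"})
    return json.dumps({"status": "error", "message": f"unsupported action: {action}"})
```

One tool with an `action` enum gives the model a single description to read instead of three near-identical ones.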

Permission System

Two modes for tool execution:

Mode | Behavior | Use Case
always_allow | Tools execute automatically | Trusted internal agents
always_ask | Session pauses for approval | User-facing agents

Modes can combine: file reading automatic, bash commands need approval.
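
A mixed policy of this kind can be written down as a simple lookup table. The sketch below is a local illustration only: the source does not show how modes are attached to tools on the platform, so `POLICY`, `permission_for`, and `needs_approval` are hypothetical names, and defaulting unlisted tools to `always_ask` is an assumption chosen for safety.

```python
ALWAYS_ALLOW = "always_allow"
ALWAYS_ASK = "always_ask"

# Read-only tools run automatically; shell access waits for a human.
POLICY = {
    "read": ALWAYS_ALLOW,
    "glob": ALWAYS_ALLOW,
    "grep": ALWAYS_ALLOW,
    "bash": ALWAYS_ASK,
}


def permission_for(tool_name: str, policy: dict = POLICY, default: str = ALWAYS_ASK) -> str:
    """Return the mode for a tool, falling back to the stricter default."""
    return policy.get(tool_name, default)


def needs_approval(tool_name: str) -> bool:
    """True when a call to this tool should pause for approval."""
    return permission_for(tool_name) == ALWAYS_ASK
```

Falling back to the stricter mode means a newly added tool is never auto-approved by accident.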

Per-tool permissions are built in here. With open-source frameworks such as LangGraph, CrewAI, or AutoGen, you typically have to wire up an equivalent approval layer yourself.

Usage Patterns

Event-triggered

External service triggers agent. Bug detected → agent writes patch and opens PR. Example: Sentry integration

Scheduled

Agent runs on schedule. Daily digests: GitHub activity, team tasks, X (Twitter) summary.

Fire-and-forget

Human sets task via Slack → gets result: table, presentation, app. Example: Asana AI Teammates

Long-horizon

Tasks running for hours. Research projects, large-scale code migrations, deep analysis.

CLI for setup, SDK for runtime

Agent templates stored as YAML in git. CLI applies them in deploy pipeline. SDK manages sessions at runtime.

Outcomes (Public Beta as of May 2026)

Outcomes turn sessions from conversations into goal-oriented work:

client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a DCF model for Costco in .xlsx",
        "rubric": {"type": "text", "content": RUBRIC},
        "max_iterations": 5,  # default 3, max 20
    }],
)

A separate grader evaluates whether the rubric criteria are met. The agent iterates until the grader is satisfied or max_iterations is reached.

Good rubric criteria:

  • ✅ “CSV contains price column with numeric values”
  • ❌ “Data looks good”
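
A useful test for a rubric criterion is whether it could be checked by a few lines of code. The sketch below makes the good example above concrete: it is not how Anthropic's grader works, and `price_column_is_numeric`, `CRITERIA`, and the numbered-list rubric format are assumptions for illustration.

```python
import csv
import io


def price_column_is_numeric(csv_text: str) -> bool:
    """Check that a 'price' column exists and every row's value parses as a number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "price" not in reader.fieldnames:
        return False
    rows = list(reader)
    if not rows:
        return False  # a header with no data does not satisfy the criterion
    for row in rows:
        try:
            float(row["price"])
        except (TypeError, ValueError):
            # TypeError covers short rows, where the missing value is None.
            return False
    return True


# Criteria phrased so that each one has an unambiguous pass/fail answer.
CRITERIA = [
    "CSV contains a price column with numeric values",
    "CSV has at least one data row",
]
RUBRIC = "\n".join(f"{i}. {text}" for i, text in enumerate(CRITERIA, start=1))
```

"Data looks good" has no such function behind it, which is why it fails as a criterion.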

Multi-Agent Orchestration (Public Beta as of May 2026)

One coordinator can delegate to other agents:

orchestrator = client.beta.agents.create(
    name="Engineering Lead",
    model="claude-sonnet-4-6",
    system="Delegate code review to reviewer, tests to test agent.",
    tools=[{"type": "agent_toolset_20260401"}],
    callable_agents=[
        {"type": "agent", "id": reviewer_agent.id, "version": reviewer_agent.version},
        {"type": "agent", "id": test_writer_agent.id, "version": test_writer_agent.version},
    ],
)

Use cases:

  • Code review (separate agent with read-only tools)
  • Test generation (writes tests, doesn’t touch production code)
  • Research (agent with web tools collects information)

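Limitation: only one level of delegation is supported. The coordinator can call the agents listed in callable_agents, but those agents cannot delegate further.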
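
A planned agent hierarchy can be checked against this limit before anything is created. The sketch reads "one level" as meaning that an agent called by the coordinator must not have callable agents of its own; that reading, the dict-of-lists representation, and the name `delegation_violations` are assumptions.

```python
def delegation_violations(callable_agents: dict[str, list[str]], coordinator: str) -> list[str]:
    """List every sub-agent of the coordinator that tries to delegate further."""
    violations = []
    for child in callable_agents.get(coordinator, []):
        grandchildren = callable_agents.get(child, [])
        if grandchildren:
            violations.append(f"{child} -> {', '.join(grandchildren)}")
    return violations


# Maps each agent name to the agents it is allowed to call.
flat_plan = {"lead": ["reviewer", "test_writer"]}
nested_plan = {"lead": ["reviewer"], "reviewer": ["linter"]}

flat_result = delegation_violations(flat_plan, "lead")
nested_result = delegation_violations(nested_plan, "lead")
```

When a plan does need a deeper tree, one option is to flatten it so the coordinator calls every specialist directly.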

Dreaming — between-session memory consolidation (Research Preview, May 2026)

Dreaming is the most architecturally interesting feature shipped in the May 2026 wave. It extends Claude’s memory capabilities by reviewing past sessions to find patterns and help agents self-improve. The mechanism: the agent runs a “dreaming” pass over recent sessions, identifies patterns (recurring user preferences, frequently-corrected mistakes, successful task templates), and consolidates them into its memory.

Two operating modes:

  • Automatic — Dreaming updates memory without per-update review. Faster; appropriate for low-stakes / high-volume agents.
  • Review-before-landing — Each Dreaming-proposed memory update is surfaced for human approval before persistence. Slower; appropriate for high-stakes agents.

Why this matters: Dreaming is the production realization of the semantic + procedural memory layers described in glossary/agentic-memory. The glossary entry (created May 15, 2026) described memory as a 4-layer architecture (working / episodic / semantic / procedural) that has to be engineered on top of the model. Dreaming is a piece of that engineering at the platform level — the consolidation pass that turns episodic memory (raw session history) into procedural memory (reusable patterns).

Connection to the wiki cluster: Dreaming shipped 8 days after the agentic-memory glossary entry. Anthropic’s mechanism aligns directly with the academic distinction between episodic and procedural memory consolidation that the entry borrows from cognitive science. The cluster prediction (memory must be engineered, not built-in) is being validated at the production-platform layer.

Practical impact: for strategist-pattern-style deployments where the agent operates across many sessions over weeks or months, Dreaming removes a substantial chunk of the manual memory-curation work the practitioner would otherwise own. The trade-off (per the glossary/agentic-memory honest-limits section) is a new failure mode: a wrong pattern consolidated into memory gets retrieved and reused with confidence. Review-before-landing mode addresses this with a human check; automatic mode accepts the risk in exchange for speed.
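
The difference between the two modes comes down to where a proposed update lands. The sketch below is a toy model of that trade-off and not Anthropic's API or implementation; `MemoryStore`, `propose`, and `review` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryStore:
    """Toy memory with an optional human review gate in front of it."""
    auto_approve: bool
    entries: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)

    def propose(self, update: str) -> None:
        # Automatic mode persists at once; review mode queues the update.
        if self.auto_approve:
            self.entries.append(update)
        else:
            self.pending.append(update)

    def review(self, approve: Callable[[str], bool]) -> None:
        # Persist only the queued updates the reviewer accepts, then clear the queue.
        for update in self.pending:
            if approve(update):
                self.entries.append(update)
        self.pending.clear()


automatic = MemoryStore(auto_approve=True)
gated = MemoryStore(auto_approve=False)
for store in (automatic, gated):
    store.propose("User prefers CSV output")
    store.propose("Always skip tests")  # a wrong pattern

gated.review(lambda update: "skip tests" not in update)
```

In this toy run the automatic store keeps the wrong pattern, while the gated store drops it at the cost of a review step.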

Cowork enterprise features (April–May 2026)

Anthropic added enterprise-tier controls to Claude Cowork (the file-based persistent-memory pattern that pairs with Skills):

  • Role-based access controls (RBAC). Admins on Claude Enterprise can organize users into groups and assign custom roles defining which Claude capabilities members can use. Cowork can be turned on for specific teams.
  • Group spend limits. Per-team budgets set from the admin console — predictable costs at organization scale.
  • Agent view in Claude Code. A new way to manage multiple agent sessions from one CLI view: start agents, send them to the background, peek at status and last responses, jump back into sessions only when input is needed.

These are operational primitives for organizations deploying Claude across many teams. The combination of RBAC and spend limits is what makes Claude enterprise-grade rather than a tool for heavy individual users.

Vertical agent templates (April–May 2026)

Anthropic shipped two waves of vertical-specific agent templates:

  • 10 finance agent templates — pitchbook generation, KYC screening, earnings review, month-end close, and other CFO/IB-team-relevant workflows.
  • 20+ legal MCP connectors + 12 practice-area plugins — covering legal research, contracts, discovery, matter management, and legal aid. Law-firm-specific deployment surface.

The strategic move: Anthropic is verticalizing the agent surface. Generic agents are the foundation; vertical templates are the on-ramp for specific industry deployments. Expect similar template waves for healthcare, retail, and engineering over 2026.

For glossary/agent-engineering practitioners: the templates accelerate time-to-first-deployment but lock decisions about agent architecture into Anthropic’s defaults. Treat templates as starting points to customize, not finished products to ship as-is.

Architecture

Anthropic designed three independent components:

Component | Role
“Brain” | Claude and scaffolding (agent loop, tool selection)
“Hands” | Sandboxes and tools that execute actions
“Session” | Event journal

Each component can fail or be replaced independently. Built-in optimizations: prompt caching, context compression, automatic recovery.
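
One way to see why an event journal lets components fail independently: if every event is appended to a durable log, a restarted component can rebuild its view by replaying that log. The sketch below illustrates the general idea only and makes no claim about Anthropic's implementation; `Journal` and `rebuild_state` are hypothetical names, and the event shapes are borrowed from the Quick Start.

```python
import json


class Journal:
    """Append-only log of session events, stored as JSON lines."""

    def __init__(self) -> None:
        self._lines: list[str] = []

    def append(self, event: dict) -> None:
        self._lines.append(json.dumps(event))

    def replay(self) -> list[dict]:
        return [json.loads(line) for line in self._lines]


def rebuild_state(events: list[dict]) -> dict:
    """Derive a session summary purely from the journal contents."""
    state = {"messages": 0, "tool_calls": 0, "status": "running"}
    for event in events:
        if event["type"] == "agent.message":
            state["messages"] += 1
        elif event["type"] == "agent.tool_use":
            state["tool_calls"] += 1
        elif event["type"] == "session.status_idle":
            state["status"] = "idle"
    return state


journal = Journal()
for event in (
    {"type": "agent.tool_use", "name": "bash"},
    {"type": "agent.message", "content": [{"type": "text", "text": "done"}]},
    {"type": "session.status_idle"},
):
    journal.append(event)

# A fresh process holding only the journal recovers the same summary.
recovered = rebuild_state(journal.replay())
```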

Pricing

  • Standard Claude API token rates
  • + $0.08 per hour of active session

At that rate, a 10-minute coding session costs about 1.3 cents in session time ($0.08 × 10/60 ≈ $0.013); token charges come on top.
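
The two pricing components can be combined in a short estimator. The $0.08 hourly rate comes from this page; the token counts and per-million-token prices in the example call are placeholder inputs rather than actual Claude rates, and the function names are hypothetical.

```python
SESSION_RATE_PER_HOUR = 0.08  # USD per hour of active session, from the pricing above


def session_runtime_cost(active_minutes: float) -> float:
    """Cost of active session time only, in USD."""
    return SESSION_RATE_PER_HOUR * active_minutes / 60


def total_cost(
    active_minutes: float,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Session time plus token charges, with token prices supplied by the caller."""
    token_cost = (
        input_tokens / 1_000_000 * input_price_per_mtok
        + output_tokens / 1_000_000 * output_price_per_mtok
    )
    return session_runtime_cost(active_minutes) + token_cost


ten_minutes = session_runtime_cost(10)   # about 0.0133 USD
full_day = session_runtime_cost(24 * 60)  # about 1.92 USD

# Placeholder token volumes and prices, for illustration only.
example_total = total_cost(10, 200_000, 20_000, 3.0, 15.0)
```

With these placeholder inputs the token charges dominate the session fee, so a break-even analysis depends mostly on token volume.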

Who’s Using It

Company | Use Case
Notion | Agents for parallel task execution
Rakuten | Corporate agents per department (launched in <1 week each)
Asana | AI Teammates working alongside humans
Sentry | Debugger finds bug → agent writes patch → opens PR
Vibecode | Default infrastructure

Limits

Operation | Limit
Resource creation (agents, sessions, environments) | 60 requests/min
Reads (get, list, streaming) | 600 requests/min
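
Clients that create many sessions in bursts may want to throttle themselves before the server does. Below is a minimal sliding-window limiter sized to the creation limit above; it is a client-side sketch under the assumption that the limits are counted per rolling minute, and `SlidingWindowLimiter` is a hypothetical helper, not part of the SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of period_s seconds."""

    def __init__(self, max_calls: int, period_s: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period_s = period_s
        self.clock = clock  # injectable so tests do not need to sleep
        self._calls: deque = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period_s:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False


# One limiter per class of operation, sized to the table above.
create_limiter = SlidingWindowLimiter(max_calls=60)
read_limiter = SlidingWindowLimiter(max_calls=600)
```

A caller would check `create_limiter.try_acquire()` before each creation request and wait or queue the request when it returns False.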

Access

  • Status: Open beta
  • Header required: managed-agents-2026-04-01 (SDK sets automatically)
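  • Outcomes and multi-agent orchestration: public beta as of the May 2026 update (previously research preview)
  • Memory features such as Dreaming: research preview (request access separately)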

When to Use Managed Agents vs. Messages API

Choose Managed Agents | Choose Messages API
Long-running tasks (hours) | Simple chat completions
Need code execution sandbox | Full control over orchestration
Quick launch priority | Custom scaffolding requirements
Async background work | Real-time low-latency needs

Key Takeaways

  • Ready-made infrastructure — no Docker, orchestration code, or tool execution to build
  • Four concepts: Agent (config) → Environment (container) → Session (instance) → Events (stream)
  • Built-in tools for common operations + custom tool support
  • Per-tool permissions for production safety
  • Outcomes turn conversations into goal-oriented work
  • Multi-agent coordination for complex workflows
  • $0.08/hour + token costs

Related

  • glossary/ai-agent — What agents are
  • glossary/agent-outcomes — Goal-oriented agent work (Outcomes feature)
  • automation/ai-agent-organization — 12 techniques for reliable agents
  • automation/multi-agent-patterns — Dispatcher + deep worker patterns
  • comparisons/managed-agents-vs-diy — When to use Managed vs. build your own
  • questions/managed-agents-break-even — Cost analysis exploration (now incorporates jagged-frontier reliability + task-horizon thesis + advisor-strategy economics)
  • glossary/advisor-strategy — Pair a cheap executor with an Opus advisor. Stacks with the managed-agents toolset for cost optimization
  • glossary/jagged-frontier — Reliability framing: agent success is task-dependent in ways the frontier predicts
  • glossary/agent-adoption-frictions — Why technically capable agents still don’t get used. The Goldilocks-autonomy finding bears directly on Managed-Agents UX (propose-and-approve > pure execution)
  • glossary/agent-engineering — Karpathy’s framing of the discipline; Managed Agents is the managed-infrastructure approach to acquiring it
  • glossary/tool-use — The basic primitive Managed Agents provides infrastructure for; tools are how agents reach outside the model
  • glossary/guardrails — Sandboxing, permission scoping, and budget limits are guardrail categories Managed Agents handles natively
  • glossary/agentic-memory — Managed Agents provides memory primitives; the architecture decisions (what to store, how to retrieve) remain the practitioner’s. Dreaming (May 2026) is the platform-level consolidation pass that turns episodic memory into procedural memory
  • glossary/prompt-caching — Anthropic’s cache_control mechanics underlie Managed Agents caching. The February 2026 default TTL change (60-min → 5-min) affected Managed-Agent workloads disproportionately, inflating costs 30-60% for many production deployments without code changes
  • glossary/agent-payment-protocols — The infrastructure layer for agent-to-agent commerce (AP2, x402, UCP, Visa TAP). Managed Agents currently uses Anthropic’s own infrastructure; payment-protocol support is one of the differentiators when comparing managed-platform agent runtimes (AWS Bedrock AgentCore Payments has native x402)
  • glossary/llm — The underlying technology

Sources

  • Anthropic Managed Agents documentation (April 2026)
  • Telegram @prompt_design