Claude Managed Agents — Anthropic's Agent Infrastructure

TL;DR: Claude Managed Agents is Anthropic’s ready-made infrastructure for running AI agents. Instead of building tool orchestration, sandboxes, and error handling yourself, you describe the agent (model, system prompt, tools) and Anthropic runs it in its managed cloud. May 2026 update: Dreaming (between-session memory consolidation, research preview); Outcomes (rubric-based success criteria, now in public beta); multi-agent orchestration (now in public beta); Cowork enterprise features (role-based access controls and group spend limits); 10 finance agent templates; 20+ legal MCP connectors; and an agent view in Claude Code for managing multiple sessions from the CLI.

What Problem It Solves

Building production agents directly on the Messages API means engineering all of the following yourself, which can take months:

Tool orchestration logic
Context management
Secure sandboxes
Error handling and recovery
Secret storage
Long-running session management

Managed Agents provides all of this out of the box.

Two key benefits:

Scaffolding stays current: custom scaffolding bakes in assumptions about what Claude can’t do, and those assumptions go stale with every model update. Managed Agents’ scaffolding is updated automatically.
Long-running tasks: Anthropic expects future Claude versions to work for days, weeks, or months. That requires fault-tolerant, secure, scalable infrastructure that’s hard to build yourself.

Anthropic’s task-horizon thesis

The product is positioned around an explicit thesis: “task horizons are growing exponentially — on the METR benchmark, Claude already exceeds 10 human-hours of work.” The strategic bet is that future Claude versions will run sessions of days, weeks, or months on the most complex tasks. Managed Agents is the infrastructure designed to make this viable without forcing every customer to rebuild the same fault-tolerance, recovery, and state-management plumbing.

Implication for evaluating the product: the question isn’t “do I need an agent for a 10-minute task today?” (most teams don’t). The question is whether, once you commit to agentic work, you want to maintain multi-day session infrastructure yourself as task durations grow. See questions/managed-agents-break-even for the cost framing.

Four Key Concepts

Concept	What It Is
Agent	Versioned configuration: model, system prompt, tools, MCP servers. Create once, reference by ID.
Environment	Container template: sandbox type, network rules, pre-installed packages.
Session	Running instance of agent inside environment. Stores conversation, filesystem, status. Can run for hours.
Events	Message exchange. You send user events; the agent streams back messages, tool calls, and status updates over Server-Sent Events (SSE).

How They Connect

Agent (configuration) + Environment (container template)
    ↓
Session (running instance of the agent inside the environment)
    ↓
Events (message stream)

Quick Start (10 Minutes)

Step 1: Install

# CLI (macOS)
brew install anthropics/tap/ant

# Python SDK
pip install anthropic

Step 2: Create Agent

from anthropic import Anthropic

client = Anthropic()

agent = client.beta.agents.create(
    name="Coding Assistant",
    model="claude-sonnet-4-6",
    system="You are a helpful coding assistant.",
    tools=[{"type": "agent_toolset_20260401"}],
)

agent_toolset_20260401 enables all the built-in tools listed below: bash, file operations (read, write, edit, glob, grep), web_fetch, and web_search.

Step 3: Create Environment

environment = client.beta.environments.create(
    name="dev-env",
    config={
        "type": "cloud",
        "networking": {"type": "unrestricted"},
        "packages": {"pip": ["pandas", "numpy"]}  # optional
    },
)

Step 4: Start Session and Send Task

session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
)

with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(
        session.id,
        events=[{
            "type": "user.message",
            "content": [{"type": "text", "text": "Create a Python script..."}],
        }],
    )

    for event in stream:
        match event.type:
            case "agent.message":
                for block in event.content:
                    print(block.text, end="")
            case "agent.tool_use":
                print(f"\n[Tool: {event.name}]")
            case "session.status_idle":
                print("\n\nDone.")
                break

What happens under the hood:

A container is provisioned from the environment template
Claude decides which tools to use
Tool calls execute inside the container
Results stream back to you in real time
The session emits session.status_idle when the task is done

Built-in Tools

Tool	Description
bash	Execute shell commands in container
read	Read files
write	Write files
edit	Replace strings in files
glob	Find files by pattern
grep	Search text by regex
web_fetch	Download content by URL
web_search	Internet search

Tool Configuration

Disable specific tools:

{
  "type": "agent_toolset_20260401",
  "configs": [
    {"name": "web_fetch", "enabled": false},
    {"name": "web_search", "enabled": false}
  ]
}

Enable only specific tools:

{
  "type": "agent_toolset_20260401",
  "default_config": {"enabled": false},
  "configs": [
    {"name": "bash", "enabled": true},
    {"name": "read", "enabled": true}
  ]
}
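If you generate agent configs from code, a small helper can produce both shapes shown above. This is a minimal sketch: `toolset_config` is a hypothetical helper, not part of the SDK, and it emits only the `type`, `default_config`, and `configs` keys documented on this page.

```python
TOOLSET_TYPE = "agent_toolset_20260401"


def toolset_config(disable=(), only=None):
    """Build an agent_toolset entry in one of the two documented shapes.

    disable: names of tools to switch off while keeping everything else on.
    only: if given, switch every tool off by default and enable just these.
    """
    if only is not None and disable:
        raise ValueError("use either `disable` or `only`, not both")
    config = {"type": TOOLSET_TYPE}
    if only is not None:
        # Allowlist: default everything to disabled, then opt tools back in.
        config["default_config"] = {"enabled": False}
        config["configs"] = [{"name": name, "enabled": True} for name in only]
    else:
        # Denylist: everything stays on except the named tools.
        config["configs"] = [{"name": name, "enabled": False} for name in disable]
    return config


# Same as the "Disable specific tools" example.
no_web = toolset_config(disable=["web_fetch", "web_search"])
# Same as the "Enable only specific tools" example.
bash_and_read = toolset_config(only=["bash", "read"])
```

Either result goes into the `tools=[...]` list passed to `client.beta.agents.create`.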

Custom Tools

Define your own tools with structured input schemas:

agent = client.beta.agents.create(
    name="Weather Agent",
    model="claude-sonnet-4-6",
    tools=[
        {"type": "agent_toolset_20260401"},
        {
            "type": "custom",
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    ],
)
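Unlike the built-in tools, a custom tool’s logic lives in your application: when the agent calls `get_weather`, your code runs it and sends the result back into the session. The sketch below covers only the local dispatch step. The call shape (a tool `name` plus an `input` dict), the result payload with `is_error`, and the `get_weather` stub are assumptions for illustration; check the Managed Agents docs for the exact event types used to receive the call and return its result.

```python
from typing import Any, Callable


# Hypothetical stub; a real handler would call a weather service.
def get_weather(location: str) -> dict[str, Any]:
    return {"location": location, "conditions": "unknown (stub)"}


# Map each custom tool name to (handler, required input fields).
# The required fields mirror the tool's input_schema["required"].
HANDLERS: dict[str, tuple[Callable[..., Any], list[str]]] = {
    "get_weather": (get_weather, ["location"]),
}


def dispatch(name: str, tool_input: dict[str, Any]) -> dict[str, Any]:
    """Run one custom tool call and return a result payload for the agent."""
    if name not in HANDLERS:
        return {"is_error": True, "content": f"Unknown tool: {name}"}
    handler, required = HANDLERS[name]
    missing = [field for field in required if field not in tool_input]
    if missing:
        # Report bad input back to the agent instead of raising,
        # so it can correct the call and retry.
        return {"is_error": True, "content": f"Missing fields: {', '.join(missing)}"}
    return {"is_error": False, "content": handler(**tool_input)}


print(dispatch("get_weather", {"location": "Tokyo"}))
```

Returning errors as data rather than raising keeps the session alive and gives the agent a chance to fix its own call.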

Best practices for custom tools:

Write detailed descriptions (3-4 sentences): what the tool does, when to use it, and its limitations
Combine related operations with an action parameter
Use namespaces in names (db_query, storage_read)
Return only essential info — stable identifiers, not internal references
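Here is a sketch of a tool definition that applies these practices together: a namespaced name, a multi-sentence description, and a single `action` parameter in place of three separate tools. The `db_records` tool, its actions, and the ID format are hypothetical; the dict follows the same `custom` tool shape as `get_weather` above.

```python
db_records_tool = {
    "type": "custom",
    # Namespaced name: the "db_" prefix groups it with other database tools.
    "name": "db_records",
    # Detailed description: what it does, when to use it, and its limits.
    "description": (
        "Reads and updates customer records in the internal database. "
        "Use it when the task needs a customer's stored details or must change them. "
        "Searches return at most 20 records, matched on name or email only. "
        "It cannot delete records."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            # One action parameter instead of separate get/search/update tools.
            "action": {"type": "string", "enum": ["get", "search", "update"]},
            # Stable identifier the agent can reuse across calls.
            "record_id": {"type": "string", "description": "Customer ID, e.g. 'cus_123'"},
            "query": {"type": "string", "description": "Name or email, for action=search"},
            "fields": {"type": "object", "description": "Fields to change, for action=update"},
        },
        "required": ["action"],
    },
}
```

It would sit in the same `tools=[...]` list as the toolset entry.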

Permission System

Two modes for tool execution:

Mode	Behavior	Use Case
always_allow	Tools execute automatically	Trusted internal agents
always_ask	Session pauses for approval	User-facing agents

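Modes can be mixed per tool: for example, file reads run automatically while bash commands require approval.

This is a production-oriented default: per-tool approval is a configuration setting rather than something you assemble yourself, as is often necessary with open-source frameworks such as LangGraph, CrewAI, and AutoGen.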
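One way to reason about mixed modes is as a per-tool lookup with a safe default. The sketch below is a local model only: `POLICY`, `DEFAULT_MODE`, and `needs_approval` are hypothetical names, and the real per-tool setting lives in the agent’s tool configuration (see the Managed Agents docs for its exact field names).

```python
ALWAYS_ALLOW = "always_allow"
ALWAYS_ASK = "always_ask"

# Hypothetical per-tool policy: reads run automatically, anything that
# changes state or runs commands waits for a human.
POLICY = {
    "read": ALWAYS_ALLOW,
    "glob": ALWAYS_ALLOW,
    "grep": ALWAYS_ALLOW,
    "bash": ALWAYS_ASK,
    "write": ALWAYS_ASK,
    "edit": ALWAYS_ASK,
}
# Tools missing from the policy fall back to the stricter mode.
DEFAULT_MODE = ALWAYS_ASK


def needs_approval(tool_name: str) -> bool:
    """Return True if the session should pause for approval before this call."""
    return POLICY.get(tool_name, DEFAULT_MODE) == ALWAYS_ASK


for tool in ["read", "bash", "web_fetch"]:
    print(tool, "-> ask" if needs_approval(tool) else "-> run")
```

Defaulting unlisted tools to `always_ask` means a newly added tool cannot run unattended by accident.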

Usage Patterns

Event-triggered

An external service triggers the agent. Example: Sentry detects a bug, and the agent writes a patch and opens a PR.
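For the event-triggered pattern, the glue code mostly turns an incoming alert into the first `user.message` of a new session. This sketch builds that event in the same shape as the Quick Start example; the alert fields (`title`, `culprit`, `url`) and the prompt wording are hypothetical stand-ins for whatever your monitoring webhook sends.

```python
def alert_to_task_event(alert: dict) -> dict:
    """Turn a bug alert into a user.message event for a fresh session."""
    prompt = (
        f"A new error was reported: {alert['title']}\n"
        f"Location: {alert.get('culprit', 'unknown')}\n"
        f"Details: {alert['url']}\n\n"
        "Reproduce the bug, write a fix with a regression test, and open a pull request."
    )
    return {
        "type": "user.message",
        "content": [{"type": "text", "text": prompt}],
    }


event = alert_to_task_event({
    "title": "ZeroDivisionError in invoice totals",
    "culprit": "billing/totals.py",
    "url": "https://example.com/issues/1",
})
# The webhook handler would then create a session and send the event:
# client.beta.sessions.events.send(session.id, events=[event])
```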

Scheduled

The agent runs on a schedule, for example producing daily digests of GitHub activity, team tasks, or X (Twitter) posts.

Fire-and-forget

A person assigns a task (for example, via Slack) and gets back a finished result: a table, a presentation, or an app. Example: Asana AI Teammates.

Long-horizon

Tasks that run for hours: research projects, large-scale code migrations, deep analysis.

CLI for setup, SDK for runtime

Agent templates are stored as YAML in git, the CLI applies them in the deploy pipeline, and the SDK manages sessions at runtime.
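A sketch of the deploy-pipeline half, assuming templates are kept as files in the repo. To stay within the standard library it reads JSON rather than YAML (with PyYAML, `yaml.safe_load` would slot in at the same place), and the template keys mirror the `client.beta.agents.create` arguments from the Quick Start. `load_agent_template` and `REQUIRED_KEYS` are hypothetical; the `ant` CLI’s own template format and commands are not shown here.

```python
import json

# Keys this pipeline insists on; other create() arguments pass through.
REQUIRED_KEYS = {"name", "model", "tools"}


def load_agent_template(text: str) -> dict:
    """Parse and validate an agent template; return kwargs for agents.create()."""
    template = json.loads(text)
    if not isinstance(template, dict):
        raise ValueError("template must be a JSON object")
    missing = REQUIRED_KEYS - set(template)
    if missing:
        # Fail the deploy early instead of creating a half-configured agent.
        raise ValueError(f"template missing keys: {sorted(missing)}")
    return template


TEMPLATE = """
{
  "name": "Coding Assistant",
  "model": "claude-sonnet-4-6",
  "system": "You are a helpful coding assistant.",
  "tools": [{"type": "agent_toolset_20260401"}]
}
"""

kwargs = load_agent_template(TEMPLATE)
# In the deploy step: agent = client.beta.agents.create(**kwargs)
```

Keeping validation in the pipeline means a broken template fails code review or CI rather than at runtime.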

Outcomes (Public Beta as of May 2026)

Outcomes turn sessions from conversations into goal-oriented work:

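# Illustrative rubric; replace with your own concrete, checkable criteria.
RUBRIC = """\
- The workbook has a projections sheet covering at least 5 years of free cash flow
- WACC and terminal growth rate appear as explicit input cells
- The summary sheet shows the implied share price as a numeric value
"""

client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a DCF model for Costco in .xlsx",
        "rubric": {"type": "text", "content": RUBRIC},
        "max_iterations": 5,  # default 3, max 20
    }],
)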

A separate grader evaluates the output against the rubric. The agent keeps iterating until the grader judges the criteria met or max_iterations is reached.

Good rubric criteria:

✅ “CSV contains price column with numeric values”
❌ “Data looks good”
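The difference is that a concrete criterion can be checked unambiguously, while “Data looks good” cannot. As an illustration, the ✅ criterion above reduces to a few lines of code. This is a hypothetical local check, not how Anthropic’s grader works; the grader is separate and reads your rubric text.

```python
import csv
import io


def price_column_is_numeric(csv_text: str) -> bool:
    """Check: 'CSV contains price column with numeric values'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows or "price" not in rows[0]:
        return False
    try:
        for row in rows:
            float(row["price"])
    except (TypeError, ValueError):
        # Missing cells (None or "") and text like "N/A" fail the criterion.
        return False
    return True


good = "sku,price\nA1,9.99\nB2,12\n"
bad = "sku,price\nA1,9.99\nB2,N/A\n"
print(price_column_is_numeric(good), price_column_is_numeric(bad))  # → True False
```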

Multi-Agent Orchestration (Public Beta as of May 2026)

One coordinator can delegate to other agents:

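# reviewer_agent and test_writer_agent are ordinary agents created earlier with client.beta.agents.create()
orchestrator = client.beta.agents.create(
    name="Engineering Lead",
    model="claude-sonnet-4-6",
    system="Delegate code review to the reviewer agent and test writing to the test-writer agent.",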
    tools=[{"type": "agent_toolset_20260401"}],
    callable_agents=[
        {"type": "agent", "id": reviewer_agent.id, "version": reviewer_agent.version},
        {"type": "agent", "id": test_writer_agent.id, "version": test_writer_agent.version},
    ],
)

Use cases:

Code review (separate agent with read-only tools)
Test generation (writes tests, doesn’t touch production code)
Research (agent with web tools collects information)
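The read-only reviewer can be built with the allowlist form of the toolset configuration shown earlier: disable everything by default, then enable only tools that cannot modify the container. Treating `read`, `glob`, and `grep` as the read-only subset is this sketch’s assumption, based on the built-in tool descriptions; the agent name and system prompt are illustrative.

```python
READ_ONLY_TOOLS = ["read", "glob", "grep"]

reviewer_tools = [{
    "type": "agent_toolset_20260401",
    # Start from "everything off" so unlisted tools stay disabled.
    "default_config": {"enabled": False},
    "configs": [{"name": name, "enabled": True} for name in READ_ONLY_TOOLS],
}]

reviewer_config = {
    "name": "Code Reviewer",
    "model": "claude-sonnet-4-6",
    "system": "Review the changes for bugs and risky patterns. Report findings; do not edit files.",
    "tools": reviewer_tools,
}
# reviewer_agent = client.beta.agents.create(**reviewer_config)
```

`bash` is deliberately left out: shell commands can write files, so a prompt instruction alone would be the only thing keeping the reviewer read-only.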

Limitation: only one level of delegation. An agent called by the coordinator cannot delegate to agents of its own.
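Because delegation stops at one level, an agent listed in `callable_agents` should not itself depend on `callable_agents`. A pre-flight check over your own agent configs might look like this; the plain-dict config shape (mirroring the `create` arguments above) and `check_delegation_depth` are hypothetical.

```python
def check_delegation_depth(configs: dict[str, dict]) -> list[str]:
    """Return problems where a callable sub-agent tries to delegate further.

    configs maps an agent ID to its create() arguments.
    """
    problems = []
    for agent_id, config in configs.items():
        for ref in config.get("callable_agents", []):
            sub = configs.get(ref["id"])
            if sub is None:
                problems.append(f"{agent_id}: unknown sub-agent {ref['id']}")
            elif sub.get("callable_agents"):
                problems.append(
                    f"{agent_id} -> {ref['id']}: sub-agent has its own callable_agents"
                )
    return problems


configs = {
    "lead": {"callable_agents": [{"type": "agent", "id": "reviewer"}]},
    "reviewer": {"callable_agents": [{"type": "agent", "id": "linter"}]},
    "linter": {},
}
print(check_delegation_depth(configs))
```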

Dreaming — between-session memory consolidation (Research Preview, May 2026)

The most architecturally interesting feature shipped in the May 2026 wave. Dreaming extends Claude’s memory capabilities by reviewing past sessions to find patterns and help agents self-improve. The mechanism: the agent runs a “dreaming” pass over recent sessions, identifies patterns (recurring user preferences, frequently-corrected mistakes, successful task templates), and consolidates them into its memory.

Two operating modes:

Automatic — Dreaming updates memory without per-update review. Faster; appropriate for low-stakes / high-volume agents.
Review-before-landing — Each Dreaming-proposed memory update is surfaced for human approval before persistence. Slower; appropriate for high-stakes agents.
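To make the two modes concrete, here is a toy consolidation pass: count corrections that recur across sessions, propose the frequent ones as memory entries, and either store them directly (automatic) or route each through an approval callback (review-before-landing). Everything here is hypothetical, including the session format and the `min_sessions` threshold; this page does not describe Dreaming’s internals or API, so this is not Anthropic’s implementation.

```python
from collections import Counter
from typing import Callable, Optional


def dream(
    sessions: list[list[str]],
    memory: list[str],
    min_sessions: int = 2,
    approve: Optional[Callable[[str], bool]] = None,
) -> list[str]:
    """Consolidate recurring corrections from past sessions into memory.

    sessions: each session is a list of user corrections observed in it.
    approve: None means automatic mode; otherwise each proposal must be
             approved first (review-before-landing mode).
    Appends accepted entries to `memory` in place and returns it.
    """
    # Count each correction once per session, so one noisy session
    # cannot promote a pattern on its own.
    counts = Counter(c for session in sessions for c in set(session))
    proposals = [c for c, n in counts.items() if n >= min_sessions and c not in memory]
    for proposal in sorted(proposals):
        if approve is None or approve(proposal):
            memory.append(proposal)
    return memory


history = [
    ["use pytest, not unittest", "prefer tabs"],
    ["use pytest, not unittest"],
    ["use pytest, not unittest", "prefer tabs"],
]
auto = dream(history, memory=[])
reviewed = dream(history, memory=[], approve=lambda p: "tabs" not in p)
```

In review mode the human sees exactly the proposals automatic mode would have stored, which is where a wrongly generalized pattern can be caught.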

Why this matters: Dreaming is the production realization of the semantic + procedural memory layers described in glossary/agentic-memory. The glossary entry (created May 15, 2026) described memory as a 4-layer architecture (working / episodic / semantic / procedural) that has to be engineered on top of the model. Dreaming is a piece of that engineering at the platform level — the consolidation pass that turns episodic memory (raw session history) into procedural memory (reusable patterns).

Connection to the wiki cluster: Dreaming shipped 8 days after the agentic-memory glossary entry. Anthropic’s mechanism aligns directly with the academic distinction between episodic and procedural memory consolidation that the entry borrows from cognitive science. The cluster prediction (memory must be engineered, not built-in) is being validated at the production-platform layer.

Practical impact: for strategist-pattern-style deployments where the agent operates across many sessions over weeks or months, Dreaming removes a substantial chunk of the manual memory-curation work the practitioner would otherwise own. The trade-off (per the glossary/agentic-memory honest-limits section) is that Dreaming introduces a new failure mode: a wrong pattern consolidated into memory gets retrieved and reused with confidence. Review-before-landing mode mitigates this; automatic mode accepts that risk in exchange for speed.

Cowork enterprise features (April–May 2026)

Anthropic added enterprise-tier controls to Claude Cowork (the file-based persistent-memory pattern that pairs with Skills):

Role-based access controls (RBAC). Admins on Claude Enterprise can organize users into groups and assign custom roles defining which Claude capabilities members can use. Cowork can be turned on for specific teams.
Group spend limits. Per-team budgets set from the admin console — predictable costs at organization scale.
Agent view in Claude Code. A new way to manage multiple agent sessions from one CLI view: start agents, send them to the background, peek at status and last responses, jump back into sessions only when input is needed.

These are operational primitives for organizations deploying Claude across many teams. The combination of RBAC and spend limits is what makes Claude an enterprise-grade deployment rather than a tool for individual power users.

Vertical agent templates (April–May 2026)

Anthropic shipped two waves of vertical-specific agent templates:

10 finance agent templates — pitchbook generation, KYC screening, earnings review, month-end close, and other CFO/IB-team-relevant workflows.
20+ legal MCP connectors + 12 practice-area plugins — covering legal research, contracts, discovery, matter management, and legal aid. Law-firm-specific deployment surface.

The strategic move: Anthropic is verticalizing the agent surface. Generic agents are the foundation; vertical templates are the on-ramp for specific industries. Similar template waves for other sectors, such as healthcare, retail, and engineering, seem likely during 2026.

For glossary/agent-engineering practitioners: the templates accelerate time-to-first-deployment but lock decisions about agent architecture into Anthropic’s defaults. Treat templates as starting points to customize, not finished products to ship as-is.

Architecture

Anthropic designed three independent components:

Component	Role
“Brain”	Claude and scaffolding (agent loop, tool selection)
“Hands”	Sandboxes and tools that execute actions
“Session”	Event journal

Each component can fail or be replaced independently. Built-in optimizations: prompt caching, context compression, automatic recovery.
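Why a separate session journal makes the brain replaceable: if the agent loop crashes, a fresh one can rebuild its state by replaying the journal. A toy sketch of that idea follows; `Journal` and `rebuild_state` are hypothetical, and while the event type names come from the Quick Start, the payload fields (such as `text`) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Append-only event log; it outlives any single agent-loop process."""
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)


def rebuild_state(journal: Journal) -> dict:
    """Derive the conversation and status purely from the journal."""
    state = {"messages": [], "status": "running"}
    for event in journal.events:
        if event["type"] in ("user.message", "agent.message"):
            state["messages"].append((event["type"], event["text"]))
        elif event["type"] == "session.status_idle":
            state["status"] = "idle"
    return state


journal = Journal()
journal.append({"type": "user.message", "text": "Create a Python script..."})
journal.append({"type": "agent.message", "text": "Done: script.py"})

# A replacement agent loop that never saw these events rebuilds the
# same state from the journal alone.
restored = rebuild_state(journal)
```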

Pricing

Standard Claude API token rates
+ $0.08 per hour of active session

A 10-minute session adds about $0.013 in runtime charges (10/60 × $0.08); token costs come on top.
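The same arithmetic as a small function. The $0.08 per session-hour rate comes from above; token prices vary by model, so they are parameters here, and the example values ($3 input / $15 output per million tokens) are assumptions to replace with current pricing. The sketch also assumes runtime is billed pro rata by active minutes and ignores prompt-caching discounts.

```python
SESSION_RATE_PER_HOUR = 0.08  # USD, active session runtime


def estimate_session_cost(
    active_minutes: float,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> dict[str, float]:
    """Split an estimated session bill into runtime and token costs (USD)."""
    runtime = active_minutes / 60 * SESSION_RATE_PER_HOUR
    tokens = (
        input_tokens / 1_000_000 * input_price_per_mtok
        + output_tokens / 1_000_000 * output_price_per_mtok
    )
    return {"runtime": runtime, "tokens": tokens, "total": runtime + tokens}


# 10-minute session, 200k input + 20k output tokens, assumed $3 / $15 per MTok.
cost = estimate_session_cost(10, 200_000, 20_000, 3.0, 15.0)
print({k: round(v, 4) for k, v in cost.items()})
```

With these assumed inputs the runtime charge is about 1.3 cents while tokens come to about 90 cents, so for short sessions the token bill dominates.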

Who’s Using It

Company	Use Case
Notion	Agents for parallel task execution
Rakuten	Corporate agents per department (launched in <1 week each)
Asana	AI Teammates working alongside humans
Sentry	Debugger finds bug → agent writes patch → opens PR
Vibecode	Default infrastructure

Limits

Operation	Limit
Resource creation (agents, sessions, environments)	60 requests/min
Reads (get, list, streaming)	600 requests/min
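Batch jobs that start many sessions can hit the 60-per-minute creation limit. One option is a client-side sliding-window limiter in front of the create calls. `CreateThrottle` is a hypothetical helper; the server remains the source of truth, so you should still handle rate-limit errors (HTTP 429) with retries.

```python
import time
from collections import deque
from typing import Callable


class CreateThrottle:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.limit, self.window = limit, window
        self.clock, self.sleep = clock, sleep
        self.calls: deque = deque()

    def wait(self) -> None:
        """Block until another call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))


throttle = CreateThrottle()
# Before each creation request:
#   throttle.wait()
#   session = client.beta.sessions.create(agent=agent.id, environment_id=environment.id)
```

The `clock` and `sleep` parameters exist so the limiter can be tested without real waiting.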

Access

Status: Open beta
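Beta header: anthropic-beta: managed-agents-2026-04-01 (the SDK sets it automatically)
Outcomes, multi-agent, and memory: launched as research previews (request access separately). As of the May 2026 update, Outcomes and multi-agent orchestration are in public beta; Dreaming is a research preview.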
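The SDK adds the beta header for you; if you call the REST API directly, you set it yourself alongside the usual authentication and version headers. A minimal sketch of the header set, assuming the standard `anthropic-beta` mechanism Anthropic uses for beta features. `managed_agents_headers` is a hypothetical helper, and endpoint paths are left out because this page does not list them.

```python
import os

BETA = "managed-agents-2026-04-01"


def managed_agents_headers(api_key: str) -> dict[str, str]:
    """Headers for a direct HTTP call to a Managed Agents endpoint."""
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        # Opts the request into the Managed Agents beta.
        "anthropic-beta": BETA,
        "content-type": "application/json",
    }


headers = managed_agents_headers(os.environ.get("ANTHROPIC_API_KEY", "your-api-key"))
```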

When to Use Managed Agents vs. Messages API

Choose Managed Agents	Choose Messages API
Long-running tasks (hours)	Simple chat completions
Need code execution sandbox	Full control over orchestration
Quick launch priority	Custom scaffolding requirements
Async background work	Real-time low-latency needs

Key Takeaways

Ready-made infrastructure — no Docker, orchestration code, or tool execution to build
Four concepts: Agent (config) → Environment (container) → Session (instance) → Events (stream)
Built-in tools for common operations + custom tool support
Per-tool permissions for production safety
Outcomes turn conversations into goal-oriented work
Multi-agent coordination for complex workflows
$0.08/hour + token costs

glossary/ai-agent — What agents are
glossary/agent-outcomes — Goal-oriented agent work (Outcomes feature)
automation/ai-agent-organization — 12 techniques for reliable agents
automation/multi-agent-patterns — Dispatcher + deep worker patterns
comparisons/managed-agents-vs-diy — When to use Managed vs. build your own
questions/managed-agents-break-even — Cost analysis exploration (now incorporates jagged-frontier reliability + task-horizon thesis + advisor-strategy economics)
glossary/advisor-strategy — Pair a cheap executor with an Opus advisor. Stacks with the managed-agents toolset for cost optimization
glossary/jagged-frontier — Reliability framing: agent success is task-dependent in ways the frontier predicts
glossary/agent-adoption-frictions — Why technically capable agents still don’t get used. The Goldilocks-autonomy finding bears directly on Managed-Agents UX (propose-and-approve > pure execution)
glossary/agent-engineering — Karpathy’s framing of the discipline; Managed Agents is the managed-infrastructure approach to acquiring it
glossary/tool-use — The basic primitive Managed Agents provides infrastructure for; tools are how agents reach outside the model
glossary/guardrails — Sandboxing, permission scoping, and budget limits are guardrail categories Managed Agents handles natively
glossary/agentic-memory — Managed Agents provides memory primitives; the architecture decisions (what to store, how to retrieve) remain the practitioner’s. Dreaming (May 2026) is the platform-level consolidation pass that turns episodic memory into procedural memory
glossary/prompt-caching — Anthropic’s cache_control mechanics underlie Managed Agents caching. The February 2026 default TTL change (60-min → 5-min) affected Managed-Agent workloads disproportionately, inflating costs 30-60% for many production deployments without code changes
glossary/agent-payment-protocols — The infrastructure layer for agent-to-agent commerce (AP2, x402, UCP, Visa TAP). Managed Agents currently uses Anthropic’s own infrastructure; payment-protocol support is one of the differentiators when comparing managed-platform agent runtimes (AWS Bedrock AgentCore Payments has native x402)
glossary/llm — The underlying technology

Sources

Anthropic Managed Agents documentation (April 2026)
Telegram @prompt_design