Pages tagged "cost-optimization"

3 pages tagged with cost-optimization.

Prompt Caching — The Production Cost-Optimization Layer for LLM Applications
Prompt caching reuses LLM input tokens across requests, cutting input-token costs by up to 90%: Anthropic cache reads are billed at 10% of the base input price, and OpenAI cached inputs run 75-90% cheaper. Combined caching strategies can reduce total cost by 70-80% in production. The 2026 production landscape includes Anthropic cache_control markers with a 5-minute default TTL (1-hour extended option), OpenAI automatic prompt caching, and semantic caching via vector similarity (Redis, GPTCache). Prompt caching is distinct from KV caching (model-internal) and agentic memory (cross-session persistence).

Advisor Strategy — Pairing a Smarter Model as an Occasional Advisor With a Cheaper Executor
Anthropic's advisor pattern (April 2026): the executor model (Sonnet or Haiku) handles tasks end-to-end and consults an advisor model (Opus) only on hard decisions. The consultation happens server-side within a single API request. Sonnet with an Opus advisor gains 2.7 percentage points on SWE-bench at 11.9% lower cost. Haiku with an Opus advisor scores 41.2% on BrowseComp versus 19.7% solo, and is 85% cheaper than Sonnet alone.

Advisor Strategy — Smart Model Pairing for Cost-Efficiency
A pattern in which a cheap executor model consults an expensive advisor only when facing hard decisions, reducing costs while improving performance.
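
The input-cost arithmetic behind the prompt caching entry can be sketched as follows. This is a minimal illustration, not a billing calculator. The 10% cache-read rate comes from the entry above. The cache-write multiplier of 1.25, the function name `cached_input_cost`, the token counts, and the base price of 3.0 per million tokens are all assumptions made for the example. It also assumes every request arrives within the cache TTL, so the shared prefix is written once and read on every later request.

```python
def cached_input_cost(prefix_tokens, suffix_tokens, requests,
                      base_price_per_mtok,
                      read_multiplier=0.10,    # cache reads at 10% of base (from the entry above)
                      write_multiplier=1.25):  # ASSUMPTION: cache writes cost 25% more than base
    """Return (uncached_cost, cached_cost, savings_fraction) for input tokens.

    prefix_tokens: tokens in the shared, cacheable prefix (e.g. a system prompt)
    suffix_tokens: tokens that differ on every request and are never cached
    requests:      number of requests, all assumed to fall within the cache TTL
    """
    per_token = base_price_per_mtok / 1_000_000

    # Without caching, every request pays full price for prefix and suffix.
    uncached = requests * (prefix_tokens + suffix_tokens) * per_token

    # With caching, the first request writes the prefix and the rest read it.
    write = prefix_tokens * write_multiplier * per_token
    reads = (requests - 1) * prefix_tokens * read_multiplier * per_token
    suffixes = requests * suffix_tokens * per_token
    cached = write + reads + suffixes

    return uncached, cached, 1 - cached / uncached


if __name__ == "__main__":
    # Hypothetical workload: 10k-token shared prefix, 500-token unique suffix.
    uncached, cached, savings = cached_input_cost(
        prefix_tokens=10_000, suffix_tokens=500,
        requests=100, base_price_per_mtok=3.0,
    )
    print(f"uncached: {uncached:.4f}  cached: {cached:.4f}  savings: {savings:.1%}")
```

Under these assumptions the saving stays below the 90% ceiling because the uncached suffix and the one-time cache write are still billed at or above the base rate. The saving approaches 90% only as the shared prefix comes to dominate each request.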