Tokens — What They Mean
TL;DR: Tokens are the units LLMs use to process text — roughly 3/4 of a word in English. They determine both what the model can handle in one request and how much you pay.
Simple Explanation
LLMs don’t read words like humans do. They break text into smaller pieces called tokens:
- “Hello” = 1 token
- “Artificial intelligence” = 2 tokens
- “Antidisestablishmentarianism” = 6 tokens
Rule of thumb for English: 1 token ≈ 4 characters ≈ 0.75 words
So 1,000 tokens ≈ 750 words ≈ about 1.5 pages of text.
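The rule of thumb above can be sketched as a quick heuristic. This is illustrative only — real counts come from the model's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough English-only estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def estimate_words(tokens: int) -> float:
    """Rough conversion: ~0.75 words per token."""
    return tokens * 0.75
```

Good enough for back-of-envelope budgeting; expect the heuristic to drift for code, non-English text, or unusual vocabulary.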
Why It Matters for Business
Tokens affect two critical things:
1. Context Window (What Fits)
Every model has a maximum token limit for input + output combined:
| Model | Context Window | Roughly |
|---|---|---|
| GPT-4o | 128K tokens | ~96K words |
| Claude Opus 4.5 | 200K tokens | ~150K words |
| Gemini 1.5 Pro | 1M tokens | ~750K words |
If your prompt + expected response exceeds this, the request fails or gets truncated.
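A minimal sketch of that check, using the window sizes from the table above (model names and limits are illustrative — verify against your provider's current documentation):

```python
# Illustrative context windows (tokens), from the table above
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-opus-4.5": 200_000,
    "gemini-1.5-pro": 1_000_000,
}

def fits_in_context(model: str, input_tokens: int, max_output_tokens: int) -> bool:
    """Input and expected output must fit in the combined window."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]
```

Reserving room for the output is the step people forget: a 127K-token prompt "fits" in a 128K window but leaves almost no budget for the response.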
2. Pricing (What You Pay)
API pricing is per token, usually quoted per million tokens:
| Example | Input Cost | Output Cost |
|---|---|---|
| GPT-4o | $2.50/1M | $10/1M |
| Claude Sonnet | $3/1M | $15/1M |
Output tokens cost more because they are generated one at a time (autoregressively), while input tokens can be processed in parallel.
Real-World Example
A customer support bot processes a conversation:
- Customer message: 50 tokens
- Chat history: 500 tokens
- System prompt: 200 tokens
- Total input: 750 tokens
The bot generates a 100-token response.
Cost calculation (at $3/$15 per million):
- Input: 750 × $0.000003 = $0.00225
- Output: 100 × $0.000015 = $0.0015
- Total: $0.00375 per conversation
At 10,000 conversations/month = $37.50
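The same calculation in code, using the Claude Sonnet rates from the pricing table:

```python
# Claude Sonnet rates from the table: $3/1M input, $15/1M output
PRICE_PER_TOKEN = {"input": 3 / 1_000_000, "output": 15 / 1_000_000}

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request."""
    return (input_tokens * PRICE_PER_TOKEN["input"]
            + output_tokens * PRICE_PER_TOKEN["output"])

cost = conversation_cost(750, 100)  # 0.00225 + 0.0015 = 0.00375
monthly = cost * 10_000             # 37.50
```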
Token Optimization Tips
- Shorter system prompts — Every token counts, especially at scale
- Summarize history — Don’t send full conversation; compress older context
- Choose the right model — Use cheaper models for simple tasks
- Cache when possible — Some providers offer prompt caching discounts
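The "summarize history" tip can be approximated by simple truncation: drop the oldest messages until what remains fits a token budget. A minimal sketch using the ~4 characters/token heuristic (a production system would summarize rather than drop, and count with the real tokenizer):

```python
def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent messages that fit the budget; drop the oldest first."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        msg_tokens = max(1, len(msg) // 4)  # ~4 chars/token heuristic
        if used + msg_tokens > budget_tokens:
            break
        kept.append(msg)
        used += msg_tokens
    return list(reversed(kept))             # restore chronological order
```

Truncation is lossy; summarizing the dropped portion into a short recap preserves more context for the same token spend.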
Common Misconceptions
- Myth: Tokens are the same as words. Reality: common words = 1 token; complex or rare words = multiple tokens.
- Myth: All languages tokenize equally. Reality: non-English languages often use 2-3x more tokens for the same content.
Tokenizer Differences
Different models tokenize differently:
- OpenAI uses “tiktoken”
- Anthropic uses their own tokenizer
- The same text may produce slightly different token counts across models
Use official tokenizer tools to estimate costs accurately.
Key Takeaways
- Tokens ≈ 0.75 words in English
- They limit context window AND determine cost
- Output tokens cost more than input tokens
- Non-English text uses more tokens
- Optimize tokens at scale for significant savings
Related Concepts
- glossary/llm — The models that use tokens
- glossary/prompt-engineering — Writing efficient prompts
- glossary/rag — Managing context windows with retrieval
Sources
- OpenAI Tokenizer documentation
- Anthropic Claude pricing guides