Tokens — What They Mean
TL;DR: Tokens are the units LLMs use to process text — roughly 3/4 of a word in English. They determine both what the model can handle in one request and how much you pay.
Simple Explanation
LLMs don’t read words like humans do. They break text into smaller pieces called tokens:
- “Hello” = 1 token
- “Artificial intelligence” = 2 tokens
- “Antidisestablishmentarianism” = 6 tokens
Rule of thumb for English: 1 token ≈ 4 characters ≈ 0.75 words
So 1,000 tokens ≈ 750 words ≈ about 1.5 pages of text.
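The rule of thumb above can be sketched as a quick heuristic. This is illustrative only — real counts come from the model's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough English-only estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def estimate_words(tokens: int) -> float:
    """Rough conversion: ~0.75 words per token."""
    return tokens * 0.75
```

Good enough for back-of-envelope budgeting; expect the heuristic to drift for code, non-English text, or unusual vocabulary.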
Why It Matters for Business
Tokens affect two critical things:
1. Context Window (What Fits)
Every model has a maximum token limit for input + output combined:
| Model | Context Window | Roughly |
|---|---|---|
| GPT-4o | 128K tokens | ~96K words |
| Claude Opus 4.5 | 200K tokens | ~150K words |
| Gemini 1.5 Pro | 1M tokens | ~750K words |
If your prompt + expected response exceeds this, the request fails or gets truncated.
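A minimal sketch of that check, using the window sizes from the table above (model names and limits are illustrative — verify against your provider's current documentation):

```python
# Illustrative context windows (tokens), from the table above
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-opus-4.5": 200_000,
    "gemini-1.5-pro": 1_000_000,
}

def fits_in_context(model: str, input_tokens: int, max_output_tokens: int) -> bool:
    """Input and expected output must fit in the combined window."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]
```

Reserving room for the output is the step people forget: a 127K-token prompt "fits" in a 128K window but leaves almost no budget for the response.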
2. Pricing (What You Pay)
API pricing is per token, usually quoted per million tokens:
| Example | Input Cost | Output Cost |
|---|---|---|
| GPT-4o | $2.50/1M | $10/1M |
| Claude Sonnet | $3/1M | $15/1M |
Output tokens cost more because they are generated one at a time (autoregressively), while input tokens can be processed in parallel.
Real-World Example
A customer support bot processes a conversation:
- Customer message: 50 tokens
- Chat history: 500 tokens
- System prompt: 200 tokens
- Total input: 750 tokens
The bot generates a 100-token response.
Cost calculation (at $3/$15 per million):
- Input: 750 × $0.000003 = $0.00225
- Output: 100 × $0.000015 = $0.0015
- Total: $0.00375 per conversation
At 10,000 conversations/month = $37.50
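The same calculation in code, using the Claude Sonnet rates from the pricing table:

```python
# Claude Sonnet rates from the table: $3/1M input, $15/1M output
PRICE_PER_TOKEN = {"input": 3 / 1_000_000, "output": 15 / 1_000_000}

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request."""
    return (input_tokens * PRICE_PER_TOKEN["input"]
            + output_tokens * PRICE_PER_TOKEN["output"])

cost = conversation_cost(750, 100)  # 0.00225 + 0.0015 = 0.00375
monthly = cost * 10_000             # 37.50
```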
Token Optimization Tips
- Shorter system prompts — Every token counts, especially at scale
- Summarize history — Don’t send full conversation; compress older context
- Choose the right model — Use cheaper models for simple tasks
- Cache when possible — Some providers offer prompt caching discounts
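The "summarize history" tip can be approximated by simple truncation: drop the oldest messages until what remains fits a token budget. A minimal sketch using the ~4 characters/token heuristic (a production system would summarize rather than drop, and count with the real tokenizer):

```python
def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent messages that fit the budget; drop the oldest first."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        msg_tokens = max(1, len(msg) // 4)  # ~4 chars/token heuristic
        if used + msg_tokens > budget_tokens:
            break
        kept.append(msg)
        used += msg_tokens
    return list(reversed(kept))             # restore chronological order
```

Truncation is lossy; summarizing the dropped portion into a short recap preserves more context for the same token spend.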
Common Misconceptions
- Myth: Tokens are the same as words. Reality: common words = 1 token; complex or rare words = multiple tokens.
- Myth: All languages tokenize equally. Reality: non-English languages often use 2-3x more tokens for the same content.
Tokenizer Differences
Different models tokenize differently:
- OpenAI uses “tiktoken”
- Anthropic uses their own tokenizer
- The same text may produce slightly different token counts across models
Use official tokenizer tools to estimate costs accurately.
Key Takeaways
- Tokens ≈ 0.75 words in English
- They limit context window AND determine cost
- Output tokens cost more than input tokens
- Non-English text uses more tokens
- Optimize tokens at scale for significant savings
Related Concepts
- glossary/llm — The models that use tokens
- glossary/prompt-engineering — Writing efficient prompts
- glossary/rag — Managing context windows with retrieval
Sources
- OpenAI Tokenizer documentation
- Anthropic Claude pricing guides