Embeddings — How AI Turns Meaning Into Numbers
TL;DR: Embeddings are numerical representations of text (or images, audio, video) — typically vectors of 256 to 3,072 numbers — that preserve semantic similarity. Two pieces of text about the same topic have similar embedding vectors even when they use different words. This is what makes semantic search work, what makes glossary/rag possible, and what underlies most AI applications that need to match meaning rather than match keywords. Embeddings are infrastructure: most business users will never touch them directly, but they’re the layer that makes AI applications useful.
Simple explanation
Computers don’t understand words. They understand numbers. Embeddings are how we turn the meaning of words into numbers — specifically, lists of numbers (called vectors) that capture what the text is about.
Two pieces of text that mean similar things get embeddings that are close in the vector space. Two pieces of text that mean different things get embeddings that are far apart.
The cleverness: embeddings preserve semantic similarity even when the surface words are different. “How do I reduce customer churn?” and “What are best practices for retention?” use almost no overlapping words, but their embeddings are very close — because they’re about the same thing.
This is what makes modern AI applications work. Search that understands questions. Recommendations based on what a piece of content is actually about. AI assistants that can find relevant information even when you don’t know the exact terminology.
Why it matters for business
Embeddings are the layer that makes AI applications useful for non-keyword search and matching. Three operational consequences:
- Semantic search beats keyword search. A customer asking “my login is broken” should find the help-doc titled “troubleshooting authentication issues.” Keyword search misses this; embedding-based search catches it because the embeddings are close.
- Retrieval-augmented generation depends on embeddings. RAG works by embedding both the query and the document corpus, then retrieving the documents whose embeddings are closest to the query’s. Without embeddings, RAG would have to use keyword matching, which fails on most real questions.
- Content recommendations and clustering. Two articles about the same topic from different authors have similar embeddings; embeddings can drive recommendation systems and content clustering even without explicit categorization.
The business framing: embeddings are how applications match meaning rather than syntax. Anywhere your application needs to find, recommend, or cluster content “by topic,” embeddings are likely the mechanism — even if the UI doesn’t expose them.
How embeddings work (the mechanics)
The basic flow (a minimal code sketch follows the list):
- Text in → embedding model → vector out. A pre-trained model (OpenAI's text-embedding-3-large, Voyage AI's voyage-3-large (Anthropic's recommended partner), open-source models like all-MiniLM-L6-v2, etc.) takes a string of text and produces a fixed-length numerical vector.
- The vector lives in a high-dimensional space. Typical sizes are 384, 768, 1024, 1536, or 3072 dimensions depending on the model. Each dimension doesn't correspond to a human-interpretable feature — they're learned representations.
- Similarity is typically measured by cosine similarity. Two vectors are "close" if the angle between them is small. Cosine similarity ranges from -1 (opposite direction) through 0 (orthogonal, unrelated) to 1 (identical direction, same meaning).
- Vectors get stored in a vector database. Pinecone, Weaviate, Qdrant, Chroma, pgvector (PostgreSQL extension), and others. The database is optimized for fast nearest-neighbor search across millions or billions of vectors.
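A minimal sketch of that flow, assuming the OpenAI Python SDK and text-embedding-3-small (any embedding model works the same way; the example strings are from the explanation above):

```python
# pip install openai numpy
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    """Text in -> embedding model -> fixed-length vector out."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 = same direction (same meaning); 0.0 = orthogonal (unrelated)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = embed("How do I reduce customer churn?")
v2 = embed("What are best practices for retention?")
v3 = embed("Recipe for sourdough bread")

print(cosine_similarity(v1, v2))  # high: same topic, different words
print(cosine_similarity(v1, v3))  # much lower: unrelated topic
```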
The full pipeline for semantic search or RAG (sketched in code after the list):
- Indexing time: chunk content into pieces → embed each piece → store vectors + source links in vector DB
- Query time: embed user query → find nearest vectors in DB → return source links of those chunks → pass to LLM as context (in RAG) or display directly (in search)
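A compact sketch of both halves, assuming Chroma's in-memory client and its default local embedding function (the collection name, documents, and metadata are illustrative):

```python
# pip install chromadb
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient for disk

# Indexing time: chunk -> embed -> store vectors + source links
collection = client.create_collection("help_docs")
collection.add(
    ids=["auth-guide-chunk1", "billing-faq-chunk1"],
    documents=[
        "Troubleshooting authentication issues: reset your password ...",
        "Billing FAQ: invoices are issued on the first of each month ...",
    ],
    metadatas=[{"source": "auth-guide"}, {"source": "billing-faq"}],
)  # Chroma embeds the documents automatically on add

# Query time: embed query -> nearest vectors -> return chunks + sources
results = collection.query(query_texts=["my login is broken"], n_results=1)
print(results["documents"][0])  # nearest chunk(s), for the LLM or the user
print(results["metadatas"][0])  # source links to cite or display
```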
What embeddings are good at (and what they aren’t)
Good at:
- Matching topical similarity even with different vocabulary
- Cross-language matching (multilingual embedding models work across languages)
- Finding related documents in a large corpus
- Clustering content by latent topic
- Detecting near-duplicates and paraphrases
Not good at:
- Exact-match retrieval (find documents containing the literal phrase “Q3 2024 revenue”) — keyword search is better here
- Compositional logic (“documents that discuss A but NOT B”) — embeddings struggle with negation and complex Boolean
- Recency-weighted search by default — embeddings are about meaning, not time; freshness has to be layered on as a separate signal
- Very short text (single words, very short queries) — embeddings work better with phrase-or-longer context
- Numerical or quantitative comparison — the embeddings of "$1M" and "$10M" don't usefully preserve the magnitude relationship
In practice, production systems use hybrid retrieval — keyword search (BM25 or similar) + semantic embeddings + reranking. The hybrid approach catches what either alone would miss.
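One way the blend can look, as a hedged sketch: the rank_bm25 package supplies the keyword side, toy two-dimensional vectors stand in for real embeddings, and the 50/50 weighting (plus the omitted reranking stage) is an illustrative choice, not a recommendation:

```python
# pip install rank-bm25 numpy
import numpy as np
from rank_bm25 import BM25Okapi

docs = [
    "Troubleshooting authentication issues",
    "Q3 2024 revenue report",
    "Customer retention best practices",
]
# Toy (approximately unit-length) vectors standing in for real embeddings
doc_vecs = np.array([[0.90, 0.44], [0.10, 0.99], [0.60, 0.80]])

bm25 = BM25Okapi([d.lower().split() for d in docs])  # keyword side

def hybrid_rank(query: str, query_vec: np.ndarray, alpha: float = 0.5):
    """Blend normalized BM25 scores with cosine similarities.

    Higher alpha weights keywords more; returns doc indices, best first."""
    kw = bm25.get_scores(query.lower().split())
    if kw.max() > 0:
        kw = kw / kw.max()          # normalize keyword scores to [0, 1]
    sem = doc_vecs @ query_vec      # cosine similarity (unit-length vectors)
    return np.argsort(alpha * kw + (1 - alpha) * sem)[::-1]

print(hybrid_rank("my login is broken", np.array([0.92, 0.39])))
```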
Embeddings in production (the 2026 landscape)
Major embedding models (as of mid-2026):
- OpenAI — text-embedding-3-large (3072 dims), text-embedding-3-small (1536 dims, cheaper)
- Anthropic — via Voyage AI partnership (voyage-3-large)
- Google — text-embedding-005 (Gemini family)
- Open-source — nomic-embed-text-v2, bge-large-en-v1.5, all-mpnet-base-v2. Free and run locally; the quality gap with hosted models closed substantially over 2024–2026.
- Domain-specific — financial embeddings (BloombergGPT lineage), legal embeddings, medical embeddings. These outperform general-purpose embeddings within their domain.
Vector databases: Pinecone (managed), Weaviate (managed/self-hosted), Qdrant (open-source), Chroma (lightweight), pgvector (PostgreSQL extension — increasingly the practical default for small-to-medium deployments because it removes a separate-database operational burden).
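For a feel of the pgvector route, a hedged sketch via the psycopg driver. The connection string, table, and three-dimensional vectors are illustrative, and it assumes CREATE EXTENSION vector has already been run in the database (<=> is pgvector's cosine-distance operator):

```python
# pip install psycopg
import psycopg

with psycopg.connect("dbname=app") as conn:  # illustrative connection string
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id bigserial PRIMARY KEY, source text, "
        "embedding vector(3))"  # toy dimension; match your model (e.g., 1536)
    )
    # Indexing time: pgvector accepts vectors as '[...]' text literals
    conn.execute(
        "INSERT INTO chunks (source, embedding) VALUES (%s, %s::vector)",
        ("auth-guide", "[0.12, 0.48, 0.87]"),
    )
    # Query time: cosine distance ascending = most similar chunks first
    rows = conn.execute(
        "SELECT source FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5",
        ("[0.10, 0.50, 0.85]",),
    ).fetchall()
    print(rows)
```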
Multimodal embeddings (text + image + video in the same space) became production-ready in 2025–2026 — CLIP-style and successor models let you embed an image and a text query in the same space, enabling cross-modal search and recommendation.
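A hedged sketch of the cross-modal idea using the original CLIP model via Hugging Face transformers (successor multimodal models follow the same shape; the image file name is illustrative):

```python
# pip install transformers torch pillow
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("product_photo.jpg")  # illustrative file
texts = ["a red running shoe", "a kitchen blender"]

# Image and text are embedded into the same space; the logits are
# the (scaled) similarities between the image and each text query
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
print(out.logits_per_image.softmax(dim=-1))  # which caption matches the image
```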
Connection to wiki frameworks
- glossary/rag — The most common application of embeddings. RAG is essentially “embed the question, find nearby document chunks, pass them to the LLM as context.”
- glossary/agentic-memory — Semantic memory (the long-term-knowledge layer) is typically embedding-backed. Vector stores are the substrate for “remember everything the agent has learned about this domain.”
- glossary/llm-wiki-pattern — Wiki search can be powered by embedding the page content; semantic search lets the agent (or human) find pages by topic without knowing the exact title.
- seo/agentic-search — How AI search engines decide what to surface; embeddings are part of the underlying mechanism.
- glossary/hallucination — RAG (powered by embeddings) is the most common mitigation pattern for hallucination on factual content.
- glossary/llm — The category of model that consumes embedded context; the embedding model itself is a specialized LLM variant.
Honest limits
- Embeddings are model-locked at indexing time. If you index a corpus with OpenAI’s embedding model and want to switch to Anthropic’s, you have to re-embed everything. This makes embedding-model choice a long-term commitment.
- Embedding quality varies by domain. General-purpose embeddings work fine for general content; specialized domains (legal, medical, financial) often need domain-tuned embeddings to match performance.
- Chunking strategy matters. How you split a document before embedding (by paragraph, by section, by sliding window, with overlap) materially affects retrieval quality. There's no single best chunking strategy (a minimal sliding-window sketch follows this list).
- Embeddings can encode biases. The model that generated the embeddings carries any biases present in its training data; this propagates into search/retrieval outcomes.
- Cost compounds at scale. Embedding a million documents costs real money; re-embedding when you change models doubles the cost. The cost is usually one-time-per-corpus, but it’s not zero.
- Vector dimensions trade off cost vs. quality. Smaller embeddings (384 dims) are cheaper to store and faster to search; larger embeddings (3072 dims) are more accurate. The right size depends on the use case.
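The sliding-window chunker mentioned above, as a minimal character-level sketch (the size and overlap values are illustrative, not recommendations):

```python
def chunk_text(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Fixed-size chunks with overlapping edges, so a sentence split at a
    boundary still appears whole in at least one chunk."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```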
Related
- glossary/rag — The most common application; embeddings + retrieval + LLM
- glossary/agentic-memory — Vector-store-backed semantic memory uses embeddings
- glossary/llm-wiki-pattern — Wiki search powered by embeddings is one of the practical operating patterns
- seo/agentic-search — How AI search engines work; embeddings part of the mechanism
- seo/agentic-search-optimization — Optimizing content for embedding-based retrieval
- glossary/hallucination — RAG-with-embeddings is the most common mitigation
- glossary/llm — The underlying technology; embedding models are LLM variants
Key Takeaways
- Embeddings are numerical representations of text (or images, audio, video) that preserve semantic similarity. Vectors of 256–3072 numbers, depending on the model.
- Two texts about the same topic have similar embeddings even with different vocabulary. This is what makes semantic search and RAG work.
- The pipeline: chunk content → embed chunks → store in vector DB → at query time, embed the query → find nearest vectors → return source links.
- Hybrid retrieval (keyword + embedding + reranking) beats either alone in production systems.
- Good at: topical similarity, cross-language matching, related-document discovery, clustering, paraphrase detection.
- Not good at: exact-phrase match, compositional Boolean logic, recency-weighted retrieval, very short text, quantitative comparison.
- Production landscape: OpenAI, Anthropic/Voyage, Google embeddings, plus open-source options. Vector databases: Pinecone, Weaviate, Qdrant, Chroma, pgvector. Multimodal embeddings (text + image + video) production-ready in 2025–2026.
- Embeddings are model-locked at indexing time — changing models means re-embedding the corpus. Choose carefully.
Sources
- OpenAI Embeddings documentation (text-embedding-3-large, text-embedding-3-small)
- Voyage AI documentation (Anthropic-recommended embedding partner)
- Pinecone, Weaviate, Qdrant, Chroma, pgvector documentation
- MTEB (Massive Text Embedding Benchmark) — leaderboard for comparing embedding models across tasks
- Practitioner consensus from RAG and semantic-search literature, 2023–2026
- Domain-specific embedding research (BloombergGPT for finance, Med-PaLM for medical, etc.)