Embedding


In one sentence

An embedding is a list of numbers — typically a few hundred to a few thousand of them — that represents the meaning of a piece of text, in a way that lets you judge how similar two texts are by computing the distance between their lists.

Why embeddings exist

Computers are good at matching exact words. If you search for “bank loan” on a website that does string matching, you will only find pages that contain the exact phrase “bank loan.” You will miss “credit facility,” “line of credit,” and “borrowing arrangement,” even though they mean the same thing.

For decades, this was the central limitation of search and document retrieval. The cure was a long line of clever tricks — stemming, synonym expansion, hand-built thesauri — that helped a little but never closed the gap.

Embeddings closed it.

A trained embedding model takes a piece of text and produces a vector (a list of numbers) such that texts with similar meanings produce vectors that are close to each other in space. The phrase “bank loan” and the phrase “credit facility” land near each other. “Bank loan” and “river bank” land far apart. The same model can do this across documents, sentences, even single words.

This single innovation underpins almost every modern AI search and retrieval system, including all of retrieval-augmented generation (RAG).

What it actually does — concretely

"the M5 Max has 128 GB of unified memory"
    │
    ▼   (embedding model)
[0.0214, -0.1893, 0.4471, 0.0902, ..., -0.3115]
    └── a vector of, say, 1024 numbers ──┘

That vector is the embedding. By itself it means nothing to a human. But two different vectors can be compared — most commonly using cosine similarity, which essentially measures the angle between the two vectors.
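As a sketch in pure Python (the three-dimensional vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions, but the arithmetic is identical):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction,
    0.0 = orthogonal (unrelated), -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real model output.
loan   = [0.9, 0.1, 0.2]   # pretend embedding of "bank loan"
credit = [0.8, 0.2, 0.3]   # pretend embedding of "credit facility"
river  = [0.1, 0.9, 0.1]   # pretend embedding of "river bank"

print(cosine_similarity(loan, credit))  # high: similar meaning
print(cosine_similarity(loan, river))   # low: different meaning
```

The absolute numbers are meaningless; what matters is the comparison — “bank loan” scores much closer to “credit facility” than to “river bank.”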

If you embed a thousand documents and store the resulting thousand vectors, you can answer “which of these documents is most relevant to this query?” by embedding the query and finding the closest vectors. This is semantic search.
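A minimal sketch of that loop, with made-up vectors standing in for real model output (the documents and numbers below are invented for illustration):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Pretend these vectors came from an embedding model; a real system
# would call the model once per document and store the results.
doc_vectors = {
    "How to apply for a bank loan":       [0.9, 0.1, 0.1],
    "Fishing spots along the river bank": [0.1, 0.9, 0.1],
    "Opening a new line of credit":       [0.8, 0.2, 0.2],
}

def semantic_search(query_vector, k=2):
    """Rank stored documents by cosine similarity to the query vector."""
    scored = [(cosine(query_vector, v), doc) for doc, v in doc_vectors.items()]
    return [doc for score, doc in sorted(scored, reverse=True)[:k]]

query = [0.85, 0.15, 0.15]  # pretend embedding of "borrowing money"
print(semantic_search(query))  # the two loan/credit documents rank first
```

At a thousand documents a linear scan like this is fine; at millions, a vector database does the same comparison with an approximate nearest-neighbor index.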

Working example — what the agent on this machine uses

When the agent on this MacBook calls memory_search("M5 Max benchmark token speed"), what happens under the hood is:

  1. The query string is embedded into a vector using a small, locally-cached embedding model.
  2. Every chunk of MEMORY.md, memory/*.md, and indexed session transcripts has been pre-embedded and stored.
  3. The query vector is compared against the chunk vectors.
  4. The top-scoring chunks are returned with their text and source line numbers.
  5. The agent reads those chunks and decides which to follow up on with memory_get.
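The five steps above can be sketched end to end. The `embed()` below is a toy stand-in (a hypothetical vocabulary counter), not the agent's actual model, and the memory lines are invented:

```python
import math

# Hypothetical stand-in for a real embedding model: it counts matches
# against a tiny fixed vocabulary. A real model produces dense vectors
# that capture meaning, but the downstream search logic is the same.
VOCAB = ["m5", "max", "benchmark", "token", "memory", "grocery", "unified", "speed"]

def embed(text):
    words = text.lower().split()
    return [float(sum(w.startswith(term) for w in words)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Step 2: every chunk is pre-embedded and stored with its source line.
memory_lines = [
    "M5 Max benchmark: 72 tokens/sec on the 30B model",
    "Grocery list: eggs, milk, coffee",
    "Unified memory on this machine is 128 GB",
]
index = [(n, text, embed(text)) for n, text in enumerate(memory_lines, start=1)]

def memory_search(query, k=2):
    query_vector = embed(query)                        # step 1: embed the query
    scored = [(cosine(query_vector, v), n, text)       # step 3: compare vectors
              for n, text, v in index]
    scored.sort(reverse=True)                          # step 4: rank the chunks
    return [(n, text) for _, n, text in scored[:k]]    # text + source line numbers

print(memory_search("M5 Max benchmark token speed"))
```

Step 5 — reading the returned chunks and deciding what to fetch next — is the agent's job, not the search function's.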

This is the same architecture that powers customer-service knowledge bases, legal-document retrieval, internal handbooks, and academic literature search. Different scale, same pattern.

Where embeddings come from

Embedding models are neural networks trained on large text corpora so that texts with similar meanings map to nearby vectors; they are available both as small open-weights models you can run locally and as hosted APIs. The choice of model matters more than people expect: a bad embedding model will return results that look similar by surface keywords but miss the deeper relationships.

Why embeddings made the modern AI stack possible

Before embeddings became cheap and good (roughly 2019 onward), the dominant pattern for “AI that understands my documents” was a bespoke pipeline: keyword indexing, stemming, hand-maintained synonym lists, and custom relevance rules, built and tuned per project.

After embeddings became cheap and good, the dominant pattern became: split the documents into chunks, embed every chunk, store the vectors, embed each incoming query, and return the chunks whose vectors are closest.

This collapsed weeks of bespoke engineering into a single afternoon. It is one of the largest productivity unlocks in software in the last decade and is invisible to most end users because it works.

Why this matters in a teaching context

For BBA and MBA students, embeddings are the technology that turned unstructured text into a queryable asset. Every email archive, every set of meeting notes, every PDF library, every Slack history, every customer-support transcript — all of it became analytically tractable in a way it had not been before.

The strategic implication: organizations that have been accumulating unstructured content for years now own a latent asset they can put to use cheaply. Organizations that have been losing or shredding their unstructured records are going to wish they had not.

A second implication: embeddings are also a privacy concern. A vector built from a sensitive document still encodes the document’s meaning. An attacker who can recover the embeddings (and a few public clues about the embedding model) can reconstruct quite a bit. The right framing: treat your vector store with the same security posture as the source documents themselves.

Trade-offs

Like any representation, embeddings involve compromises. A vector is tied to the model that produced it, so upgrading the model means re-embedding every stored document. Similarity is not the same as relevance: the nearest chunk is not always the most useful one. And long documents must be split into chunks before embedding, so chunking choices quietly shape retrieval quality.


Related entries: rag.md, vector-database.md, fine-tuning.md.
