AI Architect Academy

The concept, for engineers

What are embeddings? Vector representations of meaning, explained

Short answer

An embedding is a list of numbers — a vector — that a model assigns to a piece of text (or an image) so that distance in that vector space reflects similarity in meaning. Two inputs that mean nearly the same thing land close together; unrelated inputs land far apart. You produce embeddings by running text through an embedding model, then compare them with a distance metric — usually cosine similarity. That single property, distance equals meaning, is what powers semantic search, clustering, recommendations, and the retrieval step in RAG.

This page covers embeddings as a concept. For the retrieval pattern that consumes them, see retrieval-augmented generation; for where the vectors are stored and searched at scale, see the vector database comparison.

What is an embedding?

An embedding is a fixed-length array of floating-point numbers — for example 768 or 1536 of them — that represents one chunk of input. The length is the dimension of the model, and it is constant: every input the model sees becomes a vector of exactly that many numbers, whether the input is a word, a sentence, or a paragraph.

The numbers themselves are not human-readable, and no single dimension means anything you can name. What matters is the geometry of the whole vector relative to other vectors. A "vector embedding" and a "text embedding" are the same thing described from different angles: vector names the data structure, text names the input it encodes. Image and audio embeddings work identically — only the encoder changes.

The point of the transformation is that meaning becomes arithmetic. Once text is a vector, you can measure how related two pieces of text are without matching a single shared keyword. That is the leap over classic lexical search, which only finds documents that contain the words you typed.

How embeddings capture meaning: distance is similarity

A good embedding model places inputs in space so that semantic closeness becomes geometric closeness. "How do I reset my password?" and "I forgot my login details" share almost no words, yet a quality model puts their vectors near each other because they mean the same thing. "How do I bake bread?" lands far away.
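To make the contrast with keyword matching concrete, here is a minimal sketch that scores the three example sentences by word overlap alone. The `tokens` and `jaccard` helpers are illustrative names, not part of any library, and Jaccard overlap is just one simple stand-in for lexical matching.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Shared distinct words divided by all distinct words in either text."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

query = "How do I reset my password?"
paraphrase = "I forgot my login details"
unrelated = "How do I bake bread?"

# Only "i" and "my" are shared with the paraphrase: 2 of 9 distinct words.
overlap_paraphrase = jaccard(query, paraphrase)

# "how", "do" and "i" are shared with the bread question: 3 of 8 distinct words.
overlap_unrelated = jaccard(query, unrelated)

print(f"paraphrase: {overlap_paraphrase:.3f}")
print(f"unrelated:  {overlap_unrelated:.3f}")
```

Word overlap ranks the bread question above the paraphrase, which is the failure a good embedding model is meant to avoid.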

To turn that geometry into a number you use a distance (or similarity) metric. The standard choice for text embeddings is cosine similarity: the cosine of the angle between two vectors, which equals the dot product of their unit-normalized forms. It ranges from -1 (opposite direction) through 0 (orthogonal, effectively unrelated) to 1 (identical direction); for typical text embeddings you see values in the 0-to-1 range. Cosine compares orientation, not magnitude, which is why it is the default: it cares that two vectors point the same way, not how long they are. Euclidean (L2) distance and raw dot product are the other common choices. On unit-normalized vectors all three produce the same ranking, because the dot product then equals the cosine and squared Euclidean distance equals 2 minus twice the cosine; on unnormalized vectors the rankings can differ.
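The metrics can be written out in a few lines of plain Python. This is a sketch with hand-picked three-dimensional vectors, not real embeddings; the function names are illustrative. It also checks the identity that ties Euclidean distance to cosine on unit-length vectors, which is why the rankings agree once vectors are normalized.

```python
import math

def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in a]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return dot(a, b) / (norm(a) * norm(b))

def euclidean(a, b):
    """Straight-line (L2) distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]    # same direction as a, twice as long
c = [3.0, 0.0, -1.0]   # orthogonal to a: 1*3 + 2*0 + 3*(-1) = 0

cos_ab = cosine_similarity(a, b)   # magnitude is ignored
cos_ac = cosine_similarity(a, c)

# On unit vectors u and v: ||u - v||^2 = 2 - 2 * cos(u, v)
ua, uc = normalize(a), normalize(c)
lhs = euclidean(ua, uc) ** 2
rhs = 2 - 2 * dot(ua, uc)
```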

This is the whole trick. Searching becomes: embed the query, then find the stored vectors with the smallest distance to it. Clustering becomes: group vectors that sit near each other. Recommendation becomes: find items whose vectors neighbour the ones a user already liked.
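The search step can be sketched as a brute-force scan: score every stored vector against the query and keep the best k. The vectors below are hand-made three-dimensional stand-ins, and the ids are hypothetical. A real system would typically use an approximate nearest-neighbour index rather than scanning everything.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, stored, k=3):
    """Return the k (id, score) pairs most similar to the query, best first.

    `stored` maps an id to its vector.
    """
    scored = ((cosine_similarity(query_vec, vec), item_id)
              for item_id, vec in stored.items())
    return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]

# Toy vectors, chosen by hand so the first two point roughly the same way.
stored = {
    "reset-password": [0.9, 0.1, 0.0],
    "forgot-login": [0.8, 0.2, 0.1],
    "bake-bread": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

results = top_k(query_vec, stored, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

Clustering and recommendation reuse the same scoring function; only what you compare against changes.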

How embeddings are created: embedding models

You do not compute embeddings by hand — you call an embedding model, a neural network trained specifically to map inputs into a useful vector space. It is a different model from the chat model that generates text: an embedding model has no "completion" output, it only emits the vector. Training pushes texts that humans judge as similar to sit close together and dissimilar texts apart, so the learned geometry encodes meaning.

In practice the pipeline is short and mechanical:

  • Chunk — split long documents into passages small enough to embed coherently (a model has a maximum input length, and one vector per huge document blurs meaning).
  • Embed — send each chunk to the model and get back one vector per chunk.
  • Store and index — keep the vectors in a vector store with an index that makes nearest-neighbour search fast.
  • Query — embed incoming queries with the same model and search for the nearest stored vectors.
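The four steps above can be sketched end to end in plain Python. One loud assumption: `toy_embed` is a hypothetical stand-in for the call to a real embedding model. It hashes words into a fixed-length vector, so it captures word overlap only, not meaning. The chunk size, the dimension, and the in-memory list used as a "vector store" are likewise illustrative.

```python
import math
import re
import zlib

DIM = 256  # toy dimension; every vector has exactly this length

def chunk(text, max_words=40):
    """Chunk: split a document into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def toy_embed(text):
    """HYPOTHETICAL stand-in for an embedding model call.

    Hashes each word into one of DIM buckets and unit-normalizes the counts.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length else vec

def build_index(documents):
    """Embed + store: one (doc_id, passage, vector) row per chunk."""
    return [(doc_id, passage, toy_embed(passage))
            for doc_id, text in documents.items()
            for passage in chunk(text)]

def search(index, query, k=1):
    """Query: embed with the same function, then rank rows by dot product."""
    q = toy_embed(query)
    ranked = sorted(index,
                    key=lambda row: sum(a * b for a, b in zip(q, row[2])),
                    reverse=True)
    return [(doc_id, passage) for doc_id, passage, _ in ranked[:k]]

documents = {
    "accounts": "To reset your password open account settings and choose reset password",
    "baking": "To bake bread mix flour water yeast and salt then let the dough rise",
}
index = build_index(documents)
hits = search(index, "reset password")
print(hits[0][0])
```

Swapping `toy_embed` for a real model call changes the quality of the results but not the shape of the pipeline.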

Two rules are non-negotiable. Query and documents must be embedded by the same model — vectors from different models live in different spaces and are not comparable. And changing the model later means re-embedding your entire corpus, because old and new vectors no longer share a geometry. This site runs embeddings through Cloudflare Workers AI for exactly this retrieval flow.
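One way to enforce the same-model rule is to record the model identifier next to the vectors and refuse anything that does not match. The `VectorStore` class and the model names below are hypothetical, a sketch of the guard rather than any real vector database API.

```python
from dataclasses import dataclass, field

@dataclass
class VectorStore:
    """Tiny in-memory store that remembers which model produced its vectors."""
    model_name: str
    dimension: int
    rows: dict = field(default_factory=dict)

    def _check(self, model_name, vector):
        # A matching dimension is not enough: two models can share a
        # dimension and still place texts in unrelated spaces.
        if model_name != self.model_name:
            raise ValueError(
                f"vector came from {model_name!r}, store holds {self.model_name!r}")
        if len(vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} values, got {len(vector)}")

    def add(self, item_id, vector, model_name):
        self._check(model_name, vector)
        self.rows[item_id] = list(vector)

    def query(self, vector, model_name, k=1):
        self._check(model_name, vector)
        ranked = sorted(self.rows.items(),
                        key=lambda kv: sum(a * b for a, b in zip(vector, kv[1])),
                        reverse=True)
        return [item_id for item_id, _ in ranked[:k]]

store = VectorStore(model_name="model-a", dimension=3)
store.add("doc-1", [1.0, 0.0, 0.0], model_name="model-a")

try:
    store.query([1.0, 0.0, 0.0], model_name="model-b")
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

With a guard like this, a model change fails fast instead of silently returning meaningless neighbours, and the fix is the re-embedding pass described above.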

What embeddings power

Embeddings are infrastructure: a handful of applications all reduce to "compare vectors by distance." The most prominent today is the retrieval step in RAG, but it is far from the only one.

| Use case | What the vectors do | The operation |
|---|---|---|
| Semantic search | Match a query to documents by meaning, not keywords. | Nearest neighbours to the query vector |
| RAG retrieval | Pull the passages a model needs to answer from facts. | Top-k nearest passages, fed to the model |
| Clustering | Group documents by topic with no labels. | Vectors that sit near each other |
| Recommendations | Suggest items similar to ones a user engaged with. | Neighbours of liked-item vectors |
| Classification | Route or tag inputs by similarity to known examples. | Nearest labelled vector or centroid |
| Deduplication | Find near-duplicate or paraphrased content. | Pairs above a similarity threshold |

In retrieval-augmented generation the first two rows do the heavy lifting: embeddings are how the system decides which facts are relevant before the language model ever sees them. In an agentic RAG system the agent itself chooses when to run that search.
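The classification and deduplication rows use the same comparison with a different target. A minimal sketch with two-dimensional toy vectors follows; the labels, item ids, and the 0.95 threshold are illustrative assumptions, and a real threshold would need tuning on your own data.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def classify(vec, centroids):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine_similarity(vec, centroids[label]))

def near_duplicates(items, threshold=0.95):
    """Return id pairs whose vectors are at least `threshold` similar."""
    return [(a, b)
            for (a, va), (b, vb) in combinations(items.items(), 2)
            if cosine_similarity(va, vb) >= threshold]

# Classification: a few labelled example vectors per class.
labelled = {
    "billing": [[0.9, 0.1], [0.8, 0.2]],
    "shipping": [[0.1, 0.9], [0.2, 0.8]],
}
centroids = {label: centroid(vectors) for label, vectors in labelled.items()}
label = classify([0.7, 0.3], centroids)

# Deduplication: "a" and "b" point almost the same way, "c" does not.
items = {"a": [1.0, 0.0], "b": [0.99, 0.05], "c": [0.0, 1.0]}
dupes = near_duplicates(items)
```

The pairwise scan in `near_duplicates` is quadratic in the number of items, so it suits small batches only.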

Choosing an embedding model

The decision comes down to a few axes: vector dimension (more dimensions can carry more nuance but cost more storage and slow down search), language and modality coverage, maximum input length, where it runs (hosted API vs. self-hosted vs. edge), and measured quality. For quality, the public reference is MTEB, the Massive Text Embedding Benchmark, a multi-task leaderboard for comparing models. Treat it as a starting shortlist, then evaluate on your data, because relative ranking shifts by domain.
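Evaluating on your own data can be as simple as a small set of queries with known relevant documents and a retrieval metric. The sketch below uses recall@k, one common choice among several. The ranked lists are hypothetical outputs from two candidate models on the same two labelled queries, not measured results.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        raise ValueError("each query needs at least one relevant document")
    top = set(ranked_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)

def mean_recall_at_k(runs, k):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

# HYPOTHETICAL rankings: each tuple is (what the model returned, what is relevant).
model_a_runs = [
    (["d1", "d4", "d2"], {"d1", "d2"}),
    (["d7", "d3", "d9"], {"d3"}),
]
model_b_runs = [
    (["d4", "d5", "d1"], {"d1", "d2"}),
    (["d9", "d8", "d3"], {"d3"}),
]

score_a = mean_recall_at_k(model_a_runs, k=2)
score_b = mean_recall_at_k(model_b_runs, k=2)
print(f"model A recall@2: {score_a:.2f}")
print(f"model B recall@2: {score_b:.2f}")
```

Running every shortlisted model through the same labelled queries gives a like-for-like comparison on your domain.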

A few widely used options, to make the trade-offs concrete:

| Model | Dimensions | Notable trait |
|---|---|---|
| OpenAI text-embedding-3-small | 1536 | Hosted API; dimensions can be shortened to trade accuracy for size. |
| OpenAI text-embedding-3-large | 3072 | Higher ceiling; also supports dimension reduction. |
| Cohere embed-v4 | 256 / 512 / 1024 / 1536 | Multilingual (100+ languages); selectable output dimensions. |
| Cloudflare Workers AI bge-base-en-v1.5 | 768 | Runs at the edge alongside your Worker; English. |
| Cloudflare Workers AI bge-m3 | 1024 | Multilingual, long inputs; open-weights BAAI model. |

Heuristics worth holding: start with a strong general model and only specialise if your domain demands it; keep dimensions as low as quality allows, since every extra dimension is storage and latency on every query; and remember that because re-embedding the whole corpus is the cost of switching, the model choice is stickier than it looks. Confirm exact dimensions, input limits, and pricing against each vendor's live docs before you commit — these specifics drift.
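The storage side of that trade-off is simple arithmetic. The sketch below assumes each value is stored as a 4-byte float32 and counts raw vectors only, with no index overhead or metadata. The corpus size of one million chunks is a hypothetical figure.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats

def raw_vector_bytes(num_vectors, dimension, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed for the vectors alone: count x dimension x bytes per value."""
    return num_vectors * dimension * bytes_per_value

def to_gib(num_bytes):
    """Convert bytes to gibibytes."""
    return num_bytes / 1024 ** 3

corpus_chunks = 1_000_000  # hypothetical corpus size
sizes = {dim: raw_vector_bytes(corpus_chunks, dim) for dim in (768, 1536, 3072)}

for dim, size in sizes.items():
    print(f"{dim} dimensions: {to_gib(size):.2f} GiB")
```

Storage grows linearly with dimension, so going from 768 to 3072 dimensions quadruples the raw footprint before any index is built.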

Frequently asked questions

What are embeddings?

Embeddings are numeric vectors that a model assigns to text, images, or other data so that distance in the vector space reflects similarity in meaning. Inputs that mean similar things get vectors that sit close together; unrelated inputs sit far apart. That property lets software compare meaning mathematically, which is the basis of semantic search, recommendations, clustering, and RAG retrieval.

What is a vector embedding?

A vector embedding is the same thing as an embedding — the term just emphasises the data structure. It is a fixed-length list of floating-point numbers (for example 768 or 1536 of them) representing one input. The length is the model's dimension and is constant across every input. "Vector" names the structure; "text" or "image" names what was encoded into it.

How do embeddings work?

An embedding model, trained so that similar inputs map to nearby points, converts each input into a vector. You then compare vectors with a distance metric — usually cosine similarity, which measures the angle between them. Small distance means similar meaning, so finding relevant content reduces to finding the nearest vectors to a query vector. Query and documents must be embedded by the same model to be comparable.

What is the difference between embeddings and a vector database?

An embedding is the vector itself — the numeric representation a model produces. A vector database is the storage and search system that holds many embeddings and finds nearest neighbours quickly using an index built for high-dimensional vectors. Embeddings are the data; the vector database is the infrastructure that searches them at scale. See the vector database comparison for the storage side.

What is an embedding model?

An embedding model is a neural network trained specifically to map inputs into a vector space where distance reflects semantic similarity. Unlike a chat model, it produces no text — only the vector. Examples include OpenAI's text-embedding-3 models, Cohere's embed family, and open-weights BGE models available through Cloudflare Workers AI. You call it once per chunk of content and once per query.

How do you choose an embedding model?

Weigh vector dimension (quality vs. storage and search cost), language and modality coverage, maximum input length, where it runs (hosted API, self-hosted, or edge), and measured quality on a benchmark like MTEB. Shortlist from the benchmark, then evaluate on your own data, since rankings shift by domain. Keep dimensions as low as quality allows, and remember that switching models later means re-embedding your entire corpus.

Sources & provenance
  • Embedding-model dimensions and capabilities verified against vendor docs: OpenAI (text-embedding-3-small 1536, -large 3072, shortenable dimensions), Cohere (embed-v4, multilingual, selectable dimensions), and Cloudflare Workers AI BGE models (bge-base-en-v1.5 768, bge-m3 1024) — confirm current values against each vendor's live docs.
  • Cosine similarity as the standard text-embedding metric, and MTEB as the reference benchmark/leaderboard: established embedding literature and the MTEB project (Hugging Face). Specific leaderboard scores change and are intentionally not quoted here.
  • Pipeline and "same model for query and documents" framing synthesized from AI Architect Academy's curriculum (Track B, retrieval) and the platform's own build on Cloudflare Workers AI embeddings.

Model dimensions, limits, and pricing drift; treat figures as a shortlist to verify, not a guaranteed specification. Corrections: hello@aiarch.dev.

Learn embeddings by building a retrieval system that uses them.

AI Architect Academy teaches embeddings, semantic search, and RAG as first-class skills — on a platform that is itself a production retrieval system built across Anthropic, AWS, and Cloudflare. The build is the curriculum.
