
KNNs vs. ANNs: a comprehensive overview detailing KD-trees, HNSWs, and more

A comprehensive comparison of K-Nearest Neighbors (KNN) and Approximate Nearest Neighbors (ANNs), exploring when to use each approach, decision frameworks for choosing the right ANN algorithm, and detailed analysis of HNSW implementation and tuning.

Vectroid Team


#vector-search #knn #ann-algorithms #hnsw #kd-trees #similarity-search #technical-guide #performance

Today, we want to compare two popular similarity search algorithms: K-Nearest Neighbors (KNN) and Approximate Nearest Neighbors (ANNs), with the latter denoting a family of algorithms.

What is similarity search? Similarity search is finding the most similar items in a dataset given a query, based on distance metrics in vector space. The field has surged in popularity since the advent of LLMs, but it has use cases beyond AI. For example, similarity search can power the "users also liked" modal in an e-commerce application. With similarity search, applications can retrieve documents, images, and videos, as well as detect near-duplicates in a massive dataset. And, of course, similarity search enables LLMs to retrieve contextual information that's relevant to a query.

In this article, we'll be comparing the two mechanisms for performing similarity search: KNN vs. ANNs. We'll discuss when it makes sense to use one approach over the other, and what key indicators should tip you off that an ANN approach is necessary. Then, we will discuss a decision framework for identifying the appropriate ANN for your use case via guided exploration of candidates. Finally, we'll illustrate how this guided exploration process works for a hypothetical use case: evaluating the suitability of the popular HNSW algorithm, widely considered the gold standard of graph-based ANNs.

KNNs versus ANNs: A High-Level Comparison

What is K-Nearest Neighbors?

K-Nearest Neighbors, or KNN for short, is exact similarity search. KNN explicitly computes the distances between the query and every item in the dataset, guaranteeing optimal accuracy. However, KNN is extremely expensive, with the cost growing linearly with dataset size and dimensionality. For large collections of high-dimensional vectors, like LLM embeddings, KNN is too computationally intensive.

What are Approximate Nearest Neighbors?

Approximate Nearest Neighbors, or ANNs for short, is similarity search with controlled approximation. ANNs are a family of algorithms that use specialized data structures (e.g. trees, hashing, graphs) to prune the search space. They trade off some recall and accuracy for massive gains in speed and scalability.

How do KNNs and ANNs compare?

In a nutshell, KNNs maximize accuracy over computational efficiency. ANNs prioritize efficiency while keeping accuracy "good enough" for most practical applications.

How does KNN work?

There are a few steps that sum up the KNN algorithm:

  1. Represent each item and the query as vectors in high-dimensional space
  2. Define a distance metric, the most common being Euclidean distance, cosine similarity, and inner product
  3. Compute the distance between the query vector and every vector in the dataset
  4. Sort distances and return the top-k closest neighbors

Specifically, the computational complexity of KNNs is O(n*d) per query (n = dataset size, d = vector dimensionality). Accordingly, linear scaling quickly becomes impractical as n and d grow.
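
To make the O(n*d) cost concrete, here is a minimal brute-force KNN sketch in Python with NumPy; the dataset size, dimensionality, and k below are arbitrary illustration values, not recommendations.

```python
import numpy as np

def knn_search(dataset: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Exact KNN: measure the distance from the query to every vector (O(n*d)),
    then return the indices of the k closest ones."""
    distances = np.linalg.norm(dataset - query, axis=1)   # Euclidean distance to all n vectors
    nearest = np.argpartition(distances, k)[:k]           # unordered k smallest
    return nearest[np.argsort(distances[nearest])]        # sort those k by distance

# Toy example: 10,000 vectors of dimension 128
rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)
print(knn_search(data, query, k=5))
```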

Where does KNN shine?

KNN is ideal for situations where brute-force is feasible: if the dataset is relatively small (e.g. thousands to low millions of vectors) and dimensionality is moderate. There's more tolerance for KNNs when real-time responses aren't necessary and batch processing is acceptable.

Hypothetically, if we had infinite compute, every use case would prefer KNNs. However, because KNNs are expensive, developers need to explore other alternatives whenever brute force is infeasible. This quickly becomes the case when a dataset scales to tens of millions or billions of vectors (or the vectors have hundreds to thousands of dimensions). Tolerance for KNNs quickly fades when queries need to be real-time.

What classes of ANNs exist?

ANN refers to a collection of algorithms that take radically different approaches to approximate similarity search. There are three primary classes, plus hybrids of the three.

Tree-based ANNs

Tree-based ANNs partition the space hierarchically into regions. An example of a tree-based ANN is the KD-tree. Tree-based methods are efficient for low-dimensional data, but performance degrades rapidly in high dimensions (colloquially known as the "curse of dimensionality").
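
As a quick illustration, a KD-tree query can be run with SciPy's cKDTree; the random, low-dimensional data below is only meant to show the API in the regime where KD-trees work well.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data = rng.standard_normal((100_000, 8))   # low-dimensional data suits KD-trees
query = rng.standard_normal(8)

tree = cKDTree(data)                       # hierarchical partitioning of the space
# eps > 0 allows approximate answers in exchange for faster pruning
distances, indices = tree.query(query, k=5, eps=0.1)
print(indices, distances)
```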

Hashing-based ANNs

Hashing ANNs use hash functions or quantization to map similar vectors into the same bucket. Put crudely, it's fancy rounding. Hashing-based methods enable fast candidate lookup, but recall depends on the quality of the hash or quantization scheme. They're well-suited for very high-dimensional data where some approximation is acceptable.

Examples of hashing-based ANNs include locality-sensitive hashing (LSH) and product quantization.
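
To show the flavor of the hashing approach, here is a tiny random-hyperplane LSH sketch (a simpler cousin of product quantization; the bit count and Hamming threshold are illustrative assumptions).

```python
import numpy as np

def lsh_signatures(vectors: np.ndarray, hyperplanes: np.ndarray) -> np.ndarray:
    """Hash each vector to a bit signature: one bit per random hyperplane.
    Vectors that point in similar directions tend to share most bits."""
    return (vectors @ hyperplanes.T > 0).astype(np.uint8)

rng = np.random.default_rng(0)
dim, n_bits = 128, 16
planes = rng.standard_normal((n_bits, dim))

data = rng.standard_normal((10_000, dim))
query = rng.standard_normal(dim)

data_sig = lsh_signatures(data, planes)             # shape (10_000, 16)
query_sig = lsh_signatures(query[None, :], planes)  # shape (1, 16)

# Candidate set: vectors whose signatures are within a small Hamming distance
hamming = (data_sig != query_sig).sum(axis=1)
candidates = np.where(hamming <= 2)[0]
print(f"{len(candidates)} candidates out of {len(data)} vectors")
```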

Graph-based ANNs

Graph-based ANNs build a navigable graph of vectors where edges connect to nearby points. This graph is then traversed greedily, starting from an entry point, following edges towards the closer neighbors.

Graph-based ANNs strike a strong balance between speed, accuracy, and scalability, but can be memory-expensive at scale.

The most popular graph-based ANN is Hierarchical Navigable Small Worlds (HNSW).
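
The core traversal idea can be sketched in a few lines; the pre-built adjacency dict and fixed entry point below are simplifications of what real graph-based ANNs maintain.

```python
import numpy as np

def greedy_graph_search(graph: dict[int, list[int]], vectors: np.ndarray,
                        query: np.ndarray, entry: int) -> int:
    """Greedy traversal: from the entry node, repeatedly hop to whichever
    neighbor is closer to the query, stopping at a local minimum."""
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    improved = True
    while improved:
        improved = False
        for neighbor in graph[current]:
            d = np.linalg.norm(vectors[neighbor] - query)
            if d < current_dist:
                current, current_dist = neighbor, d
                improved = True
    return current

# Tiny example: a hand-built 4-node graph
vecs = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_graph_search(graph, vecs, np.array([2.0, 0.9]), entry=0))  # -> 3
```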

Hybrid or Composite Methods

Hybrid ANNs combine multiple approaches (e.g. HNSW with quantization), aiming to optimize tradeoffs across recall, memory footprint, and query latency. Examples of hybrid ANNs include ScaNN and Vectroid's underlying mechanism.

How do you choose the right ANN for your use case?

To help you choose the right ANN for your use case, we'll describe a step-by-step methodology.

First, describe the data involved in your use case. Understand your dataset's size, including the number of vectors and dimensions. Understand what types of edits are necessary (e.g. inserts, updates, or deletes) and forecast how the data will change, as some ANNs handle different edit types more efficiently than others. Finally, estimate the future scale of your data.

Second, define your use case requirements and constraints. Understand your performance requirements, such as response accuracy, latency, and throughput. Understand your budget for storage and compute. Lastly, examine your existing infrastructure and your bandwidth to build on top of it, or to adopt a separate, siloed solution independently of it.

Then, evaluate each of the ANN classes that might satisfy your requirements. Examine what meets your performance requirements and resource constraints. Additionally, make sure deploying that ANN is a good match for your stack and your team. If not, research if there is a managed solution that would be a better fit with the same performance benefits.

Evaluating the suitability of HNSWs

To illustrate what this process looks like in practice, let's consider the suitability of the widely used HNSW algorithm in the context of a hypothetical use case.

How does HNSW work?

Let's first understand how an HNSW works.

HNSW builds a multi-layer navigable graph of all vectors. Each node represents a vector with edges linking it to a subset of its nearest neighboring vectors. Edges are chosen based on proximity heuristics (for example, select M nearest neighbors, prune to maintain graph navigability).

When a new vector gets added, it is assigned a random maximum layer (using a geometric distribution), then inserted into that top layer and every layer below it, refining neighbor edges at every step. Thus, the top layers of the graph are sparser, with longer connections between nodes, while lower layers are denser, containing more nodes with shorter connections.

At query time, starting from an entry point at the top layer, a greedy search is performed to find the node that is most similar to the query vector. Once found, the algorithm descends to the layer below and repeats the process, using the previous most similar node as the new starting point. The process is repeated until reaching the bottom layer where a local best-first search (priority queue) is performed to identify the top K nearest neighbors.
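
The query path can be sketched as follows. This is a simplified teaching sketch (adjacency dicts per layer, Euclidean distance, no construction or pruning logic), not a reference implementation.

```python
import heapq
import numpy as np

def hnsw_search(layers: list[dict[int, list[int]]], vectors: np.ndarray,
                query: np.ndarray, entry: int, k: int, ef: int) -> list[int]:
    """Simplified HNSW query: greedy descent through the upper layers to refine
    the entry point, then a best-first search on the bottom layer (layers[0])."""
    dist = lambda i: float(np.linalg.norm(vectors[i] - query))

    # 1) Upper layers: greedy hops toward the query to find a good entry point
    current = entry
    for layer in reversed(layers[1:]):              # topmost (sparsest) layer first
        improved = True
        while improved:
            improved = False
            for nb in layer.get(current, []):
                if dist(nb) < dist(current):
                    current, improved = nb, True

    # 2) Bottom layer: best-first search keeping the ef closest results found so far
    visited = {current}
    frontier = [(dist(current), current)]           # min-heap of nodes to expand
    best = [(-dist(current), current)]              # max-heap (negated) of kept results
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:     # closest frontier node is already
            break                                   # worse than the worst kept result
        for nb in layers[0].get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)             # drop the current worst
    return [n for _, n in sorted((-d, n) for d, n in best)][:k]
```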

How do you tune HNSW?

There are certain configurations that can impact how HNSWs perform over various datasets. These include:

  • M, denoting the number of neighbors per node, where higher means better recall but a larger index size
  • efConstruction, denoting how many candidates are considered when building the graph, where higher implies better graph quality but slower build time
  • efSearch, denoting how many candidates are explored during queries, where higher means better recall but slower queries

These tuning parameters also underscore why HNSWs are popular. They combine the efficiency of long-range jumps with the accuracy of local refinement. They also yield strong recall–latency tradeoffs at scale with sub-linear query complexity (roughly logarithmic scaling with dataset size). However, their memory usage can be significant due to storing both graph edges and vectors.
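
In practice, these knobs map directly onto library parameters. Here is what that looks like with the hnswlib library (the dataset is random and the parameter values are illustrative starting points, not recommendations):

```python
import hnswlib
import numpy as np

dim, num_elements = 700, 100_000                # stand-in values for illustration
rng = np.random.default_rng(0)
data = rng.standard_normal((num_elements, dim)).astype(np.float32)

# Build: M and ef_construction trade index size and build time for graph quality
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

# Query: ef (efSearch) trades latency for recall and can be changed at any time
index.set_ef(100)
query = rng.standard_normal(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=10)
```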

The Example HNSW Use Case

Imagine that you are leading a team of 4 plucky engineers. Your team has been tasked with building a new semantic search feature for a product catalog with tens of millions of items, each represented by a moderately high dimensional vector (~700d). The catalog sees frequent inserts (new products) and occasional deletes. The inventory partnerships team hopes that the catalog will eventually grow to over 100M items, but it will likely take a few years to reach this scale. To ensure a smooth and interactive UX, you're aiming for ultra fast query speeds (latency < 50ms p99) and high recall (≥ 95%). The finance team has allocated a moderate cloud budget of $2K/month with room to scale.

For this scenario, let's apply the decision framework. First, describing the data, we know we have tens of millions of moderately high-dimensional vectors. We'll see a steady stream of new data, but we likely won't reach 100M products within the next few years. Inserts matter more than updates and deletes. From the get-go, we can rule out a brute-force KNN—too many dimensions and vectors!

Accordingly, we must balance strong accuracy and fast latency for an interactive UX. As a result, let's consider an HNSW. It fits many traits:

  • It's well-suited to handle tens of millions of entries
  • It supports inserts efficiently
  • It scales well with additional data
  • It's widely supported in production-ready libraries
Accordingly, the HNSW is the right choice for its strong recall, low latency, dynamic inserts, and mature ecosystem. The only problem is that we'll have to be cognizant of memory use.
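
To gauge that memory concern, a quick back-of-envelope estimate helps. The per-element formula below is the rough approximation commonly quoted for hnswlib-style indexes (float32 vectors plus bidirectional graph links), and the 100M count is the scenario's hypothetical growth target:

```python
# Rough HNSW memory estimate: ~4 bytes per float32 dimension for the vector,
# plus ~M * 2 * 4 bytes per element for the graph links (hnswlib's rule of thumb).
dim = 700                 # vector dimensionality from the scenario
M = 16                    # neighbors per node (illustrative choice)
num_items = 100_000_000   # hypothetical eventual catalog size

bytes_per_item = dim * 4 + M * 2 * 4
total_gb = num_items * bytes_per_item / 1e9
print(f"~{bytes_per_item} bytes/item, ~{total_gb:.0f} GB at {num_items:,} items")
# -> ~2928 bytes/item, ~293 GB at 100,000,000 items
```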

A Closing Thought

In the realm of vector similarity search, the choice between exact K-Nearest Neighbors (KNN) and Approximate Nearest Neighbors (ANN) methods involves some key tradeoffs. While KNN guarantees perfect accuracy, it often fails to scale efficiently with large datasets. ANN methods like HNSW introduce controlled approximation that enables practical performance at scale, striking a balance between accuracy and computational efficiency that makes them viable for real-world applications.

The vector search landscape offers no universal solution, as the optimal algorithm depends on your specific dataset characteristics, workflows, growth trajectory, and performance/cost constraints. As illustrated in our HNSW example, each approach comes with distinct advantages and limitations—HNSW delivers excellent recall-latency balance but requires consideration of memory costs and has limited delete support.

By employing a good decision framework on your requirements, you can confidently navigate these tradeoffs to select the most appropriate solution for your needs.
