zembed-1 vs voyage-4

Mar 5, 2026



👉 TL;DR:

  • zembed-1 pareto outperforms voyage-4 on almost all verticals in accuracy

  • zembed-1 is more robust than voyage-4 to unclean prod-like data, showing smaller degradation from clean documents

  • At 2 QPS, zembed-1 has a P50 latency of 280 ms. Voyage's is 2500 ms.

  • zembed-1 costs $0.05/million tokens (currently reduced to $0.025 until June 1st!); voyage-4 costs $0.06/million tokens.

SOTA Accuracy: A Tale of Two Models

In our evaluations of the various flagship models, two consistently outperformed the rest: zembed-1 and voyage-4. Both exceeded other models such as cohere embed v4 and openai v3 large by 5-7%+ Recall@100 across all verticals.

If the bottleneck on your agent or application is accuracy, the choice is essentially between these two options. How do they compare in a head-to-head matchup?


Across 22 evaluation datasets, zembed-1 outperforms voyage-4 at Recall@100 on 20 of them.
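For reference, Recall@k is taken here to mean the fraction of a query's labeled relevant documents that appear among the top k retrieved results. The post does not describe its exact evaluation harness, so the function below is only a minimal sketch of that standard definition; the function and argument names are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of relevant documents that appear in the top-k results.

    ranked_ids: document ids ordered from most to least similar.
    relevant_ids: ids labeled relevant for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        # Assumption: a query with no labeled relevant documents scores 0.
        return 0.0
    retrieved = set(ranked_ids[:k])
    return len(relevant & retrieved) / len(relevant)


# Toy example with made-up ids: one of three relevant docs is in the top 3.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
score_at_3 = recall_at_k(ranked, relevant, k=3)
score_at_4 = recall_at_k(ranked, relevant, k=4)
```

A per-dataset score would then be the mean of this value over all queries in the dataset.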

Latency

Our figures here, as in our launch blog post, were calculated using our open source benchmarking platform. The same code tests each endpoint, so no provider has an inherent advantage.

Note: The above are for public api endpoints. Self-hosting solutions like our AWS Marketplace offerings, or on-prem solutions (we open sourced the weights for zembed-1!) can be further optimized for enterprise clients.
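To make the "P50 at a fixed QPS" measurement concrete, here is a rough sketch of how such a number could be collected. It is not the benchmarking platform mentioned above: `send_request` is a hypothetical callable standing in for one embedding call, and requests are issued sequentially, so if a call takes longer than the pacing interval the effective rate drops below the target (a real harness would likely issue requests concurrently).

```python
import statistics
import time


def measure_p50_ms(send_request, n_requests=20, qps=2.0):
    """Issue requests at a fixed pace and return the median latency in ms.

    send_request: hypothetical zero-argument callable that performs one
    embedding call and blocks until the response arrives.
    """
    interval = 1.0 / qps
    latencies_ms = []
    next_start = time.perf_counter()
    for _ in range(n_requests):
        # Wait until this request's scheduled start time.
        delay = next_start - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        start = time.perf_counter()
        send_request()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        next_start += interval
    # P50 is the median of the observed latencies.
    return statistics.median(latencies_ms)


# Stand-in for a network call so the sketch runs without any endpoint.
def fake_request():
    time.sleep(0.005)


p50 = measure_p50_ms(fake_request, n_requests=5, qps=200.0)
```

Running the same function against each provider's client, with the same inputs and pacing, is the property that keeps the comparison fair.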

Robustness to Messy (AKA Prod-Like) Data:

Production data is rarely clean: chunking can fragment context across snippets, imperfect data cleaning (especially of web data) can add unrelated information, and synthetic transformations can misrepresent source text. A simple question for users wondering how clean public benchmark results translate to actual production performance is the following:

What happens to NDCG@10 performance when we prepend unrelated information to each query, or to each document?

For our testing purposes, we prepended the string below to every query in the dataset and re-ran the evals:

PREPEND_SENTENCE = "Irrelevant statement: Water bottle flipping contest. \n\n Query: \n\n"

Given the geometry of embedding models, some degradation is expected, but a robust model should keep it small on messy, prod-like data. zembed-1 does degrade, going from 0.738 to 0.729 (Δ = -0.009) NDCG@10 on the popular public dataset CureV1. voyage-4 goes from 0.687 to 0.633 (Δ = -0.054).


voyage-4's degradation is thus six times that of zembed-1 (0.054 vs. 0.009 NDCG@10).
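The perturbation and the arithmetic on the reported scores can be sketched as follows. The NDCG@10 function assumes linear gains and a log2 discount, which is one common convention; the post does not say which variant its harness uses, and the helper names are hypothetical.

```python
import math

PREPEND_SENTENCE = "Irrelevant statement: Water bottle flipping contest. \n\n Query: \n\n"


def add_noise(queries):
    """Prepend the irrelevant sentence to every query string."""
    return [PREPEND_SENTENCE + q for q in queries]


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k with linear gain (an assumption, not the post's stated variant).

    relevance: dict mapping doc id -> graded relevance; missing ids count as 0.
    """
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


# (clean, noisy) NDCG@10 on CureV1, as reported in the text above.
reported = {"zembed-1": (0.738, 0.729), "voyage-4": (0.687, 0.633)}
drops = {model: clean - noisy for model, (clean, noisy) in reported.items()}
for model, (clean, _noisy) in reported.items():
    print(f"{model}: absolute drop {drops[model]:.3f}, "
          f"relative drop {drops[model] / clean:.1%}")
```

In an actual run, the same queries would be embedded twice (once as-is, once through `add_noise`), and the per-query NDCG@10 values averaged for each condition.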


Direct Pairwise Comparison on Query, Doc1, Doc2 Triplets:

But what about the specific (query, document) pairs that zembed-1 and voyage-4 score differently? When the two frontier models disagree, which one is generally correct?

On an aggregate of 14 public datasets (notably fiqa, bioasq, cosqa), we retrieved the top 100 documents by cosine similarity for every query with both voyage-4 and zembed-1. We then identified pairwise ranking inversions: pairs of documents (d₁, d₂) where the two models disagree on which is more relevant to a query q, with a rank gap of at least 30 positions. We sent each (q, d₁, d₂) triplet to gemini-3-flash, gpt-5-nano, and grok-4-fast-non-reasoning, asking which of the two documents was more relevant to the query.


A 15-20% differential in favor of zembed-1 is consistently observed.



When we filter for the cases that will most affect production performance (those where either zembed-1 or voyage-4 ranked one of the documents in the top 10), the gap becomes even larger, at 27%-33%.
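The inversion-mining step described in this section could look roughly like the sketch below. It rests on two assumptions the post does not confirm: only documents present in both models' top-100 lists are compared, and the 30-position gap must hold in each model's ranking. All names are hypothetical.

```python
from itertools import combinations


def find_inversions(rank_a, rank_b, min_gap=30, top_n=None):
    """Find document pairs that two models order oppositely for one query.

    rank_a, rank_b: dicts mapping doc id -> 1-based rank under each model.
    Returns (d1, d2) tuples where model A ranks d1 above d2 and model B
    ranks d2 above d1, each by at least `min_gap` positions.
    If `top_n` is set, keep only pairs where either model places one of
    the two documents within its top `top_n`.
    """
    shared = sorted(set(rank_a) & set(rank_b))
    inversions = []
    for x, y in combinations(shared, 2):
        gap_a = rank_a[y] - rank_a[x]  # positive: model A prefers x
        gap_b = rank_b[y] - rank_b[x]  # positive: model B prefers x
        if gap_a * gap_b >= 0:
            continue  # same preference (or a tie), so no inversion
        if abs(gap_a) < min_gap or abs(gap_b) < min_gap:
            continue  # disagreement too small to count
        d1, d2 = (x, y) if gap_a > 0 else (y, x)
        if top_n is not None:
            best_rank = min(rank_a[d1], rank_a[d2], rank_b[d1], rank_b[d2])
            if best_rank > top_n:
                continue
        inversions.append((d1, d2))
    return inversions


# Toy ranks for one query: the models disagree strongly about x versus y.
ranks_a = {"x": 1, "y": 40, "z": 41}
ranks_b = {"x": 50, "y": 5, "z": 90}
pairs = find_inversions(ranks_a, ranks_b, min_gap=30, top_n=10)
```

Each returned pair, together with its query, would then be sent to the judge models, and the share of judgments agreeing with each embedding model compared.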


Multilingual Performance on Public Evals:

zembed-1's accuracy advantage isn't limited to English or general benchmarks. Here's how our model compares across verticals and languages against the full competitive field.

zembed-1 is multilingual from the ground up. Over half of the training data used to create zembed-1 was in languages other than English. With our focus on well-calibrated cross-lingual query-document pairs, you get exactly the same Elo-trained relevance judgement whether the query is in English, Japanese, Arabic, or any other major language.

For full evaluation results across all datasets and configurations, see the detailed spreadsheet here.

What Embedding Model is Best for Me?

Choosing an embedding model has traditionally been a trade-off between accuracy, expected latency, and cost. zembed-1 pareto-dominates every other frontier embedding model on accuracy, and strongly outperforms voyage-4-nano across verticals. No other provider offers latency as consistently low (P50 relative to zembed-1: Cohere 2.3x, OpenAI 3.5x, Voyage 2.3x plus failed requests). And at $0.05 per million tokens (Cohere is 24x, OpenAI 2.6x, Voyage 1.2x that price), SOTA search has never been more affordable.

But what if your use case is even more cost-sensitive, and you're willing to sacrifice a few NDCG points for a large storage cost reduction? With zembed-1's quantization offerings, you can control that trade-off directly. Quantize your embeddings to binary, for example, and get a 94% storage cost reduction while retaining 89% of zembed-1's performance (as accurate as openai v3 large).
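As an illustration of what binary quantization means for storage, here is a generic sign-based sketch in plain Python. It is not zembed-1's own quantization API, which this post does not show; the helper names and the baseline precisions are assumptions.

```python
def binarize(embedding):
    """Keep one bit per dimension: 1 if the value is positive, else 0."""
    bits = 0
    for value in embedding:
        bits = (bits << 1) | (1 if value > 0 else 0)
    n_bytes = (len(embedding) + 7) // 8
    return bits.to_bytes(n_bytes, "big")


def hamming_distance(a, b):
    """Number of differing bits between two equally sized binary codes."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def storage_reduction(dims, baseline_bytes_per_dim):
    """Fractional storage saved by storing 1 bit per dimension instead."""
    full_bytes = dims * baseline_bytes_per_dim
    binary_bytes = dims / 8
    return 1 - binary_bytes / full_bytes


code = binarize([0.5, -0.1, 0.2, -0.3, 0.0, 1.0, -2.0, 3.0])
vs_fp16 = storage_reduction(2560, 2)  # assumed 2-byte (float16) baseline
vs_fp32 = storage_reduction(2560, 4)  # assumed 4-byte (float32) baseline
```

Under these assumptions a 2560-dimensional vector shrinks to 320 bytes, a saving of 93.75% against a 2-byte baseline or 96.875% against a 4-byte one; the post does not state which baseline its 94% figure uses. Binary codes are typically compared with Hamming distance rather than cosine similarity.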

Get started

zembed-1 is available today through multiple deployment options:

ZeroEntropy API: fully managed, lowest-friction path to production

HuggingFace: open weights, run it on your own infrastructure

AWS Marketplace: deploy within your existing AWS environment

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()
response = zclient.models.embed(
    model="zembed-1",
    input_type="query",  # "query" or "document"
    input="What is retrieval augmented generation?",  # string or list[str]
    dimensions=2560,  # optional: must be one of [2560, 1280, 640, 320, 160, 80, 40]
    encoding_format="float",  # "float" or "base64"
    latency="fast",  # "fast" or "slow"; omit for auto
)

Documentation: docs.zeroentropy.dev

HuggingFace: huggingface.co/zeroentropy

Get in touch: Discord community or contact@zeroentropy.dev

Talk to us if you need a custom deployment or volume pricing, or want to see how zembed-1 + zerank-2 perform on your data.
