First-Pass Retrieval, state-of-the-art recall
zembed-1 is the current #1 embedding model on graded retrieval benchmarks: 4B parameters, up to 2560 output dimensions, multilingual, and instruction-aware. Because your embedding search's recall ceiling is your entire pipeline's ceiling.
Dense recall is the ceiling
Embeddings are fast and cheap: they find a hundred plausible candidates out of ten million in milliseconds. But Recall@100 is the silent ceiling on every RAG pipeline. Everything downstream (LLM calls, rerankers, your agents) can only sort what the embedding surfaces, and most models simply leave relevant documents on the floor.
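Recall@k itself is a one-liner to compute. A minimal sketch in plain Python (the document IDs and the toy candidate list are hypothetical, purely for illustration):

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k retrieved list."""
    top_k = set(retrieved_ids[:k])
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(top_k & relevant) / len(relevant)

# Toy example: 3 of the 4 relevant docs surface among the top-100 candidates.
retrieved = ["d7", "d3", "d42", "d9"] + [f"d{i}" for i in range(100, 196)]
relevant = ["d3", "d7", "d42", "d999"]
print(recall_at_k(retrieved, relevant, k=100))  # → 0.75
```

No reranker or LLM downstream can raise this number: a relevant document outside the top k is simply gone.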
Across 28 datasets and 3 LLM judges. Ahead of voyage-4 (0.712) and harrier-27b (0.706).
Highest of any embedding model we test: +2.0 pts over voyage-4, +2.2 over harrier-27b.
P50 ~280 ms vs ~2500 ms at 2 QPS / 2560-dim / 512-token inputs. Fast mode goes lower.
“Better recall on our long-tail queries was the entire reason to switch. The reranker downstream got a better candidate set on every query, and our metrics moved.”
“Multilingual recall held up across our European and Asian markets where the previous embedding fell off a cliff.”
Trained on similarity, not boundaries.
Most embedding models train against binary relevant/not-relevant labels. zembed-1 is trained on continuous relevance scores derived from pairwise LLM preferences — the same signal behind zerank-2. This is why it does disproportionately well on graded evaluations where binary-trained competitors plateau.
Continuous relevance scores
Pairwise LLM preferences are converted into absolute ELO-style scores via Thurstone fitting — a graded signal, not a binary one.
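The Thurstone (Case V) fit can be sketched in a few lines: each pairwise win rate is pushed through the inverse normal CDF, and an item's score is its average probit margin over the others. The toy preference matrix and the simple averaging estimator below are illustrative assumptions, not ZeroEntropy's actual training pipeline:

```python
from statistics import NormalDist

def thurstone_scores(p):
    """Thurstone Case V: score_i - score_j ≈ Φ⁻¹(P(i preferred over j)).

    p[i][j] is the observed rate at which item i is preferred over item j.
    Averaging the probit-transformed margins recovers latent scores up to
    an additive constant — a graded signal rather than a binary label.
    """
    inv = NormalDist().inv_cdf
    n = len(p)
    scores = []
    for i in range(n):
        margins = [inv(p[i][j]) for j in range(n) if j != i]
        scores.append(sum(margins) / len(margins))
    return scores

# Toy 3-document preference matrix: p[i][j] = P(doc i beats doc j).
p = [
    [0.5, 0.8, 0.9],
    [0.2, 0.5, 0.7],
    [0.1, 0.3, 0.5],
]
print(thurstone_scores(p))  # doc 0 > doc 1 > doc 2, with graded gaps
```

The output preserves not just the ranking but the *margins* between documents, which is exactly the extra signal a binary relevant/not-relevant label throws away.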
Broad-domain training
Legal, medical, financial, code, multilingual, and technical corpora — chosen so the model generalizes to private enterprise data, not just the public benchmark.
Flexible dimensions
Output 1024 / 1536 / 2560-dim vectors. 1536 is the production sweet spot; 2560 for last-mile accuracy; 1024 for index-cost-sensitive consumer search.
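The index-cost tradeoff is easy to quantify: raw float32 vector storage scales linearly with dimension. A quick back-of-the-envelope sketch (hypothetical corpus size; raw vectors only, ignoring index overhead and quantization):

```python
def index_bytes(num_docs, dim, bytes_per_float=4):
    """Raw storage for num_docs float32 vectors of the given dimension."""
    return num_docs * dim * bytes_per_float

docs = 10_000_000  # the ten-million-document corpus from above
for dim in (1024, 1536, 2560):
    gb = index_bytes(docs, dim) / 1e9
    print(f"{dim}-dim: {gb:.1f} GB")  # 41.0 GB / 61.4 GB / 102.4 GB
```

Dropping from 2560 to 1024 dimensions cuts raw vector storage by 60%, which is why the smaller size suits index-cost-sensitive consumer search.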
Query/document asymmetry
An `input_type` parameter (`query` vs `document`) embeds the same string at slightly different points in space — reflecting the inherent asymmetry of search.
Integrate ZeroEntropy models in minutes. Production-ready, latency-optimized, available everywhere.
```python
# Create an API Key at https://dashboard.zeroentropy.dev
from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

response = zclient.models.rerank(
    model="zerank-2",
    query="What is Retrieval Augmented Generation?",
    documents=[
        "RAG combines retrieval with generation...",
    ],
)

for doc in response.results:
    print(doc)
```

Deploy in your own cloud with dedicated infrastructure. Available on AWS Marketplace and Azure.
From security to scale, ZeroEntropy is built for the demands of production-ready AI.

SOC2 Type II
Audited controls for data security, availability, and confidentiality — verified annually.

HIPAA Compliant
BAA-ready infrastructure with encryption at rest and in transit for protected health data.

GDPR Compliant
Full data residency controls, right-to-deletion, and DPA agreements for EU customers.

CCPA Compliant
Consumer data rights honored with full transparency on collection, use, and deletion.
