How Vera Health Achieved State-of-the-Art Clinical Accuracy Using ZeroEntropy

Oct 14, 2025

TL;DR

Vera Health set a new record on the US Medical Licensing Exam (USMLE) with 97.5% overall accuracy, surpassing OpenAI, Google, and Anthropic models. Behind this performance is ZeroEntropy’s Retrieval API, powering real-time search and retrieval across more than 60 million peer-reviewed papers and clinical guidelines.

Overview

Vera Health, an AI platform for medical professionals, set a new record on the US Medical Licensing Exam (USMLE) with 97.5% overall accuracy, surpassing OpenAI, Google, and Anthropic models. It also led benchmarks such as NEJM Q&A (84.9%) and MedXpertQA (62.2%), becoming the top-performing medical reasoning system in the world.

Behind this performance is ZeroEntropy’s Retrieval API, powering Vera’s real-time search and retrieval across more than 60 million peer-reviewed papers and clinical guidelines.

Challenge

Medical reasoning requires precise retrieval, not just memorization.

Traditional LLMs fail when:

Answers depend on the latest research or treatment guidelines.
Queries are nuanced, context-dependent, or multi-step (e.g., “Compare treatment options for stage 2 hypertension in smokers with COPD”).
RAG pipelines return hundreds of irrelevant PubMed abstracts that overwhelm the model.

Vera’s goal was to make AI clinically useful, providing doctors with answers they can trust, based on the most current and authoritative sources.

Solution: Agentic Search Powered by ZeroEntropy

Vera integrates ZeroEntropy’s full-stack retrieval API, which covers every stage from hybrid search to reranking.

Dynamic multi-hop retrieval

Vera’s agent decomposes a clinician’s question into sub-queries (e.g., drug interactions, contraindications, dosing) and calls ZeroEntropy multiple times to gather evidence from PubMed and clinical guideline repositories.
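The decompose-then-retrieve loop described above can be sketched roughly as follows. This is a minimal illustration, not Vera's or ZeroEntropy's actual implementation: `gather_evidence`, `toy_decompose`, `toy_search`, and the in-memory corpus are all hypothetical stand-ins, and a real system would replace the toy search with calls to a retrieval API and the fixed sub-queries with agent-generated ones.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snippet:
    doc_id: str
    text: str
    score: float


# Hypothetical signatures: neither reflects a real ZeroEntropy client.
DecomposeFn = Callable[[str], list[str]]
SearchFn = Callable[[str, int], list[Snippet]]


def gather_evidence(question: str, decompose: DecomposeFn,
                    search: SearchFn, k: int = 3) -> list[Snippet]:
    """Run one search per sub-query and keep the best-scoring hit per document."""
    best: dict[str, Snippet] = {}
    for sub_query in decompose(question):
        for hit in search(sub_query, k):
            current = best.get(hit.doc_id)
            if current is None or hit.score > current.score:
                best[hit.doc_id] = hit
    return sorted(best.values(), key=lambda s: s.score, reverse=True)


# Toy corpus and keyword-overlap search, for illustration only.
TOY_CORPUS = {
    "doc-a": "beta blockers contraindications in copd",
    "doc-b": "ace inhibitor dosing for hypertension",
    "doc-c": "smoking cessation and blood pressure",
}


def toy_search(query: str, k: int) -> list[Snippet]:
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in TOY_CORPUS.items():
        overlap = len(terms & set(text.split()))
        if overlap:
            hits.append(Snippet(doc_id, text, overlap / len(terms)))
    return sorted(hits, key=lambda s: s.score, reverse=True)[:k]


def toy_decompose(question: str) -> list[str]:
    # Fixed sub-queries; a real agent would derive these from the question.
    return ["hypertension dosing", "copd contraindications", "hypertension copd"]


evidence = gather_evidence(
    "Compare treatment options for stage 2 hypertension in smokers with COPD",
    toy_decompose,
    toy_search,
)
for snippet in evidence:
    print(snippet.doc_id, snippet.score)
```

Keeping only the best score per document means a paper surfaced by several sub-queries reaches the model once rather than several times.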

Precision reranking

ZeroEntropy’s reranker filters and orders results for clinical relevance, ensuring that only the top evidence-based snippets reach the LLM.
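The filter-and-order step can be illustrated with a short sketch. It assumes a generic `score_fn(query, text)` relevance scorer. The `jaccard` word-overlap function below is only a toy substitute for a learned reranker, and `rerank`, `top_n`, and `min_score` are hypothetical names, not ZeroEntropy's API.

```python
from typing import Callable

ScoreFn = Callable[[str, str], float]


def rerank(query: str, candidates: list[str], score_fn: ScoreFn,
           top_n: int = 2, min_score: float = 0.25) -> list[tuple[float, str]]:
    """Score every candidate, drop weak matches, and return the top_n by score."""
    scored = [(score_fn(query, text), text) for text in candidates]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_n]


def jaccard(query: str, text: str) -> float:
    # Toy relevance score: word overlap divided by combined vocabulary size.
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


candidates = [
    "knee replacement recovery",
    "hypertension treatment guidelines update",
    "stage 2 hypertension treatment in copd",
]
top = rerank("hypertension treatment copd", candidates, jaccard)
for score, text in top:
    print(round(score, 2), text)
```

The threshold removes clearly irrelevant passages before the cut to `top_n`, so a short candidate list is not padded with weak matches.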

Latency-optimized search

The search path is designed for real-time use at the point of care, so physicians receive reliable answers in seconds.

Results

Benchmark	Previous SOTA	Vera (w/ ZeroEntropy)	Improvement
USMLE (Steps 1–3)	Pathway (94%)	97.5%	+3.5 pts
NEJM Q&A	Claude 3 Sonnet (71%)	84.9%	+13.9 pts
MedXpertQA	GPT-4o (37.3%)	62.2%	+24.9 pts

Beyond raw accuracy, Vera reports:

2× lower latency for retrieval-augmented queries.
>90% reduction in irrelevant citations returned per query.
Stable clinical alignment with 2024–2025 treatment guidelines, verified by internal physician benchmarks.

Vera Sets New Record for AI on the US Medical Exam - Source: Vera Health

Impact

Vera is now used by clinicians at institutions such as Mayo Clinic, Penn, and Yale, providing real-time, evidence-based recommendations.

By combining ZeroEntropy’s retrieval intelligence with Vera’s medical reasoning models and AI Agents, doctors can:

Instantly verify treatment guidelines and contraindications.
Generate differentials and dosing calculations backed by the latest literature.
Improve patient outcomes through faster, safer decision-making.

About Vera Health

Vera Health builds the world’s most accurate AI medical assistant, trained on live scientific evidence and built with practicing clinicians. The company is backed by Y Combinator and Gradient Ventures (Google AI Fund).

About ZeroEntropy

ZeroEntropy develops state-of-the-art retrieval models for AI applications. Its search, reranking, and multi-query orchestration APIs enable the most advanced context-aware AI systems, from legal research to clinical reasoning.
