Overview
This guide shows how to build a production-ready Retrieval-Augmented Generation (RAG) stack using Mastra for orchestration and ZeroEntropy.dev for fast, scalable retrieval. You will ingest content, embed and index it, retrieve top-k passages, rerank, and ground your LLM responses with citations.
Why Mastra + ZeroEntropy
Mastra: TypeScript-first agents, tools, and workflows for clean RAG pipelines.
ZeroEntropy: High-performance vector search, hybrid recall, and rerankers with simple APIs.
Architecture
Ingest: Load PDFs, HTML, Markdown, or API data.
Chunk: Split into semantic passages with overlap for context windows.
Embed: Create vectors using your preferred model.
Index: Store embeddings in ZeroEntropy for ANN search.
Retrieve: Top-k candidates per query; optionally, hybrid lexical+vector.
Rerank: Cross-encoder or rules to refine evidence.
Generate: LLM answers grounded in retrieved passages with citations.
Feedback: Log failures and improve prompts, chunking, or schemas.
Step 1: Prepare Your Corpus
Normalize filenames, extract text, and capture metadata like source, section, and timestamp. Good metadata enables time-bounded retrieval, tenanting, and audit trails.
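For concreteness, a minimal chunk and metadata shape might look like the following TypeScript types. The field names are illustrative, not a schema required by Mastra or ZeroEntropy.

```typescript
// Hypothetical metadata shape for ingested chunks; adjust fields to your corpus.
interface ChunkMetadata {
  source: string;      // e.g. "handbook/security.md" or a source URL
  section: string;     // heading or logical section the chunk came from
  tenant: string;      // enables per-tenant scoping at query time
  docType: "pdf" | "html" | "markdown" | "api";
  updatedAt: string;   // ISO timestamp for time-bounded retrieval
}

interface Chunk {
  id: string;          // stable ID used for citations and incremental updates
  text: string;
  metadata: ChunkMetadata;
}
```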
Step 2: Chunk and Embed
Use token-aware chunking (e.g., 500–800 tokens with 10–15% overlap).
Embed each chunk; store vector, raw text, and metadata.
Keep a stable chunk ID to support citations and updates.
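A rough sketch of this step, using a words-per-token heuristic for token-aware chunking and the Vercel AI SDK's embedMany for batch embedding; both are stand-ins for whatever tokenizer and embedding client you prefer.

```typescript
import { readFile } from "node:fs/promises";
import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";

// Token-aware chunking approximated with a ~0.75 words-per-token heuristic;
// swap in a real tokenizer (e.g. tiktoken) for production use.
function chunkText(text: string, chunkTokens = 600, overlapTokens = 90): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const wordsPerChunk = Math.floor(chunkTokens * 0.75);
  const step = wordsPerChunk - Math.floor(overlapTokens * 0.75);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + wordsPerChunk).join(" "));
    if (start + wordsPerChunk >= words.length) break;
  }
  return chunks;
}

const text = await readFile("docs/handbook.md", "utf8"); // example path
const pieces = chunkText(text);

// One batched embedding call; the model name is an example, not a requirement.
const { embeddings } = await embedMany({
  model: openai.embedding("text-embedding-3-small"),
  values: pieces,
});
```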
Step 3: Index in ZeroEntropy
Create a collection and upsert chunks with vector+metadata.
Enable filters for tenant/team, doc type, and time windows.
Optionally precompute BM25 terms for hybrid search.
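The sketch below illustrates the upsert shape, reusing the chunk fields from Steps 1 and 2. The endpoint path, auth header, and payload fields are placeholders, so check ZeroEntropy's API reference for the actual collection and document endpoints.

```typescript
// Illustrative upsert against a ZeroEntropy collection; URL and payload are placeholders.
type IndexedChunk = {
  id: string;                         // stable chunk ID from Step 1
  text: string;
  vector: number[];                   // embedding from Step 2
  metadata: Record<string, unknown>;  // tenant, docType, updatedAt, ...
};

async function upsertChunks(collection: string, chunks: IndexedChunk[]): Promise<void> {
  const res = await fetch(
    `https://api.zeroentropy.dev/v1/collections/${collection}/upsert`, // placeholder URL
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.ZEROENTROPY_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ documents: chunks }),
    },
  );
  if (!res.ok) throw new Error(`Upsert failed: ${res.status} ${await res.text()}`);
}
```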
Step 4: Wire Retrieval in Mastra
Define a tool that calls ZeroEntropy’s search API with top-k and filters.
Add a reranker step to rescore candidates before generation.
Pass the final evidence set to your LLM prompt template.
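A retrieval tool along these lines follows Mastra's createTool pattern; verify the exact signature against the current Mastra docs, and note that the ZeroEntropy search call is again a placeholder.

```typescript
import { createTool } from "@mastra/core/tools";
import { z } from "zod";

// Retrieval tool the agent can call; the search URL and payload are illustrative.
export const searchDocs = createTool({
  id: "search-docs",
  description: "Retrieve top-k passages from the ZeroEntropy index",
  inputSchema: z.object({
    query: z.string(),
    tenant: z.string(),
    k: z.number().default(20),
  }),
  execute: async ({ context }) => {
    const res = await fetch("https://api.zeroentropy.dev/v1/collections/docs/search", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.ZEROENTROPY_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        query: context.query,
        topK: context.k,
        filter: { tenant: context.tenant }, // scope results per tenant
      }),
    });
    if (!res.ok) throw new Error(`Search failed: ${res.status}`);
    return res.json(); // candidate passages handed to the reranker step
  },
});
```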
Prompt Template (Generation)
Use a structure like: “You are a factual assistant. Use only the Evidence below. Cite chunk IDs next to claims. If unsure, say you don’t know.” Include a short instruction to list sources at the end for traceability.
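One way to assemble that prompt in code, with chunk IDs inlined so the model can cite them; the Evidence shape and buildPrompt helper are illustrative.

```typescript
// Assemble the generation prompt from reranked evidence; chunk IDs are
// included inline so the model can cite them next to each claim.
type Evidence = { id: string; text: string };

function buildPrompt(question: string, evidence: Evidence[]): string {
  const evidenceBlock = evidence
    .map((e) => `[${e.id}] ${e.text}`)
    .join("\n\n");

  return [
    "You are a factual assistant. Use only the Evidence below.",
    "Cite chunk IDs next to claims. If unsure, say you don't know.",
    "List the cited sources at the end of your answer.",
    "",
    "Evidence:",
    evidenceBlock,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```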
Policies and Guardrails
Filtering: Scope by tenant, role, region, and recency.
Safety: Block disallowed categories and enforce redaction rules at retrieval time.
Citations: Require chunk IDs; reject answers lacking evidence.
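A simple post-generation check for the citation rule might look like this; the [chunk-id] convention matches the prompt template above, so adapt the regex to your own ID format.

```typescript
// Post-generation guardrail: reject answers that cite no known chunk ID,
// or that cite IDs outside the retrieved evidence set.
function hasValidCitations(answer: string, evidenceIds: string[]): boolean {
  const cited = [...answer.matchAll(/\[([^\]]+)\]/g)].map((m) => m[1]);
  return cited.length > 0 && cited.every((id) => evidenceIds.includes(id));
}
```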
Evaluation
Answer quality: Exact-match/F1 or rubric scores with human review.
Attribution: Citation precision/recall; broken-citation rate.
Latency: P95 end-to-end; per-stage timing (retrieve, rerank, generate).
Cost: Spend drivers such as top-k, reranker batch size, and max-token limits.
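For attribution, per-answer citation precision and recall can be computed against a labeled eval set, for example:

```typescript
// Citation precision = cited chunks that are truly relevant / all cited chunks.
// Citation recall = relevant chunks that were cited / all relevant chunks.
function citationScores(cited: string[], relevant: string[]) {
  const citedSet = new Set(cited);
  const relevantSet = new Set(relevant);
  const hits = [...citedSet].filter((id) => relevantSet.has(id)).length;
  return {
    precision: citedSet.size ? hits / citedSet.size : 0,
    recall: relevantSet.size ? hits / relevantSet.size : 0,
  };
}
```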
Performance Tips
Use smaller, high-quality embeddings; test dimensionality vs recall.
Start with k=20 retrieval, rerank to 5–8 evidence chunks.
Cache hot queries; memoize reranker scores for frequent pairs.
Tune chunk size and overlap by domain; legal/medical often need larger chunks.
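A minimal in-process memo for reranker scores might look like the following; swap the Map for Redis or another shared store if multiple workers serve queries.

```typescript
// Memoize reranker scores for frequent (query, chunk) pairs.
const rerankCache = new Map<string, number>();

async function cachedRerankScore(
  query: string,
  chunkId: string,
  score: (q: string, id: string) => Promise<number>, // your reranker call
): Promise<number> {
  const key = `${query}::${chunkId}`;
  const hit = rerankCache.get(key);
  if (hit !== undefined) return hit;
  const value = await score(query, chunkId);
  rerankCache.set(key, value);
  return value;
}
```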
Operational Playbook
Data freshness: Automate ingest on file change; re-embed only affected chunks (see the hashing sketch after this list).
Versioning: Keep index versions; roll back quickly on bad ingests.
Observability: Log k-hit distribution, reranker deltas, and refusal rates.
A/B testing: Compare prompts, rerankers, and chunkers on a fixed eval set.
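For the data-freshness item above, one way to re-embed only affected chunks is to hash chunk text and compare against the hashes recorded at the last ingest:

```typescript
import { createHash } from "node:crypto";

// Return only chunks whose content hash changed since the previous ingest.
// previousHashes would typically be loaded from your ingest metadata store.
function changedChunks(
  chunks: { id: string; text: string }[],
  previousHashes: Map<string, string>,
): { id: string; text: string }[] {
  return chunks.filter((c) => {
    const hash = createHash("sha256").update(c.text).digest("hex");
    return previousHashes.get(c.id) !== hash;
  });
}
```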
Use Cases
Internal knowledge: Policies, SOPs, architecture docs with RBAC filters.
Customer support: Deflection bots with strict citations and update freshness.
Healthcare/finance: Time-bounded retrieval, auditable answers, PII/PHI handling.
Getting Started
Explore the Mastra examples on GitHub.
Create a ZeroEntropy collection at ZeroEntropy.dev and ingest a small pilot set.
Ship an MVP with k=20 retrieval reranked to 5, strict citations, and a latency budget of 1–2 s.
Conclusion
Mastra orchestrates clean RAG workflows, and ZeroEntropy delivers fast, accurate retrieval. Together, they provide a pragmatic path from prototype to production: better grounding, clearer citations, and predictable latency. Start small, instrument well, and iterate on chunking, prompts, and reranking to reach reliable RAG at scale.