Octopus
RAG Pipeline Overview
RAG (Retrieval-Augmented Generation) is the mechanism by which Octopus agents answer factual questions from indexed documents. A document is broken into chunks, embedded into vectors, stored in a vector database, and recalled at query time by semantic similarity. This guide covers the complete pipeline from ingestion to context injection.
The Two Phases of RAG
Indexing Phase (Offline)
Documents are ingested, chunked, embedded, and stored in the vector database. This phase runs when documents are uploaded or updated — not on every user turn.
Stages: Ingest → Chunk → Embed → Store
Retrieval Phase (Per Turn)
The user's message is embedded, the vector database is searched for similar chunks, and the top results are injected into working memory before the LLM call.
Stages: Embed Query → Search → (Rerank) → Inject
End-to-End Pipeline
1. Document Upload
Admin uploads PDF, DOCX, TXT, or Markdown via the knowledge-app UI or the POST /api/octopus/knowledge/{agentId}/documents endpoint.
2. Text Extraction
IDocumentIngester extracts raw text from the uploaded file. Supports PDF (PdfPig), DOCX (Open XML SDK), TXT, and Markdown.
3. Chunking
Text is split into overlapping chunks by IChunker. Default: 512-token chunks with 64-token overlap. Semantic and sentence-aware strategies are also available.
4. Embedding
Each chunk is embedded via IEmbeddingProvider (OpenAI, Azure OpenAI, or local ONNX model) into a float vector.
5. Vector Store Write
Chunk text, embedding, and metadata are written to the agent's vector collection (Qdrant or PGVector).
6. Query Embedding (per turn)
The user's current message is embedded by the same model used at indexing time.
7. Vector Search + Optional Rerank
Top-K chunks are returned by cosine similarity. Optional: hybrid search (BM25 + vector) and reranking by a cross-encoder model.
8. Context Injection
Retrieved chunks are formatted as a [Retrieved Knowledge] block and injected into working memory before the LLM call.
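The retrieval steps (6 to 8) can be sketched in a few lines. This is a minimal illustration rather than Octopus code: the in-memory `store`, the toy 3-dimensional vectors, the function names, and the exact layout of the rendered block are all assumptions made for the example. Only cosine similarity, the Top-K cut, a minimum score, and the `[Retrieved Knowledge]` label come from the pipeline description.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query_vec, store, top_k=5, min_score=0.75):
    """Return up to top_k (score, text) pairs scoring at least min_score, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]

def format_context(hits):
    """Render hits as one text block (the layout below is an assumption)."""
    lines = ["[Retrieved Knowledge]"]
    for i, (score, text) in enumerate(hits, start=1):
        lines.append(f"{i}. {text} (score={score:.2f})")
    return "\n".join(lines)

# Toy vectors standing in for real embedding-model output.
store = [
    ("Refunds are processed within 5 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.0, 1.0, 0.0]),
    ("Refund requests need an order number.", [0.8, 0.0, 0.6]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "How do refunds work?"

hits = search(query_vec, store, top_k=2, min_score=0.75)
context_block = format_context(hits)
print(context_block)
```

The unrelated "office hours" chunk scores 0 against the query and is dropped by the threshold, so only the two refund chunks reach the context block.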
Supported Document Formats
| Format | Extractor | Notes |
|---|---|---|
| PDF | PdfPig | Text extraction; no OCR for scanned images |
| DOCX | Open XML SDK | Paragraphs, tables, headers |
| TXT | Built-in | Raw UTF-8 text |
| Markdown (.md) | Built-in | Stripped to plain text |
| HTML | HtmlAgilityPack | Body text extracted; scripts/styles stripped |
Key Configuration Settings
// RAG-related agent memory config
{
  "semanticEnabled": true,
  "semanticTopK": 5,            // Chunks retrieved per turn
  "semanticMinScore": 0.75,     // Minimum cosine similarity threshold
  "hybridSearchEnabled": false, // BM25 + vector fusion
  "rerankerEnabled": false      // Cross-encoder reranking
}
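The settings above describe hybrid search only as "BM25 + vector fusion" and do not name the fusion method. One widely used option is reciprocal rank fusion (RRF), sketched below as an assumption about how two ranked lists could be merged. It is not a statement of what Octopus does internally. The chunk IDs and the function name are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used constant for RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

bm25_ranking = ["c3", "c1", "c7"]    # hypothetical keyword-search order
vector_ranking = ["c1", "c4", "c3"]  # hypothetical vector-search order

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Chunks that appear in both lists (`c1`, `c3`) rise above chunks found by only one retriever, which is the practical benefit of fusing keyword and vector results.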
// Chunking config (in appsettings.json)
{
  "Octopus": {
    "Chunking": {
      "Strategy": "FixedSize", // FixedSize | Sentence | Semantic
      "ChunkSize": 512,        // tokens
      "Overlap": 64            // tokens
    }
  }
}
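To see what ChunkSize and Overlap mean in practice, here is a minimal sketch of fixed-size chunking over a token list. It assumes the text has already been tokenized and is not the IChunker implementation. The function name is hypothetical. With the defaults, each new chunk starts 512 - 64 = 448 tokens after the previous one.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    stride = chunk_size - overlap  # 448 with the default settings
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks

tokens = list(range(1000))  # stand-in for real tokenizer output
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [512, 512, 104]
```

A 1,000-token document therefore yields three chunks starting at tokens 0, 448, and 896, and the last 64 tokens of each chunk repeat as the first 64 of the next.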