RAG Pipeline Overview

The Two Phases of RAG

Indexing Phase (Offline)

Documents are ingested, chunked, embedded, and stored in the vector database. This phase runs only when documents are uploaded or updated, not on every user turn.

Stages: Ingest → Chunk → Embed → Store
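
The four indexing stages can be pictured as a short function pipeline. The following is a minimal Python sketch, not the Octopus implementation: `InMemoryStore`, `chunk_words`, `toy_embed`, and `index_document` are hypothetical names, the letter-count "embedding" only stands in for a real embedding model, and replacing a document's old chunks on re-upload is an assumption about how updates might be handled.

```python
from dataclasses import dataclass, field


@dataclass
class StoredChunk:
    doc_id: str
    text: str
    vector: list[float]


@dataclass
class InMemoryStore:
    """Stand-in for the vector database (assumption: not Qdrant or PGVector)."""
    rows: list[StoredChunk] = field(default_factory=list)

    def delete_document(self, doc_id: str) -> None:
        self.rows = [row for row in self.rows if row.doc_id != doc_id]

    def add(self, chunk: StoredChunk) -> None:
        self.rows.append(chunk)


def chunk_words(text: str, size: int = 4) -> list[str]:
    """Toy chunker: groups of `size` whitespace-separated words, no overlap."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text: str) -> list[float]:
    """Toy embedding: counts of the letters a-z. A real system calls a model."""
    lowered = text.lower()
    return [float(lowered.count(chr(ord("a") + i))) for i in range(26)]


def index_document(store: InMemoryStore, doc_id: str, raw_text: str) -> int:
    """Ingest -> Chunk -> Embed -> Store. Returns the number of chunks written."""
    # Assumption: an updated document replaces its previously stored chunks.
    store.delete_document(doc_id)
    chunks = chunk_words(raw_text)
    for text in chunks:
        store.add(StoredChunk(doc_id, text, toy_embed(text)))
    return len(chunks)
```

Calling `index_document` again with the same `doc_id` leaves only the new chunks in the store, which mirrors the idea that indexing is triggered by uploads and updates rather than by user turns.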

Retrieval Phase (Per Turn)

The user's message is embedded, the vector database is searched for similar chunks, and the top results are injected into working memory before the LLM call.

Stages: Embed Query → Search → (Rerank) → Inject
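
The parentheses around Rerank mark it as an optional stage. As a rough Python sketch under stated assumptions, the per-turn flow could be composed from pluggable callables; `retrieve_for_turn` and the `Embed`, `Search`, and `Rerank` type aliases are hypothetical names chosen for illustration and do not correspond to the actual Octopus interfaces.

```python
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
Embed = Callable[[str], Vector]                 # message -> query vector
Search = Callable[[Vector, int], list[str]]     # (vector, top_k) -> chunks
Rerank = Callable[[str, list[str]], list[str]]  # (message, chunks) -> reordered chunks


def retrieve_for_turn(
    message: str,
    embed: Embed,
    search: Search,
    top_k: int,
    rerank: Optional[Rerank] = None,
) -> list[str]:
    """Embed Query -> Search -> (Rerank); the result is what gets injected."""
    query_vector = embed(message)
    candidates = search(query_vector, top_k)
    if rerank is not None:
        # Optional stage: skipped entirely when no reranker is configured.
        candidates = rerank(message, candidates)
    return candidates
```

Passing `rerank=None` returns the search results in their original order, so the same function covers both the plain and the reranked configuration.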

End-to-End Pipeline

Document Upload: An admin uploads a PDF, DOCX, TXT, or Markdown file via the knowledge-app UI or the POST /api/octopus/knowledge/{agentId}/documents endpoint.

Text Extraction: IDocumentIngester extracts raw text from the uploaded file. Supports PDF (PdfPig), DOCX (Open XML SDK), TXT, and Markdown.

Chunking: IChunker splits the text into overlapping chunks. Default: 512-token chunks with a 64-token overlap. Sentence-aware and semantic strategies are also available.

Embedding: Each chunk is embedded into a float vector via IEmbeddingProvider (OpenAI, Azure OpenAI, or a local ONNX model).

Vector Store Write: The chunk text, embedding, and metadata are written to the agent's vector collection (Qdrant or PGVector).

Query Embedding (per turn): The user's current message is embedded with the same model used at indexing time.

Vector Search + Optional Rerank: The top-K chunks by cosine similarity are returned. Optional: hybrid search (BM25 + vector) and reranking by a cross-encoder model.

Context Injection: Retrieved chunks are formatted as a [Retrieved Knowledge] block and injected into working memory before the LLM call.
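
To make the final step concrete, here is a small Python sketch of how retrieved chunks might be rendered into a single block. Only the [Retrieved Knowledge] label comes from the description above; the numbered layout, the score annotation, and the function name `format_retrieved_knowledge` are assumptions for illustration, and the real block format may differ.

```python
def format_retrieved_knowledge(chunks: list[tuple[str, float]]) -> str:
    """Render (text, score) pairs as one text block. Layout is illustrative only."""
    if not chunks:
        # Nothing retrieved: inject nothing rather than an empty header.
        return ""
    lines = ["[Retrieved Knowledge]"]
    for position, (text, score) in enumerate(chunks, start=1):
        lines.append(f"{position}. (score {score:.2f}) {text.strip()}")
    return "\n".join(lines)
```

Returning an empty string when no chunk qualifies keeps the prompt free of an empty header; whether the actual system behaves this way is not stated here.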

Supported Document Formats

Format	Extractor	Notes
PDF	PdfPig	Text extraction; no OCR for scanned images
DOCX	Open XML SDK	Paragraphs, tables, headers
TXT	Built-in	Raw UTF-8 text
Markdown (.md)	Built-in	Stripped to plain text
HTML	HtmlAgilityPack	Body text extracted; scripts/styles stripped

Key Configuration Settings

// RAG-related agent memory config
{
  "semanticEnabled":      true,
  "semanticTopK":         5,       // Chunks retrieved per turn
  "semanticMinScore":     0.75,    // Minimum cosine similarity threshold
  "hybridSearchEnabled":  false,   // BM25 + vector fusion
  "rerankerEnabled":      false    // Cross-encoder reranking
}
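
The interaction between semanticTopK and semanticMinScore can be sketched in a few lines of Python. This is an illustrative brute-force version, assuming the threshold is inclusive and applied before the top-K cut; `cosine_similarity` and `select_chunks` are hypothetical names, and a real vector database performs this search with an index rather than a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_chunks(
    query: list[float],
    chunks: list[tuple[str, list[float]]],
    top_k: int = 5,
    min_score: float = 0.75,
) -> list[tuple[str, float]]:
    """Score every chunk, drop those below min_score, keep the best top_k."""
    scored = [(text, cosine_similarity(query, vector)) for text, vector in chunks]
    kept = [item for item in scored if item[1] >= min_score]  # assumption: inclusive
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]
```

With these defaults a turn can receive fewer than five chunks, or none at all, when few stored chunks reach the 0.75 threshold.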

// Chunking config (in appsettings.json)
{
  "Octopus": {
    "Chunking": {
      "Strategy":   "FixedSize",  // FixedSize | Sentence | Semantic
      "ChunkSize":  512,          // tokens
      "Overlap":    64            // tokens
    }
  }
}
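
For the FixedSize strategy, ChunkSize and Overlap together determine a stride of ChunkSize - Overlap tokens (512 - 64 = 448 with the defaults). The Python sketch below shows one way such a sliding window could work; it is not the IChunker implementation, `chunk_tokens` is a hypothetical name, and it assumes the text has already been tokenized.

```python
def chunk_tokens(
    tokens: list[str],
    chunk_size: int = 512,
    overlap: int = 64,
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    stride = chunk_size - overlap  # how far each new window advances
    chunks: list[list[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
        start += stride
    return chunks
```

Under these assumptions a 1,000-token document yields three chunks starting at tokens 0, 448, and 896, with the last chunk holding the remaining 104 tokens.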
