Database Maintenance — Octopus Database

Memory Retention Job

The MemoryRetentionJob is a background service that runs nightly (configurable) and performs three tasks:

Session Timeout

Sets any Active conversations that have had no new episodes in the last SessionInactivityTimeoutMinutes (default: 60) to TimedOut status.

Episode Archival

Moves episodes older than the configured EpisodeRetentionDays (default: 90) from Octopus_Episodes to Octopus_Episodes_Archive. The archive table has the same schema and is queryable — it just is not part of the active working set.

Stale Procedure Cleanup

Optionally deletes or archives procedures that have never been activated and are older than ProcedureRetentionDays (default: 180). Enabled by setting PurgeUnusedProcedures = true.

Configuration

// appsettings.json — Octopus memory retention settings
{
  "Octopus": {
    "MemoryRetention": {
      "EpisodeRetentionDays": 90,
      "ProcedureRetentionDays": 180,
      "SessionInactivityTimeoutMinutes": 60,
      "PurgeUnusedProcedures": false,
      "ArchiveEnabled": true,
      "JobScheduleCron": "0 2 * * *"   // 2:00 AM UTC daily
    }
  }
}

Index Maintenance

The Octopus_Episodes table receives the most writes and is the most critical to keep defragmented. The clustered index is on EpisodeId (NEWSEQUENTIALID), which minimizes page splits. However, the non-clustered indexes (especially IX_Episodes_TenantId_SessionId_Timestamp) need periodic rebuilding.

Recommended Index Maintenance Script

-- Run weekly during off-peak hours
-- Rebuild indexes with > 30% fragmentation
DECLARE @TableName NVARCHAR(200) = 'Octopus_Episodes';
DECLARE @IndexName NVARCHAR(200);
DECLARE @Fragmentation FLOAT;

DECLARE index_cursor CURSOR FOR
SELECT i.name, s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(@TableName), NULL, NULL, 'LIMITED') s
JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.avg_fragmentation_in_percent > 10;

OPEN index_cursor;
FETCH NEXT FROM index_cursor INTO @IndexName, @Fragmentation;

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @Fragmentation > 30
        EXEC('ALTER INDEX [' + @IndexName + '] ON [' + @TableName + '] REBUILD WITH (ONLINE = ON)');
    ELSE
        EXEC('ALTER INDEX [' + @IndexName + '] ON [' + @TableName + '] REORGANIZE');
    FETCH NEXT FROM index_cursor INTO @IndexName, @Fragmentation;
END

CLOSE index_cursor;
DEALLOCATE index_cursor;

JSON Column Storage and Compression

The Messages and ToolCalls columns use GZip compression via an EF Core value converter. This reduces storage for typical conversational content by 60-75%. A few important points for DBAs:

The compressed binary cannot be read with OPENJSON() directly — always use the application layer to read episode content.
You can check uncompressed vs. compressed sizes using application-side diagnostics or by sampling rows and decompressing in a test harness.
If you need to inspect episode content in SSMS for debugging, use the Octopus Admin API's GET /api/octopus/episodes/{episodeId} endpoint — it returns decompressed JSON.

Archive Table Strategy

The archive table Octopus_Episodes_Archive mirrors the schema of Octopus_Episodes exactly but is optimized for cold storage:

No non-clustered indexes by default — use covering indexes only if archive reporting is required
Columnstore index is recommended for reporting workloads on the archive table
Archive data can be offloaded to Azure Blob Storage or S3 using SQL Server's Data Tiering feature for long-term retention

-- Example: Add columnstore index to archive for analytical queries
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Episodes_Archive
ON Octopus_Episodes_Archive
(TenantId, SessionId, Role, Timestamp, TokenCount);
-- Excludes varbinary columns (Messages, ToolCalls) — not supported in columnstore

Backup Strategy

Backup Type	Frequency	Retention	Notes
Full Backup	Weekly (Sunday 01:00 UTC)	4 weeks	Covers all tables including archive
Differential Backup	Daily (01:00 UTC, except Sunday)	7 days	Incremental from last full
Transaction Log Backup	Every 15 minutes	48 hours	Enables point-in-time recovery

Vector Store Backup Is Separate

The SQL database backup does NOT include the vector store data (embeddings in Qdrant or PGVector). The vector store must be backed up independently. If the vector store and SQL database backups are not synchronized, restoring them to mismatched points-in-time will result in orphaned EmbeddingRef values in SQL with no corresponding vector store entry, breaking semantic recall. Coordinate your backup windows.

Monitoring Queries

-- Check table sizes (run periodically to track growth)
SELECT
    t.NAME AS TableName,
    p.rows AS RowCounts,
    CAST(ROUND((SUM(a.total_pages) * 8) / 1024.0, 2) AS DECIMAL(10,2)) AS TotalSpaceMB,
    CAST(ROUND((SUM(a.used_pages) * 8) / 1024.0, 2) AS DECIMAL(10,2)) AS UsedSpaceMB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE t.NAME LIKE 'Octopus_%'
  AND t.is_ms_shipped = 0
  AND i.OBJECT_ID > 255
  AND i.index_id <= 1
GROUP BY t.NAME, p.Rows
ORDER BY TotalSpaceMB DESC;

← Querying the Database