Database Maintenance
Operational guidance for keeping the Octopus database healthy — index maintenance, JSON column management, archival jobs, backup strategy, and monitoring.
Memory Retention Job
The MemoryRetentionJob is a background service that runs nightly (configurable) and performs three tasks:
Session Timeout
Sets any Active conversations that have had no new episodes in the last SessionInactivityTimeoutMinutes (default: 60) to TimedOut status.
Episode Archival
Moves episodes older than the configured EpisodeRetentionDays (default: 90) from Octopus_Episodes to Octopus_Episodes_Archive. The archive table has the same schema and is queryable — it just is not part of the active working set.
Stale Procedure Cleanup
Optionally deletes or archives procedures that have never been activated and are older than ProcedureRetentionDays (default: 180). Enabled by setting PurgeUnusedProcedures = true.
Configuration
// appsettings.json — Octopus memory retention settings
{
"Octopus": {
"MemoryRetention": {
"EpisodeRetentionDays": 90,
"ProcedureRetentionDays": 180,
"SessionInactivityTimeoutMinutes": 60,
"PurgeUnusedProcedures": false,
"ArchiveEnabled": true,
"JobScheduleCron": "0 2 * * *" // 2:00 AM UTC daily
}
}
}
Index Maintenance
The Octopus_Episodes table receives the most writes and is the most critical to keep defragmented. The clustered index is on EpisodeId (NEWSEQUENTIALID), which minimizes page splits. However, the non-clustered indexes (especially IX_Episodes_TenantId_SessionId_Timestamp) need periodic rebuilding.
Recommended Index Maintenance Script
-- Run weekly during off-peak hours
-- Rebuild indexes with > 30% fragmentation
DECLARE @TableName NVARCHAR(200) = 'Octopus_Episodes';
DECLARE @IndexName NVARCHAR(200);
DECLARE @Fragmentation FLOAT;
DECLARE index_cursor CURSOR FOR
SELECT i.name, s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(@TableName), NULL, NULL, 'LIMITED') s
JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.avg_fragmentation_in_percent > 10;
OPEN index_cursor;
FETCH NEXT FROM index_cursor INTO @IndexName, @Fragmentation;
WHILE @@FETCH_STATUS = 0
BEGIN
IF @Fragmentation > 30
EXEC('ALTER INDEX [' + @IndexName + '] ON [' + @TableName + '] REBUILD WITH (ONLINE = ON)');
ELSE
EXEC('ALTER INDEX [' + @IndexName + '] ON [' + @TableName + '] REORGANIZE');
FETCH NEXT FROM index_cursor INTO @IndexName, @Fragmentation;
END
CLOSE index_cursor;
DEALLOCATE index_cursor;
JSON Column Storage and Compression
The Messages and ToolCalls columns use GZip compression via an EF Core value converter. This reduces storage for typical conversational content by 60-75%. A few important points for DBAs:
- The compressed binary cannot be read with
OPENJSON()directly — always use the application layer to read episode content. - You can check uncompressed vs. compressed sizes using application-side diagnostics or by sampling rows and decompressing in a test harness.
- If you need to inspect episode content in SSMS for debugging, use the Octopus Admin API's
GET /api/octopus/episodes/{episodeId}endpoint — it returns decompressed JSON.
Archive Table Strategy
The archive table Octopus_Episodes_Archive mirrors the schema of Octopus_Episodes exactly but is optimized for cold storage:
- No non-clustered indexes by default — use covering indexes only if archive reporting is required
- Columnstore index is recommended for reporting workloads on the archive table
- Archive data can be offloaded to Azure Blob Storage or S3 using SQL Server's Data Tiering feature for long-term retention
-- Example: Add columnstore index to archive for analytical queries
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Episodes_Archive
ON Octopus_Episodes_Archive
(TenantId, SessionId, Role, Timestamp, TokenCount);
-- Excludes varbinary columns (Messages, ToolCalls) — not supported in columnstore
Backup Strategy
| Backup Type | Frequency | Retention | Notes |
|---|---|---|---|
| Full Backup | Weekly (Sunday 01:00 UTC) | 4 weeks | Covers all tables including archive |
| Differential Backup | Daily (01:00 UTC, except Sunday) | 7 days | Incremental from last full |
| Transaction Log Backup | Every 15 minutes | 48 hours | Enables point-in-time recovery |
The SQL database backup does NOT include the vector store data (embeddings in Qdrant or PGVector). The vector store must be backed up independently. If the vector store and SQL database backups are not synchronized, restoring them to mismatched points-in-time will result in orphaned EmbeddingRef values in SQL with no corresponding vector store entry, breaking semantic recall. Coordinate your backup windows.
Monitoring Queries
-- Check table sizes (run periodically to track growth)
SELECT
t.NAME AS TableName,
p.rows AS RowCounts,
CAST(ROUND((SUM(a.total_pages) * 8) / 1024.0, 2) AS DECIMAL(10,2)) AS TotalSpaceMB,
CAST(ROUND((SUM(a.used_pages) * 8) / 1024.0, 2) AS DECIMAL(10,2)) AS UsedSpaceMB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE t.NAME LIKE 'Octopus_%'
AND t.is_ms_shipped = 0
AND i.OBJECT_ID > 255
AND i.index_id <= 1
GROUP BY t.NAME, p.Rows
ORDER BY TotalSpaceMB DESC;