Building an AI-Ready Database
The Data Ocean AI-ready database standard — the schema conventions, column set, and structural patterns that make a database fully prepared for AI enrichment, semantic search, and AI agent consumption.
The AI-Ready Database Standard
An AI-ready database in the Data Ocean ecosystem is one where every table follows a defined schema standard that enables AI workflows to operate on the data without requiring schema migrations or one-time data preparation work. The standard is divided into three tiers:
Tier 1 — Foundational
Required on every table. Enables multi-tenancy, audit trails, soft delete, and basic AI pipeline compatibility.
Tier 2 — AI Enhanced
Added to tables that will be AI-enriched. Enables classification, summarization, sentiment, and embedding-based search.
Tier 3 — Compliance
Added to tables containing personal data. Enables GDPR compliance, PII tracking, consent management, and data lineage.
Tier 1 — Foundational Columns (Required)
| Column | Type | Default | Purpose |
|---|---|---|---|
TenantId | int NOT NULL | — | Multi-tenancy discriminator. Present on all tables. First column in all WHERE clauses. |
CreatedAt | datetime2 NOT NULL | GETUTCDATE() | UTC creation timestamp. Immutable after INSERT. |
UpdatedAt | datetime2 NOT NULL | GETUTCDATE() | UTC last-modified timestamp. Updated on every UPDATE. |
CreatedBy | nvarchar(200) NULL | NULL | Identity of the record creator (userId, system agent ID). |
IsDeleted | bit NOT NULL | 0 | Soft delete flag. Never physically delete rows. |
DeletedAt | datetime2 NULL | NULL | UTC time of soft deletion. |
Tier 2 — AI Enhancement Columns
| Column | Type | Purpose | Set By |
|---|---|---|---|
SummaryText | nvarchar(max) NULL | AI-generated 2-4 sentence summary of the record | Summarization enrichment workflow |
ClassificationLabel | nvarchar(200) NULL | AI-assigned categorical label (domain-specific) | Classification enrichment workflow |
SentimentScore | decimal(5,4) NULL | Numeric sentiment: -1.0 (very negative) to 1.0 (very positive) | Sentiment enrichment workflow |
ExtractedEntitiesJson | nvarchar(max) NULL | JSON of extracted named entities (people, companies, amounts) | Entity extraction workflow |
KeywordsJson | nvarchar(max) NULL | JSON array of extracted keywords for filtering and search | Keyword extraction workflow |
EmbeddingRef | nvarchar(500) NULL | Vector store entry ID for this record's semantic embedding | Embedding generation workflow |
AiProcessedAt | datetime2 NULL | UTC timestamp of last successful AI enrichment | All enrichment workflows (on completion) |
AiModelVersion | nvarchar(100) NULL | Model identifier used for last enrichment (e.g., "gpt-4o-2024-08-06") | All enrichment workflows |
Tier 3 — Compliance Columns
| Column | Type | Purpose |
|---|---|---|
PiiClassification | nvarchar(50) NULL | PII sensitivity level: None, Low, Medium, High, Sensitive |
ConsentStatus | nvarchar(50) NULL | Consent state: Granted, Withdrawn, Pending, NotRequired |
ConsentGrantedAt | datetime2 NULL | UTC timestamp when consent was granted |
ConsentWithdrawnAt | datetime2 NULL | UTC timestamp when consent was withdrawn (triggers enrichment field purge) |
DataLineageId | uniqueidentifier NULL | FK to DataLineage table — tracks the last enrichment event that modified this record |
Complete AI-Ready Table Template
CREATE TABLE [YourEntity] (
-- Primary key
[YourEntity]Id uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID(),
-- TIER 1: Foundational
TenantId int NOT NULL,
CreatedAt datetime2 NOT NULL DEFAULT GETUTCDATE(),
UpdatedAt datetime2 NOT NULL DEFAULT GETUTCDATE(),
CreatedBy nvarchar(200) NULL,
IsDeleted bit NOT NULL DEFAULT 0,
DeletedAt datetime2 NULL,
-- YOUR BUSINESS COLUMNS HERE
-- TIER 2: AI Enhancement
SummaryText nvarchar(max) NULL,
ClassificationLabel nvarchar(200) NULL,
SentimentScore decimal(5,4) NULL,
ExtractedEntitiesJson nvarchar(max) NULL,
KeywordsJson nvarchar(max) NULL,
EmbeddingRef nvarchar(500) NULL,
AiProcessedAt datetime2 NULL,
AiModelVersion nvarchar(100) NULL,
-- TIER 3: Compliance (add when record contains personal data)
PiiClassification nvarchar(50) NULL,
ConsentStatus nvarchar(50) NULL,
ConsentGrantedAt datetime2 NULL,
ConsentWithdrawnAt datetime2 NULL,
DataLineageId uniqueidentifier NULL,
CONSTRAINT PK_[YourEntity] PRIMARY KEY CLUSTERED ([YourEntity]Id)
);
-- Required indexes
CREATE INDEX IX_[YourEntity]_TenantId_Status ON [YourEntity] (TenantId, [StatusColumn]) WHERE IsDeleted = 0;
CREATE INDEX IX_[YourEntity]_TenantId_AiProcessedAt ON [YourEntity] (TenantId, AiProcessedAt) WHERE AiProcessedAt IS NULL AND IsDeleted = 0;
CREATE INDEX IX_[YourEntity]_TenantId_EmbeddingRef ON [YourEntity] (TenantId, EmbeddingRef) WHERE EmbeddingRef IS NOT NULL;
Run this query to assess how AI-ready your database is. The goal is 100% of records having non-null AiProcessedAt and EmbeddingRef for all tables that have AI enhancement columns:
SELECT
COUNT(*) AS TotalRecords,
SUM(CASE WHEN AiProcessedAt IS NOT NULL THEN 1 ELSE 0 END) AS Enriched,
SUM(CASE WHEN EmbeddingRef IS NOT NULL THEN 1 ELSE 0 END) AS Embedded,
CAST(100.0 * SUM(CASE WHEN AiProcessedAt IS NOT NULL THEN 1 ELSE 0 END) / COUNT(*) AS decimal(5,1)) AS EnrichedPct
FROM Lead WHERE TenantId = @tenantId AND IsDeleted = 0;