Portal Community

The AI-Ready Database Standard

An AI-ready database in the Data Ocean ecosystem is one where every table follows a defined schema standard that enables AI workflows to operate on the data without requiring schema migrations or one-time data preparation work. The standard is divided into three tiers:

Tier 1 — Foundational

Required on every table. Enables multi-tenancy, audit trails, soft delete, and basic AI pipeline compatibility.

Tier 2 — AI Enhanced

Added to tables that will be AI-enriched. Enables classification, summarization, sentiment, and embedding-based search.

Tier 3 — Compliance

Added to tables containing personal data. Enables GDPR compliance, PII tracking, consent management, and data lineage.

Tier 1 — Foundational Columns (Required)

ColumnTypeDefaultPurpose
TenantIdint NOT NULLMulti-tenancy discriminator. Present on all tables. First column in all WHERE clauses.
CreatedAtdatetime2 NOT NULLGETUTCDATE()UTC creation timestamp. Immutable after INSERT.
UpdatedAtdatetime2 NOT NULLGETUTCDATE()UTC last-modified timestamp. Updated on every UPDATE.
CreatedBynvarchar(200) NULLNULLIdentity of the record creator (userId, system agent ID).
IsDeletedbit NOT NULL0Soft delete flag. Never physically delete rows.
DeletedAtdatetime2 NULLNULLUTC time of soft deletion.

Tier 2 — AI Enhancement Columns

ColumnTypePurposeSet By
SummaryTextnvarchar(max) NULLAI-generated 2-4 sentence summary of the recordSummarization enrichment workflow
ClassificationLabelnvarchar(200) NULLAI-assigned categorical label (domain-specific)Classification enrichment workflow
SentimentScoredecimal(5,4) NULLNumeric sentiment: -1.0 (very negative) to 1.0 (very positive)Sentiment enrichment workflow
ExtractedEntitiesJsonnvarchar(max) NULLJSON of extracted named entities (people, companies, amounts)Entity extraction workflow
KeywordsJsonnvarchar(max) NULLJSON array of extracted keywords for filtering and searchKeyword extraction workflow
EmbeddingRefnvarchar(500) NULLVector store entry ID for this record's semantic embeddingEmbedding generation workflow
AiProcessedAtdatetime2 NULLUTC timestamp of last successful AI enrichmentAll enrichment workflows (on completion)
AiModelVersionnvarchar(100) NULLModel identifier used for last enrichment (e.g., "gpt-4o-2024-08-06")All enrichment workflows

Tier 3 — Compliance Columns

ColumnTypePurpose
PiiClassificationnvarchar(50) NULLPII sensitivity level: None, Low, Medium, High, Sensitive
ConsentStatusnvarchar(50) NULLConsent state: Granted, Withdrawn, Pending, NotRequired
ConsentGrantedAtdatetime2 NULLUTC timestamp when consent was granted
ConsentWithdrawnAtdatetime2 NULLUTC timestamp when consent was withdrawn (triggers enrichment field purge)
DataLineageIduniqueidentifier NULLFK to DataLineage table — tracks the last enrichment event that modified this record

Complete AI-Ready Table Template

CREATE TABLE [YourEntity] (
    -- Primary key
    [YourEntity]Id      uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID(),

    -- TIER 1: Foundational
    TenantId            int              NOT NULL,
    CreatedAt           datetime2        NOT NULL DEFAULT GETUTCDATE(),
    UpdatedAt           datetime2        NOT NULL DEFAULT GETUTCDATE(),
    CreatedBy           nvarchar(200)    NULL,
    IsDeleted           bit              NOT NULL DEFAULT 0,
    DeletedAt           datetime2        NULL,

    -- YOUR BUSINESS COLUMNS HERE

    -- TIER 2: AI Enhancement
    SummaryText             nvarchar(max)    NULL,
    ClassificationLabel     nvarchar(200)    NULL,
    SentimentScore          decimal(5,4)     NULL,
    ExtractedEntitiesJson   nvarchar(max)    NULL,
    KeywordsJson            nvarchar(max)    NULL,
    EmbeddingRef            nvarchar(500)    NULL,
    AiProcessedAt           datetime2        NULL,
    AiModelVersion          nvarchar(100)    NULL,

    -- TIER 3: Compliance (add when record contains personal data)
    PiiClassification       nvarchar(50)     NULL,
    ConsentStatus           nvarchar(50)     NULL,
    ConsentGrantedAt        datetime2        NULL,
    ConsentWithdrawnAt      datetime2        NULL,
    DataLineageId           uniqueidentifier NULL,

    CONSTRAINT PK_[YourEntity] PRIMARY KEY CLUSTERED ([YourEntity]Id)
);

-- Required indexes
CREATE INDEX IX_[YourEntity]_TenantId_Status ON [YourEntity] (TenantId, [StatusColumn]) WHERE IsDeleted = 0;
CREATE INDEX IX_[YourEntity]_TenantId_AiProcessedAt ON [YourEntity] (TenantId, AiProcessedAt) WHERE AiProcessedAt IS NULL AND IsDeleted = 0;
CREATE INDEX IX_[YourEntity]_TenantId_EmbeddingRef ON [YourEntity] (TenantId, EmbeddingRef) WHERE EmbeddingRef IS NOT NULL;
AI-Ready Database Assessment

Run this query to assess how AI-ready your database is. The goal is 100% of records having non-null AiProcessedAt and EmbeddingRef for all tables that have AI enhancement columns:

SELECT
    COUNT(*) AS TotalRecords,
    SUM(CASE WHEN AiProcessedAt IS NOT NULL THEN 1 ELSE 0 END) AS Enriched,
    SUM(CASE WHEN EmbeddingRef IS NOT NULL THEN 1 ELSE 0 END) AS Embedded,
    CAST(100.0 * SUM(CASE WHEN AiProcessedAt IS NOT NULL THEN 1 ELSE 0 END) / COUNT(*) AS decimal(5,1)) AS EnrichedPct
FROM Lead WHERE TenantId = @tenantId AND IsDeleted = 0;