Portal Community

Why Schema Design Matters for AI

An AI-ready schema is not just a well-normalized relational schema — it includes a set of standard columns that enable AI enrichment workflows to write classification labels, summaries, embeddings, and audit trails alongside the core business data. Designing these columns in from the start means AI features can be added to any table without breaking schema migrations later.

The Data Ocean schema standard defines two categories of columns:

Required Metadata Columns

Every table in a Data Ocean SQL Server database must include these columns:

ColumnTypeDefaultPurpose
TenantIdint NOT NULLRequired in INSERTMulti-tenancy discriminator. First column in every WHERE clause.
CreatedAtdatetime2 NOT NULLGETUTCDATE()UTC creation timestamp. Set once on INSERT, never updated.
UpdatedAtdatetime2 NOT NULLGETUTCDATE()UTC last-modified timestamp. Updated by the application on every UPDATE.
CreatedBynvarchar(200) NULLNULLUserId or system identifier of the record creator. Null for system-generated records.
IsDeletedbit NOT NULL0Soft delete flag. Records are never physically deleted — set IsDeleted = 1 and DeletedAt instead.
DeletedAtdatetime2 NULLNULLUTC timestamp when the record was soft-deleted. Null for active records.

Optional AI Enhancement Columns

Add these columns to any table that will be processed by AI enrichment workflows:

ColumnTypePurpose
SummaryTextnvarchar(max) NULLAI-generated summary of the record — written by an Octopus summarization node
ClassificationLabelnvarchar(200) NULLAI-assigned classification (e.g., "High Priority", "Churn Risk", "Positive Sentiment")
SentimentScoredecimal(5,4) NULLNumeric sentiment score between -1.0 (negative) and 1.0 (positive)
EmbeddingRefnvarchar(500) NULLID in the vector store for the embedding of this record's content — enables semantic search
AiProcessedAtdatetime2 NULLUTC timestamp of the last AI enrichment run. Used to identify stale enrichment.
AiModelVersionnvarchar(100) NULLThe model that produced the enrichment (e.g., "gpt-4o-2024-08-06"). Supports model rollover auditing.

Complete Example: Customer Table

CREATE TABLE Customer (
    -- Primary Key
    CustomerId      uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID(),

    -- Required metadata columns
    TenantId        int              NOT NULL,
    CreatedAt       datetime2        NOT NULL DEFAULT GETUTCDATE(),
    UpdatedAt       datetime2        NOT NULL DEFAULT GETUTCDATE(),
    CreatedBy       nvarchar(200)    NULL,
    IsDeleted       bit              NOT NULL DEFAULT 0,
    DeletedAt       datetime2        NULL,

    -- Business columns
    Name            nvarchar(300)    NOT NULL,
    Email           nvarchar(300)    NOT NULL,
    Phone           nvarchar(50)     NULL,
    Industry        nvarchar(200)    NULL,
    AnnualRevenue   decimal(18,2)    NULL,
    Status          nvarchar(50)     NOT NULL DEFAULT 'Active',

    -- AI enhancement columns
    SummaryText         nvarchar(max)    NULL,
    ClassificationLabel nvarchar(200)    NULL,
    SentimentScore      decimal(5,4)     NULL,
    EmbeddingRef        nvarchar(500)    NULL,
    AiProcessedAt       datetime2        NULL,
    AiModelVersion      nvarchar(100)    NULL,

    -- Constraints
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId),
    CONSTRAINT UQ_Customer_TenantEmail UNIQUE (TenantId, Email)
);

-- Required indexes
CREATE INDEX IX_Customer_TenantId_Status
    ON Customer (TenantId, Status)
    WHERE IsDeleted = 0;

CREATE INDEX IX_Customer_TenantId_EmbeddingRef
    ON Customer (TenantId, EmbeddingRef)
    WHERE EmbeddingRef IS NOT NULL;

CREATE INDEX IX_Customer_TenantId_AiProcessedAt
    ON Customer (TenantId, AiProcessedAt)
    WHERE AiProcessedAt IS NULL;  -- Finds un-enriched records efficiently

Naming Conventions

ElementConventionExample
Table namesPascalCase, singularCustomer, Invoice, SupportTicket
Column namesPascalCaseCustomerId, CreatedAt
Primary key[Table]IdCustomerId, InvoiceId
Foreign keys[ReferencedTable]IdCustomerId on Invoice table
Boolean columnsIs[State] or Has[Property]IsDeleted, IsVerified, HasAttachments
Timestamp columns[Event]AtCreatedAt, ProcessedAt, ExpiresAt
JSON columns[Content]Json or descriptive nameMetadataJson, Config
Always Filter on IsDeleted

All production queries must include WHERE IsDeleted = 0 (or filter in a view). Create a filtered index on (TenantId, IsDeleted) to make this efficient. Consider creating tenant-scoped views that pre-apply the soft-delete filter so that workflow SQL nodes do not need to remember to add it.

Schema Evolution with EF Core Migrations

When using EF Core Code-First to manage your Data Ocean schema, add AI enhancement columns as nullable additions — never as breaking changes:

// EF Core migration — adding AI columns to existing table
public partial class AddAiColumnsToCustomer : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.AddColumn<string>(
            name: "SummaryText",
            table: "Customer",
            type: "nvarchar(max)",
            nullable: true);

        migrationBuilder.AddColumn<string>(
            name: "ClassificationLabel",
            table: "Customer",
            type: "nvarchar(200)",
            nullable: true);

        migrationBuilder.AddColumn<string>(
            name: "EmbeddingRef",
            table: "Customer",
            type: "nvarchar(500)",
            nullable: true);

        migrationBuilder.AddColumn<DateTime>(
            name: "AiProcessedAt",
            table: "Customer",
            type: "datetime2",
            nullable: true);
    }
}