AI-Ready Schema Design — SQL Server Guide

Why Schema Design Matters for AI

An AI-ready schema is not just a well-normalized relational schema — it includes a set of standard columns that enable AI enrichment workflows to write classification labels, summaries, embeddings, and audit trails alongside the core business data. Designing these columns in from the start means AI features can be added to any table without breaking schema migrations later.

The Data Ocean schema standard defines two categories of columns:

Required metadata columns — present on every table; enable multi-tenancy, auditing, and soft delete
Optional AI enhancement columns — added to tables that will be AI-enriched; enable semantic search, classification, and summarization

Required Metadata Columns

Every table in a Data Ocean SQL Server database must include these columns:

Column	Type	Default	Purpose
`TenantId`	int NOT NULL	Required in INSERT	Multi-tenancy discriminator. First column in every WHERE clause.
`CreatedAt`	datetime2 NOT NULL	GETUTCDATE()	UTC creation timestamp. Set once on INSERT, never updated.
`UpdatedAt`	datetime2 NOT NULL	GETUTCDATE()	UTC last-modified timestamp. Updated by the application on every UPDATE.
`CreatedBy`	nvarchar(200) NULL	NULL	UserId or system identifier of the record creator. Null for system-generated records.
`IsDeleted`	bit NOT NULL	0	Soft delete flag. Records are never physically deleted — set `IsDeleted = 1` and `DeletedAt` instead.
`DeletedAt`	datetime2 NULL	NULL	UTC timestamp when the record was soft-deleted. Null for active records.

Optional AI Enhancement Columns

Add these columns to any table that will be processed by AI enrichment workflows:

Column	Type	Purpose
`SummaryText`	nvarchar(max) NULL	AI-generated summary of the record — written by an Octopus summarization node
`ClassificationLabel`	nvarchar(200) NULL	AI-assigned classification (e.g., "High Priority", "Churn Risk", "Positive Sentiment")
`SentimentScore`	decimal(5,4) NULL	Numeric sentiment score between -1.0 (negative) and 1.0 (positive)
`EmbeddingRef`	nvarchar(500) NULL	ID in the vector store for the embedding of this record's content — enables semantic search
`AiProcessedAt`	datetime2 NULL	UTC timestamp of the last AI enrichment run. Used to identify stale enrichment.
`AiModelVersion`	nvarchar(100) NULL	The model that produced the enrichment (e.g., "gpt-4o-2024-08-06"). Supports model rollover auditing.

Complete Example: Customer Table

CREATE TABLE Customer (
    -- Primary Key
    CustomerId      uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID(),

    -- Required metadata columns
    TenantId        int              NOT NULL,
    CreatedAt       datetime2        NOT NULL DEFAULT GETUTCDATE(),
    UpdatedAt       datetime2        NOT NULL DEFAULT GETUTCDATE(),
    CreatedBy       nvarchar(200)    NULL,
    IsDeleted       bit              NOT NULL DEFAULT 0,
    DeletedAt       datetime2        NULL,

    -- Business columns
    Name            nvarchar(300)    NOT NULL,
    Email           nvarchar(300)    NOT NULL,
    Phone           nvarchar(50)     NULL,
    Industry        nvarchar(200)    NULL,
    AnnualRevenue   decimal(18,2)    NULL,
    Status          nvarchar(50)     NOT NULL DEFAULT 'Active',

    -- AI enhancement columns
    SummaryText         nvarchar(max)    NULL,
    ClassificationLabel nvarchar(200)    NULL,
    SentimentScore      decimal(5,4)     NULL,
    EmbeddingRef        nvarchar(500)    NULL,
    AiProcessedAt       datetime2        NULL,
    AiModelVersion      nvarchar(100)    NULL,

    -- Constraints
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId),
    CONSTRAINT UQ_Customer_TenantEmail UNIQUE (TenantId, Email)
);

-- Required indexes
CREATE INDEX IX_Customer_TenantId_Status
    ON Customer (TenantId, Status)
    WHERE IsDeleted = 0;

CREATE INDEX IX_Customer_TenantId_EmbeddingRef
    ON Customer (TenantId, EmbeddingRef)
    WHERE EmbeddingRef IS NOT NULL;

CREATE INDEX IX_Customer_TenantId_AiProcessedAt
    ON Customer (TenantId, AiProcessedAt)
    WHERE AiProcessedAt IS NULL;  -- Finds un-enriched records efficiently

Naming Conventions

Element	Convention	Example
Table names	PascalCase, singular	`Customer`, `Invoice`, `SupportTicket`
Column names	PascalCase	`CustomerId`, `CreatedAt`
Primary key	`[Table]Id`	`CustomerId`, `InvoiceId`
Foreign keys	`[ReferencedTable]Id`	`CustomerId` on Invoice table
Boolean columns	`Is[State]` or `Has[Property]`	`IsDeleted`, `IsVerified`, `HasAttachments`
Timestamp columns	`[Event]At`	`CreatedAt`, `ProcessedAt`, `ExpiresAt`
JSON columns	`[Content]Json` or descriptive name	`MetadataJson`, `Config`

Always Filter on IsDeleted

All production queries must include WHERE IsDeleted = 0 (or filter in a view). Create a filtered index on (TenantId, IsDeleted) to make this efficient. Consider creating tenant-scoped views that pre-apply the soft-delete filter so that workflow SQL nodes do not need to remember to add it.

Schema Evolution with EF Core Migrations

When using EF Core Code-First to manage your Data Ocean schema, add AI enhancement columns as nullable additions — never as breaking changes:

// EF Core migration — adding AI columns to existing table
public partial class AddAiColumnsToCustomer : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.AddColumn<string>(
            name: "SummaryText",
            table: "Customer",
            type: "nvarchar(max)",
            nullable: true);

        migrationBuilder.AddColumn<string>(
            name: "ClassificationLabel",
            table: "Customer",
            type: "nvarchar(200)",
            nullable: true);

        migrationBuilder.AddColumn<string>(
            name: "EmbeddingRef",
            table: "Customer",
            type: "nvarchar(500)",
            nullable: true);

        migrationBuilder.AddColumn<DateTime>(
            name: "AiProcessedAt",
            table: "Customer",
            type: "datetime2",
            nullable: true);
    }
}

← Datasource Registration Next: Access from Flow Studio →