AI-Ready Schema Design
How to design SQL Server tables in Data Ocean that are AI-ready from day one — standard metadata columns, vector-ready patterns, naming conventions, and the column set that enables AI enrichment workflows without schema changes later.
Why Schema Design Matters for AI
An AI-ready schema is not just a well-normalized relational schema — it includes a set of standard columns that enable AI enrichment workflows to write classification labels, summaries, embeddings, and audit trails alongside the core business data. Designing these columns in from the start means AI features can be added to any table without breaking schema migrations later.
The Data Ocean schema standard defines two categories of columns:
- Required metadata columns — present on every table; enable multi-tenancy, auditing, and soft delete
- Optional AI enhancement columns — added to tables that will be AI-enriched; enable semantic search, classification, and summarization
Required Metadata Columns
Every table in a Data Ocean SQL Server database must include these columns:
| Column | Type | Default | Purpose |
|---|---|---|---|
TenantId | int NOT NULL | Required in INSERT | Multi-tenancy discriminator. First column in every WHERE clause. |
CreatedAt | datetime2 NOT NULL | GETUTCDATE() | UTC creation timestamp. Set once on INSERT, never updated. |
UpdatedAt | datetime2 NOT NULL | GETUTCDATE() | UTC last-modified timestamp. Updated by the application on every UPDATE. |
CreatedBy | nvarchar(200) NULL | NULL | UserId or system identifier of the record creator. Null for system-generated records. |
IsDeleted | bit NOT NULL | 0 | Soft delete flag. Records are never physically deleted — set IsDeleted = 1 and DeletedAt instead. |
DeletedAt | datetime2 NULL | NULL | UTC timestamp when the record was soft-deleted. Null for active records. |
Optional AI Enhancement Columns
Add these columns to any table that will be processed by AI enrichment workflows:
| Column | Type | Purpose |
|---|---|---|
SummaryText | nvarchar(max) NULL | AI-generated summary of the record — written by an Octopus summarization node |
ClassificationLabel | nvarchar(200) NULL | AI-assigned classification (e.g., "High Priority", "Churn Risk", "Positive Sentiment") |
SentimentScore | decimal(5,4) NULL | Numeric sentiment score between -1.0 (negative) and 1.0 (positive) |
EmbeddingRef | nvarchar(500) NULL | ID in the vector store for the embedding of this record's content — enables semantic search |
AiProcessedAt | datetime2 NULL | UTC timestamp of the last AI enrichment run. Used to identify stale enrichment. |
AiModelVersion | nvarchar(100) NULL | The model that produced the enrichment (e.g., "gpt-4o-2024-08-06"). Supports model rollover auditing. |
Complete Example: Customer Table
CREATE TABLE Customer (
-- Primary Key
CustomerId uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID(),
-- Required metadata columns
TenantId int NOT NULL,
CreatedAt datetime2 NOT NULL DEFAULT GETUTCDATE(),
UpdatedAt datetime2 NOT NULL DEFAULT GETUTCDATE(),
CreatedBy nvarchar(200) NULL,
IsDeleted bit NOT NULL DEFAULT 0,
DeletedAt datetime2 NULL,
-- Business columns
Name nvarchar(300) NOT NULL,
Email nvarchar(300) NOT NULL,
Phone nvarchar(50) NULL,
Industry nvarchar(200) NULL,
AnnualRevenue decimal(18,2) NULL,
Status nvarchar(50) NOT NULL DEFAULT 'Active',
-- AI enhancement columns
SummaryText nvarchar(max) NULL,
ClassificationLabel nvarchar(200) NULL,
SentimentScore decimal(5,4) NULL,
EmbeddingRef nvarchar(500) NULL,
AiProcessedAt datetime2 NULL,
AiModelVersion nvarchar(100) NULL,
-- Constraints
CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId),
CONSTRAINT UQ_Customer_TenantEmail UNIQUE (TenantId, Email)
);
-- Required indexes
CREATE INDEX IX_Customer_TenantId_Status
ON Customer (TenantId, Status)
WHERE IsDeleted = 0;
CREATE INDEX IX_Customer_TenantId_EmbeddingRef
ON Customer (TenantId, EmbeddingRef)
WHERE EmbeddingRef IS NOT NULL;
CREATE INDEX IX_Customer_TenantId_AiProcessedAt
ON Customer (TenantId, AiProcessedAt)
WHERE AiProcessedAt IS NULL; -- Finds un-enriched records efficiently
Naming Conventions
| Element | Convention | Example |
|---|---|---|
| Table names | PascalCase, singular | Customer, Invoice, SupportTicket |
| Column names | PascalCase | CustomerId, CreatedAt |
| Primary key | [Table]Id | CustomerId, InvoiceId |
| Foreign keys | [ReferencedTable]Id | CustomerId on Invoice table |
| Boolean columns | Is[State] or Has[Property] | IsDeleted, IsVerified, HasAttachments |
| Timestamp columns | [Event]At | CreatedAt, ProcessedAt, ExpiresAt |
| JSON columns | [Content]Json or descriptive name | MetadataJson, Config |
All production queries must include WHERE IsDeleted = 0 (or filter in a view). Create a filtered index on (TenantId, IsDeleted) to make this efficient. Consider creating tenant-scoped views that pre-apply the soft-delete filter so that workflow SQL nodes do not need to remember to add it.
Schema Evolution with EF Core Migrations
When using EF Core Code-First to manage your Data Ocean schema, add AI enhancement columns as nullable additions — never as breaking changes:
// EF Core migration — adding AI columns to existing table
public partial class AddAiColumnsToCustomer : Migration
{
protected override void Up(MigrationBuilder migrationBuilder)
{
migrationBuilder.AddColumn<string>(
name: "SummaryText",
table: "Customer",
type: "nvarchar(max)",
nullable: true);
migrationBuilder.AddColumn<string>(
name: "ClassificationLabel",
table: "Customer",
type: "nvarchar(200)",
nullable: true);
migrationBuilder.AddColumn<string>(
name: "EmbeddingRef",
table: "Customer",
type: "nvarchar(500)",
nullable: true);
migrationBuilder.AddColumn<DateTime>(
name: "AiProcessedAt",
table: "Customer",
type: "datetime2",
nullable: true);
}
}