Designing the Data Schema — Application Data

Step 1 — Identify the Business Entity

Start with the question: what real-world thing are we tracking? Each entity becomes a table. Common patterns:

People: Customer, Employee, Contact, Applicant
Events: Order, Invoice, Appointment, SupportTicket
Assets: Product, InventoryItem, Document, Vehicle
Relationships: CustomerOrder, EmployeeProject (junction tables)

For a lead management application, the primary entity is Lead. A lead represents a potential customer who has expressed interest but has not yet been qualified.

Step 2 — List the Business Attributes

For each entity, list the attributes that need to be captured. Focus on what the business actually needs, not what the technology allows:

-- Lead entity — business attributes identified with stakeholders:
-- Full name of the lead
-- Company name
-- Email address (required, unique per company)
-- Phone number
-- Source of the lead (web form, referral, conference, cold outreach)
-- Lead status (New, Contacted, Qualified, Disqualified, Converted)
-- Assigned sales representative (user ID)
-- Estimated deal value
-- Notes (free text)
-- Follow-up date

Step 3 — Design the SQL Schema

Translate the business attributes into SQL columns with correct types, constraints, and the required Data Ocean metadata columns:

CREATE TABLE Lead (
    -- Primary key
    LeadId          uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID(),

    -- ── Required Data Ocean metadata columns ──
    TenantId        int              NOT NULL,
    CreatedAt       datetime2        NOT NULL DEFAULT GETUTCDATE(),
    UpdatedAt       datetime2        NOT NULL DEFAULT GETUTCDATE(),
    CreatedBy       nvarchar(200)    NULL,
    IsDeleted       bit              NOT NULL DEFAULT 0,
    DeletedAt       datetime2        NULL,

    -- ── Business columns ──
    FullName        nvarchar(300)    NOT NULL,
    CompanyName     nvarchar(300)    NULL,
    Email           nvarchar(300)    NOT NULL,
    Phone           nvarchar(50)     NULL,
    Source          nvarchar(100)    NOT NULL DEFAULT 'Unknown',
    Status          nvarchar(50)     NOT NULL DEFAULT 'New',
    AssignedToUserId nvarchar(200)   NULL,
    EstimatedValue  decimal(18,2)    NULL,
    Notes           nvarchar(max)    NULL,
    FollowUpDate    date             NULL,

    -- ── Optional AI enhancement columns ──
    SummaryText         nvarchar(max)    NULL,
    ClassificationLabel nvarchar(200)    NULL,   -- e.g., "High Value", "Unlikely to Convert"
    SentimentScore      decimal(5,4)     NULL,
    EmbeddingRef        nvarchar(500)    NULL,
    AiProcessedAt       datetime2        NULL,
    AiModelVersion      nvarchar(100)    NULL,

    -- Constraints
    CONSTRAINT PK_Lead PRIMARY KEY CLUSTERED (LeadId)
);

-- ── Indexes ──
CREATE INDEX IX_Lead_TenantId_Status
    ON Lead (TenantId, Status, UpdatedAt DESC)
    INCLUDE (FullName, Email, AssignedToUserId, EstimatedValue)
    WHERE IsDeleted = 0;

CREATE INDEX IX_Lead_TenantId_AssignedTo
    ON Lead (TenantId, AssignedToUserId, Status)
    WHERE IsDeleted = 0;

CREATE INDEX IX_Lead_TenantId_FollowUpDate
    ON Lead (TenantId, FollowUpDate)
    WHERE IsDeleted = 0 AND FollowUpDate IS NOT NULL;

CREATE INDEX IX_Lead_TenantId_AiProcessedAt
    ON Lead (TenantId, AiProcessedAt)
    WHERE AiProcessedAt IS NULL AND IsDeleted = 0;

Schema Design Decisions

Decision	Choice Made	Rationale
Primary key type	uniqueidentifier (GUID)	GUIDs allow distributed ID generation — records created offline or in batch can be merged without collision
Primary key default	NEWSEQUENTIALID()	Sequential GUIDs avoid clustered index fragmentation (vs. random NEWID())
Soft delete	IsDeleted + DeletedAt	Data Ocean convention — never physically delete records for audit trail
Status column type	nvarchar(50) with default	String status is more readable in SQL than integer status codes
Notes column type	nvarchar(max)	Notes are unbounded in length; nvarchar(max) handles this
FollowUpDate type	date (not datetime2)	Follow-up dates are day-granularity business data — storing time is unnecessary and confusing

Add AI Columns From Day One

Even if you do not plan to use AI enrichment immediately, add the AI enhancement columns (SummaryText, ClassificationLabel, EmbeddingRef, AiProcessedAt) to every table from the start. Adding them later requires a schema migration and a backfill job. They are nullable, so they have zero impact on current functionality while keeping the door open for AI features.

Reference Data Tables

For columns like Source and Status that have a controlled set of valid values, create a companion reference data table rather than hardcoding values in application logic:

CREATE TABLE LeadStatus (
    LeadStatusId    int              NOT NULL IDENTITY(1,1),
    TenantId        int              NOT NULL,
    Code            nvarchar(50)     NOT NULL,
    Label           nvarchar(200)    NOT NULL,
    SortOrder       int              NOT NULL DEFAULT 0,
    IsActive        bit              NOT NULL DEFAULT 1,
    CONSTRAINT PK_LeadStatus PRIMARY KEY (LeadStatusId),
    CONSTRAINT UQ_LeadStatus_TenantCode UNIQUE (TenantId, Code)
);

-- Default values (seed via migration)
INSERT INTO LeadStatus (TenantId, Code, Label, SortOrder)
VALUES
    (0, 'New', 'New Lead', 1),           -- TenantId 0 = system defaults
    (0, 'Contacted', 'Contacted', 2),
    (0, 'Qualified', 'Qualified', 3),
    (0, 'Disqualified', 'Disqualified', 4),
    (0, 'Converted', 'Converted to Customer', 5);

← Overview Next: AI-Generated Forms →