Portal Community

PII in Enrichment Workflows

When enrichment workflows process personal data (names, email addresses, phone numbers, postal addresses), they must comply with the applicable data protection regulations (for example GDPR, CCPA, or PDPA). The Data Ocean compliance pattern addresses this through three mechanisms: PII detection, consent management, and right-to-erasure.

PII Detection Workflow

Run PII detection either as a step in the enrichment pipeline or as a separate, dedicated PII scan workflow:

// AI Node — PII Detection
{
  "nodeType": "AIAgentNode",
  "promptTemplate": "Analyze the following data record and identify if it contains Personally Identifiable Information (PII).\n\nRecord fields:\nName: {{record.FullName}}\nEmail: {{record.Email}}\nPhone: {{record.Phone}}\nNotes: {{record.Notes}}\n\nClassify the PII level and list what PII types are present.\n\nReturn JSON:\n{\n  \"piiLevel\": \"None | Low | Medium | High | Sensitive\",\n  \"piiTypes\": [\"Name\", \"Email\", \"Phone\", \"Address\", \"HealthInfo\", \"Financial\"],\n  \"hasDirectIdentifiers\": true/false,\n  \"recommendation\": \"Safe to enrich | Enrich with consent | Do not enrich without legal basis\"\n}",
  "responseFormat": "json",
  "outputVariable": "piiResult"
}
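
LLM output does not always follow the requested schema, so it is worth validating piiResult before any value reaches the PiiClassification column. The sketch below is a hypothetical Python helper (`parse_pii_result` is not part of Data Ocean). It assumes the AI node returns the JSON string described in the prompt and that only the five levels from the classification table are acceptable.

```python
import json

# The five levels from the PII classification table, lowest to highest.
PII_LEVELS = ("None", "Low", "Medium", "High", "Sensitive")


def parse_pii_result(raw: str) -> dict:
    """Validate the AI node's JSON before its piiLevel is persisted.

    Hypothetical helper: field names follow the prompt template; writing the
    result back to Lead.PiiClassification is an assumption.
    """
    result = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    level = result.get("piiLevel")
    if level not in PII_LEVELS:
        # e.g. the model echoing the option list "None | Low | ..." verbatim
        raise ValueError(f"unrecognised piiLevel: {level!r}")
    if not isinstance(result.get("hasDirectIdentifiers"), bool):
        raise ValueError("hasDirectIdentifiers must be a JSON boolean")
    types = result.get("piiTypes", [])
    if not isinstance(types, list):
        raise ValueError("piiTypes must be a list")
    return {
        "PiiClassification": level,
        "piiTypes": types,
        "hasDirectIdentifiers": result["hasDirectIdentifiers"],
    }


raw = ('{"piiLevel": "Medium", "piiTypes": ["Name", "Email"], '
       '"hasDirectIdentifiers": true, "recommendation": "Enrich with consent"}')
print(parse_pii_result(raw)["PiiClassification"])  # Medium
```

If validation fails, leaving the classification NULL keeps the record visible in the pending-classification report rather than storing a guessed level.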

PII Classification Levels

Level     | Definition                                        | Enrichment Policy
None      | No personal data detected                         | Enrich freely
Low       | Generic identifiers (job title, company name)     | Enrich; log in lineage
Medium    | Names, email addresses, phone numbers             | Enrich if ConsentStatus = Granted
High      | Financial details, location data, behavioral data | Enrich only with explicit consent and legal basis logged
Sensitive | Health data, biometric, special categories        | Do NOT enrich without specific legal review
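
The classification table can be read as a scale of increasingly strict preconditions. The following Python sketch encodes it as a single check. `may_enrich` and its flags (`legal_basis_logged`, `legal_review_approved`) are hypothetical names. Treating the requirements as cumulative is also an assumption: under it, Sensitive also needs consent and a logged legal basis, although the table only states legal review for that level. Unclassified records fail closed.

```python
from typing import Optional

# Levels in ascending order of sensitivity (the string "None" means "no PII").
LEVELS = ("None", "Low", "Medium", "High", "Sensitive")


def may_enrich(level: Optional[str], consent_granted: bool,
               legal_basis_logged: bool = False,
               legal_review_approved: bool = False) -> bool:
    """Return True if the classification table permits enrichment.

    Assumption: requirements accumulate up the scale. Flag names are hypothetical.
    """
    if level not in LEVELS:
        # Python None (unclassified) or an unknown label: fail closed.
        return False
    rank = LEVELS.index(level)
    if rank >= LEVELS.index("Medium") and not consent_granted:
        return False
    if rank >= LEVELS.index("High") and not legal_basis_logged:
        return False
    if rank >= LEVELS.index("Sensitive") and not legal_review_approved:
        return False
    return True


print(may_enrich("Medium", consent_granted=True))  # True
print(may_enrich("High", consent_granted=True))    # False: no legal basis logged
```

The Low level's lineage logging is an action to take during enrichment rather than a precondition, so it is left out of this check.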

Consent Management

The ConsentStatus column controls whether a record's personal data may be processed by AI enrichment workflows:

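-- Check consent before enrichment (add at the start of the enrichment workflow)
-- SqlQueryNode:
SELECT ConsentStatus, PiiClassification, ConsentGrantedAt
FROM Lead
WHERE TenantId = @tenantId AND LeadId = @leadId

// Condition Node:
// If PiiClassification IS NULL
//    → Exit workflow (not yet classified; log to EnrichmentSkipped table
//      and run PII detection first)
// Else if PiiClassification IN ('Medium', 'High', 'Sensitive')
//    AND (ConsentStatus IS NULL OR ConsentStatus != 'Granted')
//    → Exit workflow (log to EnrichmentSkipped table)
// Else
//    → Proceed with enrichment
//    (High additionally requires a logged legal basis, and Sensitive a
//     specific legal review, per the classification table)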
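
A consent check written in SQL has to account for NULLs. Under ANSI NULL semantics, `ConsentStatus != 'Granted'` evaluates to UNKNOWN rather than TRUE when ConsentStatus is NULL, so a record with no recorded consent slips past a predicate meant to skip it. The Python sketch below uses an in-memory SQLite table as a stand-in for Lead (SQLite follows the same three-valued logic). Whether the Condition Node itself evaluates comparisons this way is an assumption worth checking.

```python
import sqlite3

# In-memory stand-in for the Lead table (only the columns the gate reads).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Lead (LeadId INTEGER, PiiClassification TEXT, ConsentStatus TEXT)")
conn.executemany(
    "INSERT INTO Lead VALUES (?, ?, ?)",
    [(1, "Medium", "Granted"), (2, "Medium", "Withdrawn"), (3, "Medium", None)],
)

# Naive skip predicate: NULL != 'Granted' is NULL, not TRUE,
# so lead 3 (no consent recorded) is NOT selected for skipping.
naive = conn.execute(
    "SELECT LeadId FROM Lead WHERE PiiClassification IN ('Medium','High','Sensitive') "
    "AND ConsentStatus != 'Granted' ORDER BY LeadId"
).fetchall()

# NULL-safe predicate: a missing status counts as "not granted".
safe = conn.execute(
    "SELECT LeadId FROM Lead WHERE PiiClassification IN ('Medium','High','Sensitive') "
    "AND COALESCE(ConsentStatus, '') <> 'Granted' ORDER BY LeadId"
).fetchall()

print(naive)  # [(2,)]
print(safe)   # [(2,), (3,)]
```

COALESCE is available in both SQLite and T-SQL, so the NULL-safe form carries over to the SqlQueryNode unchanged.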

Recording Consent

-- Consent granted via web form or explicit opt-in
UPDATE Lead SET
    ConsentStatus = 'Granted',
    ConsentGrantedAt = GETUTCDATE(),
    UpdatedAt = GETUTCDATE()
WHERE TenantId = @tenantId AND LeadId = @leadId;
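
An UPDATE whose WHERE clause matches no row (for example, because the lead belongs to a different tenant) succeeds silently. The caller recording consent should therefore confirm that exactly one row changed. The Python sketch below runs the same statement through the standard `sqlite3` module against an in-memory stand-in table. `record_consent` is a hypothetical helper, and SQLite's `datetime('now')` (which returns UTC) stands in for `GETUTCDATE()`.

```python
import sqlite3


def record_consent(conn: sqlite3.Connection, tenant_id: str, lead_id: int) -> None:
    """Mark consent as granted; raise if no row in this tenant matched."""
    cur = conn.execute(
        "UPDATE Lead SET ConsentStatus = 'Granted', "
        "ConsentGrantedAt = datetime('now'), UpdatedAt = datetime('now') "
        "WHERE TenantId = ? AND LeadId = ?",
        (tenant_id, lead_id),
    )
    if cur.rowcount != 1:
        # A wrong tenant or lead ID updates nothing and raises no SQL error,
        # so the missing consent would otherwise go unnoticed.
        raise LookupError(f"lead {lead_id} not found for tenant {tenant_id}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Lead (TenantId TEXT, LeadId INTEGER, ConsentStatus TEXT, "
             "ConsentGrantedAt TEXT, UpdatedAt TEXT)")
conn.execute("INSERT INTO Lead (TenantId, LeadId) VALUES ('t1', 42)")
record_consent(conn, "t1", 42)
print(conn.execute("SELECT ConsentStatus FROM Lead WHERE LeadId = 42").fetchone())  # ('Granted',)
```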

Right-to-Erasure Implementation

GDPR Article 17 gives individuals the right to have their personal data erased. In Data Ocean, this means:

  1. Soft-delete the record and redact personal fields: Set IsDeleted = 1 and overwrite FullName, Email, Phone, and Notes
  2. Purge enrichment fields: Set all AI-generated columns to NULL, because the enrichment was derived from personal data and must also be erased
  3. Delete from vector store: Remove the embedding from Qdrant using the EmbeddingRef, which must be read before it is cleared in SQL
  4. Record the erasure in lineage: Insert a lineage record with EnrichmentType = 'DataErasure'

-- Read EmbeddingRef first: the UPDATE below sets it to NULL,
-- and the vector store delete (Step 3) needs the original value
SELECT EmbeddingRef
FROM Lead
WHERE TenantId = @tenantId AND LeadId = @leadId;

-- Step 3: Delete from vector store (via workflow node) BEFORE purging SQL
-- VectorDeleteNode: collection='leads', embeddingRef={{embeddingRef}}

-- Step 1 & 2: Soft-delete and purge enrichment in one atomic UPDATE
UPDATE Lead SET
    IsDeleted = 1,
    DeletedAt = GETUTCDATE(),
    -- Purge all personal data fields
    FullName = 'REDACTED',
    Email = 'REDACTED@redacted.invalid',
    Phone = NULL,
    Notes = NULL,
    -- Purge AI enrichment derived from personal data
    SummaryText = NULL,
    ClassificationLabel = NULL,
    SentimentScore = NULL,
    ExtractedEntitiesJson = NULL,
    KeywordsJson = NULL,
    EmbeddingRef = NULL,
    AiProcessedAt = NULL,
    -- Record consent withdrawal
    ConsentStatus = 'Withdrawn',
    ConsentWithdrawnAt = GETUTCDATE(),
    UpdatedAt = GETUTCDATE()
WHERE TenantId = @tenantId AND LeadId = @leadId;

-- Step 4: Record the erasure in lineage with EnrichmentType = 'DataErasure'

Erasure Must Also Cover the Vector Store

Soft-deleting the SQL record is not sufficient: the vector store still holds a semantic embedding of the individual's personal data. The erasure workflow MUST delete the entry from Qdrant (or PGVector) using the EmbeddingRef value before setting it to NULL in SQL, because once the reference is cleared the vector can no longer be located. Failing to delete from the vector store leaves personal data in a non-obvious location.
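
The ordering rule can be enforced in the orchestration itself: read EmbeddingRef, delete the vector, and only then purge the SQL row, so that a failed vector delete leaves the reference in place for a retry. In this Python sketch the four callables are hypothetical stand-ins for the workflow nodes, and the dictionaries stand in for Qdrant, the Lead row, and the lineage table.

```python
from typing import Callable, Optional


def erase_lead(
    read_embedding_ref: Callable[[], Optional[str]],
    delete_vector: Callable[[str], None],
    purge_sql_row: Callable[[], None],
    write_lineage: Callable[[str], None],
) -> None:
    """Run the erasure steps so the EmbeddingRef is never lost before use."""
    ref = read_embedding_ref()    # capture the ref while the SQL row still holds it
    if ref is not None:
        delete_vector(ref)        # Step 3: if this raises, the SQL row is untouched
    purge_sql_row()               # Steps 1 & 2: soft-delete, redact, NULL enrichment
    write_lineage("DataErasure")  # Step 4: lineage entry for the erasure


# In-memory stand-ins for Qdrant, the Lead row, and the lineage table.
vectors = {"vec-1": [0.1, 0.2]}
row = {"EmbeddingRef": "vec-1", "IsDeleted": 0}
lineage = []


def delete_vector(ref: str) -> None:
    vectors.pop(ref, None)  # idempotent: deleting a missing point is a no-op


erase_lead(
    read_embedding_ref=lambda: row["EmbeddingRef"],
    delete_vector=delete_vector,
    purge_sql_row=lambda: row.update(EmbeddingRef=None, IsDeleted=1),
    write_lineage=lineage.append,
)
print(vectors, row, lineage)  # {} {'EmbeddingRef': None, 'IsDeleted': 1} ['DataErasure']
```

Rerunning after a crash between the vector delete and the SQL purge is safe only if the real vector delete is idempotent, which the stand-in models with `dict.pop(ref, None)`.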

Compliance Reporting Queries

-- PII distribution across all leads
SELECT
    ISNULL(PiiClassification, 'Not Classified') AS PiiLevel,
    COUNT(*) AS RecordCount,
    SUM(CASE WHEN ConsentStatus = 'Granted' THEN 1 ELSE 0 END) AS WithConsent
FROM Lead
WHERE TenantId = @tenantId AND IsDeleted = 0
GROUP BY PiiClassification
ORDER BY RecordCount DESC;
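
The RecordCount and WithConsent columns together show how many records at each consent-gated level (Medium, High, Sensitive) lack the consent their policy requires. The Python sketch below post-processes the distribution query's rows. `consent_gaps` is a hypothetical helper, and the sample rows are illustrative, not real data.

```python
from typing import Dict, Iterable, Tuple

# Levels whose enrichment policy requires ConsentStatus = 'Granted'.
CONSENT_REQUIRED = {"Medium", "High", "Sensitive"}


def consent_gaps(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, int]:
    """Map each consent-gated level to the number of records without consent.

    rows are (PiiLevel, RecordCount, WithConsent) tuples from the query above.
    """
    gaps = {}
    for level, record_count, with_consent in rows:
        if level in CONSENT_REQUIRED:
            gaps[level] = record_count - with_consent
    return gaps


sample = [("Low", 120, 30), ("Medium", 80, 50), ("High", 10, 10), ("Not Classified", 5, 0)]
print(consent_gaps(sample))  # {'Medium': 30, 'High': 0}
```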

-- Records pending PII classification
SELECT COUNT(*) AS PendingPiiClassification
FROM Lead
WHERE TenantId = @tenantId
  AND IsDeleted = 0
  AND PiiClassification IS NULL;

-- Consent withdrawal log (last 90 days)
SELECT LeadId, ConsentWithdrawnAt, AiProcessedAt
FROM Lead
WHERE TenantId = @tenantId
  AND ConsentStatus = 'Withdrawn'
  AND ConsentWithdrawnAt >= DATEADD(DAY, -90, GETUTCDATE())
ORDER BY ConsentWithdrawnAt DESC;
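
The withdrawal log also supports a simple audit: a withdrawn record whose AiProcessedAt is later than ConsentWithdrawnAt was enriched after consent was withdrawn, which the consent check should have prevented. The Python sketch below applies that rule to rows from the query. `enriched_after_withdrawal` is a hypothetical helper, and the sample rows are illustrative.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def enriched_after_withdrawal(
    rows: Iterable[Tuple[int, datetime, Optional[datetime]]]
) -> List[int]:
    """Return LeadIds whose AiProcessedAt is later than ConsentWithdrawnAt."""
    flagged = []
    for lead_id, withdrawn_at, processed_at in rows:
        # NULL AiProcessedAt (as left by the erasure UPDATE) is never flagged.
        if processed_at is not None and processed_at > withdrawn_at:
            flagged.append(lead_id)
    return flagged


sample = [
    (1, datetime(2024, 5, 1), None),
    (2, datetime(2024, 5, 1), datetime(2024, 5, 3)),
    (3, datetime(2024, 5, 2), datetime(2024, 4, 30)),
]
print(enriched_after_withdrawal(sample))  # [2]
```

A non-NULL AiProcessedAt that predates the withdrawal (lead 3) is not flagged here. If every withdrawal is expected to go through the erasure UPDATE, such rows would point to a purge that did not run.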