Compliance and PII
GDPR-aware enrichment patterns — PII detection and tagging, consent management, right-to-erasure implementation, and how the Data Ocean lineage system satisfies GDPR accountability requirements.
PII in Enrichment Workflows
When enrichment workflows process personal data (names, emails, phone numbers, addresses), they must do so in compliance with applicable data protection regulations (GDPR, CCPA, PDPA). The Data Ocean compliance pattern addresses this through three mechanisms:
- PII detection and classification — automatically identify which records contain personal data
- Consent management — track consent status and block enrichment for records where consent is not granted
- Right-to-erasure — when a deletion request arrives, purge enrichment fields from the record AND from the vector store
PII Detection Workflow
Run PII detection as part of the enrichment pipeline — or as a separate dedicated PII scan workflow:
// AI Node — PII Detection
{
"nodeType": "AIAgentNode",
"promptTemplate": "Analyze the following data record and identify if it contains Personally Identifiable Information (PII).\n\nRecord fields:\nName: {{record.FullName}}\nEmail: {{record.Email}}\nPhone: {{record.Phone}}\nNotes: {{record.Notes}}\n\nClassify the PII level and list what PII types are present.\n\nReturn JSON:\n{\n \"piiLevel\": \"None | Low | Medium | High | Sensitive\",\n \"piiTypes\": [\"Name\", \"Email\", \"Phone\", \"Address\", \"HealthInfo\", \"Financial\"],\n \"hasDirectIdentifiers\": true/false,\n \"recommendation\": \"Safe to enrich | Enrich with consent | Do not enrich without legal basis\"\n}",
"responseFormat": "json",
"outputVariable": "piiResult"
}
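Model output should not be trusted blindly before it is written to a classification column. The following is a minimal Python sketch of validating the piiResult payload against the values listed in the prompt above. The function name parse_pii_result and the choice to raise ValueError are assumptions for illustration, not part of the Data Ocean API.

```python
import json

# Allowed values, copied from the JSON schema in the prompt above.
PII_LEVELS = {"None", "Low", "Medium", "High", "Sensitive"}
PII_TYPES = {"Name", "Email", "Phone", "Address", "HealthInfo", "Financial"}


def parse_pii_result(raw: str) -> dict:
    """Parse the model's reply and reject anything outside the schema.

    Hypothetical helper: raises ValueError so the caller can leave the
    record unclassified instead of storing an invalid classification.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"piiResult is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("piiResult must be a JSON object")

    level = data.get("piiLevel")
    if not isinstance(level, str) or level not in PII_LEVELS:
        raise ValueError(f"unexpected piiLevel: {level!r}")

    types = data.get("piiTypes")
    if not isinstance(types, list) or not all(
        isinstance(t, str) and t in PII_TYPES for t in types
    ):
        raise ValueError(f"unexpected piiTypes: {types!r}")

    if not isinstance(data.get("hasDirectIdentifiers"), bool):
        raise ValueError("hasDirectIdentifiers must be true or false")

    return data
```

A caller would typically catch ValueError and route the record to a retry or manual-review path rather than enriching it.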
PII Classification Levels
| Level | Definition | Enrichment Policy |
|---|---|---|
| None | No personal data detected | Enrich freely |
| Low | Generic identifiers (job title, company name) | Enrich; log in lineage |
| Medium | Names, email addresses, phone numbers | Enrich if ConsentStatus = Granted |
| High | Financial details, location data, behavioral data | Enrich only with explicit consent and legal basis logged |
| Sensitive | Health data, biometric, special categories | Do NOT enrich without specific legal review |
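The table can be read as a decision function. The sketch below encodes it in Python under several assumptions: the legal_basis_logged and legal_review_passed flags are hypothetical (the source does not say where these are stored), Sensitive records are assumed to need consent in addition to legal review, and unclassified or unknown levels are blocked, which is a fail-closed choice made for this sketch rather than a rule stated in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordState:
    pii_level: Optional[str]           # PiiClassification; None if unclassified
    consent_status: Optional[str]      # ConsentStatus; None if never recorded
    legal_basis_logged: bool = False   # hypothetical flag, not a documented column
    legal_review_passed: bool = False  # hypothetical flag, not a documented column


def may_enrich(state: RecordState) -> bool:
    """Return True if the classification table allows enrichment."""
    consented = state.consent_status == "Granted"
    level = state.pii_level

    if level in ("None", "Low"):
        # Low still requires a lineage entry, which the caller must write.
        return True
    if level == "Medium":
        return consented
    if level == "High":
        return consented and state.legal_basis_logged
    if level == "Sensitive":
        # Assumption: consent is required in addition to legal review.
        return consented and state.legal_review_passed
    # Unclassified or unrecognized level: block (assumption of this sketch).
    return False
```

For example, may_enrich(RecordState("High", "Granted")) returns False until legal_basis_logged is set, because consent alone does not satisfy the High row.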
Consent Management
The ConsentStatus column controls whether a record's personal data may be processed by AI enrichment workflows:
-- Check consent before enrichment (add at start of enrichment workflow)
-- SqlQueryNode:
SELECT ConsentStatus, PiiClassification, ConsentGrantedAt
FROM Lead
WHERE TenantId = @tenantId AND LeadId = @leadId
// Condition Node:
// If PiiClassification IN ('Medium', 'High', 'Sensitive')
// AND ConsentStatus is not 'Granted' (a NULL ConsentStatus counts as not granted)
// → Exit workflow (log to EnrichmentSkipped table)
// Else
// → Proceed with enrichment
Recording Consent
-- Consent granted via web form or explicit opt-in
UPDATE Lead SET
ConsentStatus = 'Granted',
ConsentGrantedAt = GETUTCDATE(),
UpdatedAt = GETUTCDATE()
WHERE TenantId = @tenantId AND LeadId = @leadId;
Right-to-Erasure Implementation
GDPR Article 17 gives individuals the right to have their personal data erased. In Data Ocean, this means:
- Soft-delete the record: Set IsDeleted = 1
- Purge enrichment fields: Set all AI-generated columns to NULL, because the enrichment was derived from personal data and must also be erased
- Delete from vector store: Remove the embedding from Qdrant using the EmbeddingRef
- Record the erasure in lineage: Insert a lineage record with EnrichmentType = 'DataErasure'
-- First: delete from the vector store while EmbeddingRef is still populated (via workflow node)
-- VectorDeleteNode: collection='leads', embeddingRef={{embeddingRef}}
-- Then: soft-delete and purge enrichment in one atomic UPDATE
UPDATE Lead SET
IsDeleted = 1,
DeletedAt = GETUTCDATE(),
-- Purge all personal data fields
FullName = 'REDACTED',
Email = 'REDACTED@redacted.invalid',
Phone = NULL,
Notes = NULL,
-- Purge AI enrichment derived from personal data
SummaryText = NULL,
ClassificationLabel = NULL,
SentimentScore = NULL,
ExtractedEntitiesJson = NULL,
KeywordsJson = NULL,
-- Safe to clear only because the vector store entry was already deleted above
EmbeddingRef = NULL,
AiProcessedAt = NULL,
-- Record consent withdrawal
ConsentStatus = 'Withdrawn',
ConsentWithdrawnAt = GETUTCDATE(),
UpdatedAt = GETUTCDATE()
WHERE TenantId = @tenantId AND LeadId = @leadId;
Soft-deleting the SQL record is not sufficient. The vector store still holds the semantic embedding of the individual's personal data. The erasure workflow MUST delete the entry from Qdrant (or PGVector) using the EmbeddingRef value before setting it to NULL in SQL. Failure to delete from the vector store means personal data persists in a non-obvious location.
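The ordering requirement above can be illustrated with a small, self-contained Python sketch. Everything here is a stand-in chosen for illustration: sqlite3 replaces SQL Server, a plain dict replaces Qdrant, the Lead table is reduced to a few columns, and the Lineage table with its column names is hypothetical (only EnrichmentType = 'DataErasure' comes from this page).

```python
import sqlite3
from datetime import datetime, timezone


def erase_lead(db, vector_store, tenant_id, lead_id):
    """Erase one lead: vector store first, then SQL purge plus lineage."""
    now = datetime.now(timezone.utc).isoformat()

    # 1. Read EmbeddingRef while it is still populated.
    row = db.execute(
        "SELECT EmbeddingRef FROM Lead WHERE TenantId = ? AND LeadId = ?",
        (tenant_id, lead_id),
    ).fetchone()
    if row is None:
        return False
    embedding_ref = row[0]

    # 2. Delete the embedding. If this step fails, the reference is still
    #    in SQL, so the whole erasure can be retried.
    if embedding_ref is not None:
        vector_store.pop(embedding_ref, None)

    # 3. Purge the row and record the erasure in one transaction.
    with db:
        db.execute(
            """UPDATE Lead SET IsDeleted = 1, FullName = 'REDACTED',
                   Email = 'REDACTED@redacted.invalid', SummaryText = NULL,
                   EmbeddingRef = NULL, ConsentStatus = 'Withdrawn',
                   ConsentWithdrawnAt = ?
               WHERE TenantId = ? AND LeadId = ?""",
            (now, tenant_id, lead_id),
        )
        db.execute(
            "INSERT INTO Lineage (TenantId, LeadId, EnrichmentType, CreatedAt) "
            "VALUES (?, ?, 'DataErasure', ?)",
            (tenant_id, lead_id, now),
        )
    return True


# Demo setup with made-up data.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Lead (TenantId TEXT, LeadId TEXT, FullName TEXT, Email TEXT,
                   SummaryText TEXT, EmbeddingRef TEXT,
                   IsDeleted INTEGER DEFAULT 0, ConsentStatus TEXT,
                   ConsentWithdrawnAt TEXT);
CREATE TABLE Lineage (TenantId TEXT, LeadId TEXT, EnrichmentType TEXT,
                      CreatedAt TEXT);
INSERT INTO Lead (TenantId, LeadId, FullName, Email, SummaryText,
                  EmbeddingRef, ConsentStatus)
VALUES ('t1', 'l1', 'Ada Example', 'ada@example.com', 'summary',
        'emb-1', 'Granted');
""")
vector_store = {"emb-1": [0.1, 0.2, 0.3]}
erase_lead(db, vector_store, "t1", "l1")
```

Deleting from the vector store before the UPDATE means a failure at any point leaves enough state to retry. If the SQL purge ran first and the vector delete then failed, the reference needed to find the embedding would already be gone.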
Compliance Reporting Queries
-- PII distribution across all leads
SELECT
ISNULL(PiiClassification, 'Not Classified') AS PiiLevel,
COUNT(*) AS RecordCount,
SUM(CASE WHEN ConsentStatus = 'Granted' THEN 1 ELSE 0 END) AS WithConsent
FROM Lead
WHERE TenantId = @tenantId AND IsDeleted = 0
GROUP BY PiiClassification
ORDER BY RecordCount DESC;
-- Records pending PII classification
SELECT COUNT(*) AS PendingPiiClassification
FROM Lead
WHERE TenantId = @tenantId
AND IsDeleted = 0
AND PiiClassification IS NULL;
-- Consent withdrawal log (last 90 days)
SELECT LeadId, ConsentWithdrawnAt, AiProcessedAt
FROM Lead
WHERE TenantId = @tenantId
AND ConsentStatus = 'Withdrawn'
AND ConsentWithdrawnAt >= DATEADD(DAY, -90, GETUTCDATE())
ORDER BY ConsentWithdrawnAt DESC;