Data Enrichment Overview
Data Ocean is not just a storage layer — it is the organization's AI-ready data hub. Enrichment workflows add intelligence to raw records: classification, summarization, semantic embeddings, and provenance tracking that accelerate every AI feature you build.
What Is Data Enrichment?
Data enrichment is the process of automatically adding AI-generated context to raw business records after they are created or updated. A raw lead record — just a name, email, and company — becomes an enriched record with a classification label ("High Value"), a summary ("Senior executive at a mid-market software company, expressed interest in enterprise plan, high purchase intent"), and a semantic embedding that enables similarity search.
Enrichment happens asynchronously in the background — it never blocks the user who created the record. A workflow fires automatically when a new record arrives, processes it through AI nodes, and writes the enriched fields back to the database. By the time a sales rep views the record, the AI has already done the analysis.
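The non-blocking flow described above can be sketched with Python's `asyncio`. This is a minimal illustration, not the platform's implementation: the in-memory `DATABASE`, the `classify` and `summarize` placeholder nodes, and the keyword rule inside `classify` are all assumptions made for the example. A real workflow would call AI models and write to a Data Ocean database.

```python
import asyncio

# Hypothetical in-memory stand-in for a Data Ocean table.
DATABASE: dict[int, dict] = {}

async def classify(record: dict) -> dict:
    """Placeholder AI node: a real node would call an LLM here."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    notes = record.get("notes", "").lower()
    label = "High Value" if "enterprise" in notes else "Standard"
    return {"ClassificationLabel": label}

async def summarize(record: dict) -> dict:
    """Placeholder AI node: truncates the notes instead of calling a model."""
    await asyncio.sleep(0)
    return {"SummaryText": record.get("notes", "")[:80]}

async def enrich(record_id: int) -> None:
    """Run the AI nodes and write the enriched fields back to the record."""
    record = DATABASE[record_id]
    results = await asyncio.gather(classify(record), summarize(record))
    for enriched_fields in results:
        record.update(enriched_fields)

async def create_record(record_id: int, record: dict) -> asyncio.Task:
    """Store the raw record and schedule enrichment without waiting for it."""
    DATABASE[record_id] = dict(record)
    return asyncio.create_task(enrich(record_id))

async def main() -> dict:
    task = await create_record(
        1, {"name": "Ada", "notes": "Interested in the enterprise plan"}
    )
    # Control returns to the caller before enrichment has run.
    assert "ClassificationLabel" not in DATABASE[1]
    await task  # awaited here only so the sketch can show the final state
    return DATABASE[1]

result = asyncio.run(main())
```

The point of the sketch is the ordering: `create_record` returns as soon as the raw record is stored, and the enriched fields appear later, once the background task has completed.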
The Data Hub Concept
Data Ocean functions as the central data hub for the BizFirstGO ecosystem. All workflows — regardless of which business function they serve — read from and write to Data Ocean databases. This centralization means that enrichment done once benefits every workflow that subsequently reads that data.
Classification
AI assigns categorical labels to records — lead quality, customer segment, support ticket priority, risk level. Labels are queryable, filterable, and reportable.
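As a rough illustration of what "queryable, filterable, and reportable" means in practice, the sketch below uses SQLite as a stand-in database. The `Leads` table and its rows are hypothetical. Only the `ClassificationLabel` column name comes from this page, and the actual Data Ocean query interface may differ.

```python
import sqlite3

# SQLite in-memory database used purely as a stand-in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Leads (Id INTEGER PRIMARY KEY, Name TEXT, ClassificationLabel TEXT)"
)
conn.executemany(
    "INSERT INTO Leads (Name, ClassificationLabel) VALUES (?, ?)",
    [("Ada", "High Value"), ("Bob", "Standard"), ("Cy", "High Value")],
)

# Filter: fetch only the leads labeled "High Value".
high_value = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Leads WHERE ClassificationLabel = ? ORDER BY Name",
        ("High Value",),
    )
]

# Report: count leads per label.
counts = dict(
    conn.execute(
        "SELECT ClassificationLabel, COUNT(*) FROM Leads GROUP BY ClassificationLabel"
    )
)
```

Because the label is stored as an ordinary column, routing and reporting logic can use plain queries rather than calling a model at read time.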
Summarization
AI condenses long-form content (notes, emails, documents, transcripts) into structured summaries that give context at a glance without reading the full text.
Embeddings
AI generates vector representations of record content that enable semantic similarity search — find records that are conceptually similar, not just keyword-matched.
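Semantic similarity search typically ranks records by the cosine similarity of their embedding vectors. The sketch below shows the idea with toy three-dimensional vectors. The record IDs and vector values are made up for illustration, and a production system would normally delegate this ranking to a vector store rather than scanning in Python.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy vectors; real embeddings have hundreds or thousands of dimensions.
embeddings = {
    "lead-1": [0.9, 0.1, 0.0],
    "lead-2": [0.8, 0.2, 0.1],
    "lead-3": [0.0, 0.1, 0.9],
}

def most_similar(query_id: str, k: int = 1) -> list[str]:
    """Return the k record IDs whose vectors are closest to the query record."""
    query = embeddings[query_id]
    others = [rid for rid in embeddings if rid != query_id]
    others.sort(
        key=lambda rid: cosine_similarity(query, embeddings[rid]), reverse=True
    )
    return others[:k]

nearest = most_similar("lead-1")
```

Here `lead-2` ranks above `lead-3` because its vector points in nearly the same direction as `lead-1`, regardless of whether the underlying text shares any keywords.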
Sentiment Analysis
AI measures the emotional tone of text content — customer feedback, support tickets, email threads — producing a numeric score and label (positive/neutral/negative).
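One simple way to derive the label from the numeric score is a pair of thresholds. The score range of -1.0 to 1.0 and the 0.25 cutoff below are assumptions for this sketch, since this page does not specify the scale.

```python
def sentiment_label(score: float, threshold: float = 0.25) -> str:
    """Map a score in [-1.0, 1.0] to positive / neutral / negative.

    The range and the default threshold are illustrative assumptions.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be in [-1.0, 1.0]")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

Keeping the numeric score alongside the label lets reports re-bucket records later if the thresholds change.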
Entity Extraction
AI identifies and extracts named entities — people, companies, products, dates, amounts — from unstructured text and stores them as structured JSON.
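To make "structured JSON" concrete, the sketch below serializes one possible entity payload. The key names, the nesting, and every value are hypothetical, chosen only to mirror the entity types listed above. The actual schema is not specified on this page.

```python
import json

# Illustrative payload only; keys and values are invented for this example.
entities = {
    "people": ["Jane Doe"],
    "companies": ["Acme Corp"],
    "products": ["Enterprise Plan"],
    "dates": ["2024-03-15"],
    "amounts": [{"value": 12000, "currency": "USD"}],
}

# Serialize to a JSON string suitable for a text or JSON column.
extracted_entities_json = json.dumps(entities, sort_keys=True)

# Downstream workflows can parse the string back into structured data.
restored = json.loads(extracted_entities_json)
```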
PII Detection
AI identifies and tags records containing Personally Identifiable Information (PII), which supports GDPR-compliant handling of sensitive data at scale.
Why AI Readiness Matters
An AI-ready database is one where every record in the system has been enriched with the metadata that AI workflows need. When your database is AI-ready:
- New AI features can be built in hours instead of weeks — the data is already prepared
- Semantic search works across your entire data estate because record embeddings are precomputed; only the search query itself needs to be embedded at query time
- Classification labels enable intelligent routing and prioritization without custom ML models
- Summaries enable AI agents to reason about large record sets without processing raw data
- Compliance controls are built in: PII tagging and data lineage are maintained automatically
Enrichment Categories
| Enrichment Type | Input | Output Column | AI Technology |
|---|---|---|---|
| Classification | Record fields as prompt context | ClassificationLabel | LLM chat completion (structured output) |
| Summarization | Notes, description, long-form text | SummaryText | LLM chat completion |
| Sentiment | Free-text content | SentimentScore | LLM or specialized sentiment model |
| Embedding | Combined text representation of record | EmbeddingRef (+ vector store) | Embedding model (text-embedding-3-large) |
| Entity Extraction | Free-text (notes, emails) | ExtractedEntitiesJson | LLM structured extraction |
| PII Detection | All text columns | PiiClassification | LLM classification |
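The output columns in the table can be pictured as optional fields on a record that start empty and are filled in as each enrichment completes. The sketch below models that with a dataclass. The `EnrichedLead` class, the raw fields, and the Python types are assumptions for illustration. Only the enrichment column names come from the table.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EnrichedLead:
    # Raw fields supplied when the record is created (hypothetical).
    name: str
    email: str
    company: str
    # Enrichment columns from the table; None until a workflow fills them.
    # The Python types here are assumed, not documented.
    ClassificationLabel: Optional[str] = None
    SummaryText: Optional[str] = None
    SentimentScore: Optional[float] = None
    EmbeddingRef: Optional[str] = None
    ExtractedEntitiesJson: Optional[str] = None
    PiiClassification: Optional[str] = None

ENRICHMENT_COLUMNS = (
    "ClassificationLabel",
    "SummaryText",
    "SentimentScore",
    "EmbeddingRef",
    "ExtractedEntitiesJson",
    "PiiClassification",
)

def pending_enrichments(record: EnrichedLead) -> list[str]:
    """List the enrichment columns that have not been populated yet."""
    return [col for col in ENRICHMENT_COLUMNS if getattr(record, col) is None]

lead = EnrichedLead(
    "Ada", "ada@example.com", "Acme", ClassificationLabel="High Value"
)
pending = pending_enrichments(lead)
```

A helper like `pending_enrichments` is one way a workflow could decide which AI nodes still need to run for a given record.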