Data Enrichment Overview — Data Enrichment & Enhancement

What Is Data Enrichment?

Data enrichment is the process of automatically adding AI-generated context to raw business records after they are created or updated. A raw lead record — just a name, email, and company — becomes an enriched record with a classification label ("High Value"), a summary ("Senior executive at a mid-market software company, expressed interest in enterprise plan, high purchase intent"), and a semantic embedding that enables similarity search.

Enrichment happens asynchronously in the background — it never blocks the user who created the record. A workflow fires automatically when a new record arrives, processes it through AI nodes, and writes the enriched fields back to the database. By the time a sales rep views the record, the AI has already done the analysis.
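The trigger, process, and write-back flow described above can be sketched as follows. This is a minimal illustration, not the BizFirst workflow engine: the in-memory `DATABASE`, the `enrich` placeholder, and every function name here are assumptions made for the example.

```python
import asyncio

# Hypothetical in-memory stand-in for a Data Ocean table: record id -> fields.
DATABASE: dict = {}


async def enrich(record: dict) -> dict:
    """Placeholder for the AI nodes; a real workflow would call a model here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"SummaryText": f"{record['name']} at {record['company']}"}


async def enrichment_worker(queue: asyncio.Queue) -> None:
    """Consume record ids in the background and write enriched fields back."""
    while True:
        record_id = await queue.get()
        try:
            DATABASE[record_id].update(await enrich(DATABASE[record_id]))
        finally:
            queue.task_done()


async def create_record(queue: asyncio.Queue, record_id: int, fields: dict) -> None:
    """Save the raw record and return immediately; enrichment is only queued."""
    DATABASE[record_id] = dict(fields)
    queue.put_nowait(record_id)


async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(enrichment_worker(queue))
    await create_record(
        queue, 1, {"name": "Ada", "email": "ada@example.com", "company": "Acme"}
    )
    await queue.join()  # demo only: wait until the background work has finished
    worker.cancel()
    return DATABASE[1]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of the design is that `create_record` never awaits the model call, so the user who created the record is not kept waiting.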

The Data Hub Concept

Data Ocean functions as the central data hub for the BizFirst ecosystem. All workflows — regardless of which business function they serve — read from and write to Data Ocean databases. This centralization means that enrichment done once benefits every workflow that subsequently reads that data.

Classification

AI assigns categorical labels to records — lead quality, customer segment, support ticket priority, risk level. Labels are queryable, filterable, and reportable.
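Labels stay queryable only if they come from a fixed set, so a classification step would typically validate the model's reply before storing it. The sketch below assumes a JSON reply of the form `{"label": ...}`; the label set (apart from "High Value"), the prompt wording, and the "Unclassified" fallback are all hypothetical.

```python
import json

# Assumed label set; only "High Value" appears in this documentation.
ALLOWED_LABELS = {"High Value", "Medium Value", "Low Value"}


def build_prompt(record: dict) -> str:
    """Render record fields as prompt context for a chat-completion model."""
    fields = "\n".join(f"{key}: {value}" for key, value in sorted(record.items()))
    options = ", ".join(sorted(ALLOWED_LABELS))
    return (
        "Classify this lead. Reply with JSON of the form "
        '{"label": "<one of: ' + options + '>"}\n' + fields
    )


def parse_label(model_reply: str) -> str:
    """Validate the model's reply; fall back to 'Unclassified' if it is unexpected."""
    try:
        label = json.loads(model_reply).get("label")
    except (json.JSONDecodeError, AttributeError):
        return "Unclassified"
    if isinstance(label, str) and label in ALLOWED_LABELS:
        return label
    return "Unclassified"
```

Rejecting anything outside the allowed set keeps free-form model output from leaking into a column that reports and filters depend on.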

Summarization

AI condenses long-form content (notes, emails, documents, transcripts) into structured summaries that give context at a glance without reading the full text.

Embeddings

AI generates vector representations of record content that enable semantic similarity search — find records that are conceptually similar, not just keyword-matched.
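To illustrate what "conceptually similar" means in practice, the sketch below ranks stored vectors by cosine similarity to a query vector. The three-dimensional vectors and record ids are toy data invented for the example; a production system would normally delegate this ranking to a vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vector, stored_vectors, top_k=2):
    """Return the top_k (record_id, score) pairs, highest similarity first."""
    scored = [
        (record_id, cosine_similarity(query_vector, vector))
        for record_id, vector in stored_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors; real embeddings have far more dimensions.
VECTORS = {
    "lead-1": [0.9, 0.1, 0.0],
    "lead-2": [0.8, 0.2, 0.1],
    "lead-3": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for record_id, score in most_similar([1.0, 0.0, 0.0], VECTORS):
        print(record_id, round(score, 3))
```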

Sentiment Analysis

AI measures the emotional tone of text content — customer feedback, support tickets, email threads — producing a numeric score and label (positive/neutral/negative).
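One way to derive the label from the numeric score is a simple threshold rule, sketched below. The score range of -1.0 to 1.0 and the width of the neutral band are assumptions for illustration; the source does not specify either.

```python
def sentiment_label(score: float, neutral_band: float = 0.2) -> str:
    """Map a score in [-1.0, 1.0] to positive/neutral/negative.

    Scores within +/- neutral_band of zero are treated as neutral.
    Both the range and the band width are assumed, not documented values.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```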

Entity Extraction

AI identifies and extracts named entities — people, companies, products, dates, amounts — from unstructured text and stores them as structured JSON.
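As a rough picture of the stored result, the sketch below normalizes a model's extraction output into a fixed JSON shape keyed by the entity types listed above. The key names and the normalization rules (trimming and de-duplication) are hypothetical; the actual stored schema may differ.

```python
import json

# Entity types named in the text; the exact JSON key names are assumed.
ENTITY_TYPES = ("people", "companies", "products", "dates", "amounts")


def normalize_entities(raw: dict) -> str:
    """Keep only known entity types, de-duplicate values, and serialize to JSON."""
    cleaned = {}
    for entity_type in ENTITY_TYPES:
        values = raw.get(entity_type, [])
        # dict.fromkeys drops duplicates while preserving first-seen order
        cleaned[entity_type] = list(dict.fromkeys(str(v).strip() for v in values))
    return json.dumps(cleaned)
```

Emitting every key, even when its list is empty, means downstream workflows can read the JSON without checking which types happen to be present.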

PII Detection

AI identifies and tags records containing Personally Identifiable Information (PII), which supports GDPR-compliant handling of sensitive data at scale.
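The tagging step can be pictured with the sketch below. Note the swap: the platform describes LLM-based classification, whereas this example uses two simple regular expressions as a stand-in so that it runs without a model. The patterns and the `tag_pii` helper are illustrative assumptions and would miss many real-world PII formats.

```python
import re

# Deliberately simple patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def tag_pii(record: dict) -> list:
    """Return the sorted list of PII kinds found in any string field of the record."""
    found = set()
    for value in record.values():
        if not isinstance(value, str):
            continue  # only text columns are scanned
        for kind, pattern in PII_PATTERNS.items():
            if pattern.search(value):
                found.add(kind)
    return sorted(found)
```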

Why AI Readiness Matters

An AI-ready database is one where every record in the system has been enriched with the metadata that AI workflows need. When your database is AI-ready:

New AI features can be built in hours instead of weeks — the data is already prepared
Semantic search works across your entire data estate because record embeddings are precomputed; only the search query itself needs to be embedded at query time
Classification labels enable intelligent routing and prioritization without custom ML models
Summaries enable AI agents to reason about large record sets without processing raw data
Compliance controls are built-in — PII tagging and data lineage are maintained automatically

Enrichment Categories

Enrichment Type	Input	Output Column	AI Technology
Classification	Record fields as prompt context	ClassificationLabel	LLM chat completion (structured output)
Summarization	Notes, description, long-form text	SummaryText	LLM chat completion
Sentiment	Free-text content	SentimentScore	LLM or specialized sentiment model
Embedding	Combined text representation of record	EmbeddingRef (+ vector store)	Embedding model (text-embedding-3-large)
Entity Extraction	Free-text (notes, emails)	ExtractedEntitiesJson	LLM structured extraction
PII Detection	All text columns	PiiClassification	LLM classification
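The output columns in the table can be read as a set of optional fields attached to each record. The sketch below models them as a dataclass; the column names come from the table, while the Python types and the `pending` helper are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EnrichmentColumns:
    """Output columns from the table above; types are assumed, not documented."""

    ClassificationLabel: Optional[str] = None
    SummaryText: Optional[str] = None
    SentimentScore: Optional[float] = None
    EmbeddingRef: Optional[str] = None  # pointer into the vector store
    ExtractedEntitiesJson: Optional[str] = None
    PiiClassification: Optional[str] = None

    def pending(self) -> list:
        """Names of enrichment columns that have not been populated yet."""
        return [name for name, value in asdict(self).items() if value is None]
```

A helper like `pending` gives a quick way to check how far a given record is from being fully enriched.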
