03 · DATA CATALOG & DATA PRODUCT HUB
Data Catalog & Data Product Hub
Catalog engine enhancements and unstructured-data improvements ship in Q2. Q3 introduces the Data Product Hub, a fundamental reimagining of dataZense in which data is no longer catalogued as a technical asset but published, owned, and consumed as a governed product.
Q2
Catalog Engine — Enhancements & Unstructured Improvements
Profiling Engine
Structured Profiler: Metadata engine and sampling engine for all RDBMS, cloud, and big data sources
Unstructured Profiler (OCR Engine): Reads text content, tables, form-based data, and images from documents and converts them to structured output
Form-Based Engine: Identifies and extracts structured fields from semi-structured forms and templates
Extended Format Support (Q2): Extends extraction beyond the existing DOCX and XLSX support, adding PPTX, HTML, and further document types with no additional configuration required
Table Extraction with Structure Preservation (Q2): Tables within documents become governed rows and columns, not flat text, enabling direct downstream use
Extraction Confidence Scoring (Q2): Each extracted field carries an AI confidence score; low-confidence items are automatically flagged for steward review
Bulk Data Object Profiling (Q2): Batch-profile multiple data objects in a single operation, significantly reducing cataloguing time at scale
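The confidence-based flagging described above can be illustrated with a minimal sketch. The names `ExtractedField`, `split_for_review`, and the 0.80 threshold are hypothetical; the roadmap does not state how scores are represented or where the review cut-off sits.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the actual threshold is not stated in the roadmap.
REVIEW_THRESHOLD = 0.80


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def split_for_review(fields, threshold=REVIEW_THRESHOLD):
    """Return (accepted, needs_review) based on each field's confidence."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


fields = [
    ExtractedField("invoice_number", "INV-1042", 0.97),
    ExtractedField("total_amount", "1,280.00", 0.62),
]
accepted, needs_review = split_for_review(fields)
print([f.name for f in needs_review])  # ['total_amount']
```

In this sketch only the low-scoring field is routed to a steward; everything at or above the threshold passes through unchanged.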
Catalog Engine
Apache Solr Search Engine: Full-text search across all catalogued metadata and indexed source data for unified discovery
Automated Tagging & Glossary Generation: AI auto-tags data objects and generates business glossary entries, with no manual labelling
Data Lineage Automation: End-to-end lineage from source endpoint through transformation to target, with forward and backward propagation
Natural Language Metadata Query: "How many records loaded from Oracle in the last 3 days?" is answered instantly, with no SQL required
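Forward and backward lineage propagation can be pictured as two traversals over the same directed graph. The sketch below is an illustration only: `LineageGraph` and the dataset names are hypothetical and do not reflect how dataZense stores lineage.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of datasets; edges point from source to target."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_edge(self, source, target):
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _walk(self, start, edges):
        # Breadth-first traversal collecting every reachable node.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def forward(self, node):
        """Everything impacted if `node` changes."""
        return self._walk(node, self.downstream)

    def backward(self, node):
        """Everything `node` was derived from."""
        return self._walk(node, self.upstream)


graph = LineageGraph()
graph.add_edge("oracle.orders", "stg_orders")
graph.add_edge("stg_orders", "orders_clean")
graph.add_edge("orders_clean", "sales_mart")

print(sorted(graph.forward("stg_orders")))     # ['orders_clean', 'sales_mart']
print(sorted(graph.backward("orders_clean")))  # ['oracle.orders', 'stg_orders']
```

Forward traversal answers impact questions ("what breaks if this source changes?"), while backward traversal answers provenance questions ("where did this target come from?").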
Governance & Protection
PII Tag Engine: Automated identification, classification, and rectification of personally identifiable information
Data Protection Engine: Row-level and column-level security, with role-based access for data engineers, stewards, and analysts
Data Citizenship: Business users can view, comment on, and approve catalogued data, giving governance business ownership
Schema Drift Monitor (Q4): Detects and alerts on schema changes in source systems before pipelines are impacted
Q3
Data Product Hub — New Capability · dataZense reimagined
dataZense is not being updated — it is being reframed. Data is no longer a catalogued asset. It becomes a governed product: published with an owner, a quality score, and an SLA — discoverable and consumable by any persona across the platform, and surfaced contextually in every ChainSys solution via Smartlets.
Publishing & Ownership
Data Product Publishing: Engineers publish datasets as governed products, with metadata, owner, SLA, quality score, and access policy defined at publish time
Ownership Model: Every product has a named owner accountable for freshness, quality, and access, not just a catalog entry with no responsibility attached
Product Versioning: Products are versioned, and consumers can pin to a version or auto-follow the latest. Breaking changes are signalled, not silent
Automated Quality Score: Quality score is calculated on every pipeline run and published on the product, so consumers always see current health at a glance
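As a rough sketch of what publishing and version pinning could look like, the example below models a product record carrying the attributes listed above and a registry that resolves either a pinned version or the latest one. `DataProduct`, `ProductRegistry`, and all field names are assumptions made for illustration, not the Data Product Hub API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataProduct:
    name: str
    version: int
    owner: str
    sla_hours: int          # assumed freshness SLA, in hours
    quality_score: float    # assumed to lie between 0.0 and 1.0
    access_policy: str
    breaking: bool = False  # True if this version breaks earlier consumers


class ProductRegistry:
    def __init__(self):
        self._versions = {}  # product name -> {version: DataProduct}

    def publish(self, product):
        self._versions.setdefault(product.name, {})[product.version] = product

    def resolve(self, name, pinned_version: Optional[int] = None):
        """Return the pinned version, or the latest if no pin is given."""
        versions = self._versions[name]
        if pinned_version is not None:
            return versions[pinned_version]
        return versions[max(versions)]

    def breaking_since(self, name, version):
        """List later versions flagged as breaking, so pinned consumers are warned."""
        return [v for v, p in sorted(self._versions[name].items())
                if v > version and p.breaking]


registry = ProductRegistry()
registry.publish(DataProduct("finance_ledger", 1, "a.owner", 24, 0.93, "finance-read"))
registry.publish(DataProduct("finance_ledger", 2, "a.owner", 24, 0.95, "finance-read",
                             breaking=True))

print(registry.resolve("finance_ledger").version)                    # 2
print(registry.resolve("finance_ledger", pinned_version=1).version)  # 1
print(registry.breaking_since("finance_ledger", 1))                  # [2]
```

A consumer pinned to version 1 keeps receiving version 1, and `breaking_since` gives it an explicit signal that version 2 is a breaking change.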
Persona-Aware Discovery
Role-Based Product Views: Engineer sees schema + lineage · Steward sees quality + ownership · Business user sees description + usage. Same product, right view per role
NL Product Discovery: "Find approved financial datasets with PII removed and freshness within 24 hours" is an example of natural language search across the product catalogue
Browsable Product Catalogue: Dedicated product catalogue with categories, tags, ownership, quality indicators, and usage metrics, purpose-built for consumers, not data engineers
Subscription & Notification: Subscribe to any product and be notified on quality changes, schema updates, SLA breaches, or ownership changes
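The role-based views described above amount to projecting one product record onto a per-role set of fields. The following sketch assumes a plain dictionary record and hypothetical role keys (`engineer`, `steward`, `business_user`); the real persona model may differ.

```python
# Field sets per role, taken from the roadmap description; the key names are assumed.
ROLE_FIELDS = {
    "engineer": ("schema", "lineage"),
    "steward": ("quality", "ownership"),
    "business_user": ("description", "usage"),
}


def view_for(product: dict, role: str) -> dict:
    """Return only the fields of `product` that the given role should see."""
    fields = ROLE_FIELDS[role]
    return {key: product[key] for key in fields if key in product}


product = {
    "schema": ["account_id", "amount", "posted_at"],
    "lineage": ["oracle.gl_entries", "stg_ledger"],
    "quality": 0.95,
    "ownership": "finance-data-team",
    "description": "Approved general ledger entries",
    "usage": 42,
}

print(view_for(product, "steward"))
# {'quality': 0.95, 'ownership': 'finance-data-team'}
```

The underlying record is never duplicated per persona; each role simply receives a different projection of it.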
Smartlets & Cross-Solution
Smartlets: Contextual data product cards surfaced within Migration, MDM, Analytics, and App Builder; see the product linked to any job, hub, or dataset without leaving the solution
Cross-Solution Product Linking: Migration jobs, master data hubs, and analytical datasets are linked to their source products, giving full traceability from raw source to published product
Contextual Quality Alerts: When a consumed product drops below its quality threshold, the alert surfaces inside every solution consuming it, not just in the catalog
Product Lineage View: Trace from raw source data through all transformations and pipelines to the final published product in one end-to-end view
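The contextual alerting behaviour can be sketched as a fan-out from one product to every solution that consumes it. The function `alert_targets`, the consumer mapping, and the message format are all hypothetical and only illustrate the idea.

```python
def alert_targets(product_name, score, threshold, consumers):
    """Return one alert message per consuming solution when score < threshold.

    `consumers` maps a product name to the list of solutions consuming it.
    """
    if score >= threshold:
        return []
    return [
        f"{solution}: '{product_name}' quality {score:.2f} "
        f"is below threshold {threshold:.2f}"
        for solution in consumers.get(product_name, [])
    ]


consumers = {"finance_ledger": ["Migration", "MDM", "Analytics"]}

for message in alert_targets("finance_ledger", 0.71, 0.85, consumers):
    print(message)
# Migration: 'finance_ledger' quality 0.71 is below threshold 0.85
# MDM: 'finance_ledger' quality 0.71 is below threshold 0.85
# Analytics: 'finance_ledger' quality 0.71 is below threshold 0.85
```

A healthy product produces no messages, so solutions only see an alert when a product they actually consume falls below its threshold.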
AI throughout: Auto-tagging, PII classification, extraction confidence scoring, NL product discovery, and quality narration are all routed through the governed AI Gateway — audited, rate-limited, and tenant-scoped.
FLOW
From Raw Data to Governed Data Product
📥
Ingest
Documents, databases, and cloud sources ingested via 200+ connectors across all formats
🔍
Profile
OCR, form parser, structured profiler, and LLM extraction — with confidence scoring on every field
🏷
Catalogue
AI auto-tags, classifies PII, generates glossary, registers in Apache Solr — fully indexed and searchable
🏛
Govern
Lineage recorded, access controls applied, steward workflows triggered, audit trail maintained
📦
Publish (Q3)
Published as a governed data product — owned, versioned, quality-scored, and discoverable by every persona