Technical architecture reference covering system topology, service architecture, AI Gateway, deployment models, scalability, technology stack, and platform modernization.
Version 6.0.1 · March 2026 · Architecture & Deployment Teams
Section 1
Platform Overview
The Chainsys Smart Data Platform is a modular enterprise data management suite covering data movement, quality, cataloguing, analytics, and application development. Eleven solution domains are composed across six service modules.
Service Modules
Service Module · Product · Core Responsibility · Platform Family
Data Migration & Integration · dataZap · ETL, CDC, batch and real-time data movement, 50+ endpoint connectors · Smart Data Platform
Master Data & Quality · dataZen · MDM, DQM, Data Governance, golden record management · Smart Data Platform
Data Catalog & Governance · dataZense · Data discovery, lineage, PII tagging, business glossary · Smart Data Platform
Data Analytics · dataZense · OLAP, ML/NLP, dashboards, R runtimes · Smart Data Platform
Process & Test Automation · Smart BOTS · Record-and-playback test and business process automation · Smart Business Platform
Application Builder · Smart App Builder · Low-code web/mobile apps, agentic AI workflows (SAB Autonomous) · Smart Business Platform
Solutioning Model — 11 Solutions across 6 Modules
Each solution domain is fulfilled through a composition of service modules. ● Primary — leads delivery. ◎ Supporting — contributes significant capability.
Modularity
Distinct layers and independently deployable modules. Applications share Foundation services without duplicating them.
Scalability
Horizontal scaling via HAProxy at every tier. Stateless application services support cluster expansion without downtime.
Zero Trust Security
Authentication and authorization enforced at every layer. JWT tokens validated on every request. No implicit trust between internal services.
Flexibility
Spring Boot (transitioning to Quarkus) auto-configuration supports rapid adaptation and independent service deployment.
Maintainability
Clean API contracts, standardised versioning, and centralised governance via the AI Gateway ensure long-term platform maintainability.
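The "JWT tokens validated on every request" rule under Zero Trust can be sketched as a minimal HS256 signature check. This is an illustration only: the class, token values, and secret handling here are hypothetical, and the platform's actual filter chain, claim validation, and key management are not described in this document.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of per-request JWT signature verification (HS256).
public class JwtCheck {

    // Compute the base64url-encoded HMAC-SHA256 of "header.payload".
    public static String sign(String headerAndPayload, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(mac.doFinal(headerAndPayload.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // A real filter would also check exp/aud claims and use a
    // constant-time comparison; both are omitted here for brevity.
    public static boolean verifySignature(String jwt, String secret) {
        String[] parts = jwt.split("\\.");          // header.payload.signature
        if (parts.length != 3) return false;
        return sign(parts[0] + "." + parts[1], secret).equals(parts[2]);
    }

    public static void main(String[] args) {
        String hp = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJkZW1vIn0";
        String token = hp + "." + sign(hp, "demo-secret");
        System.out.println(verifySignature(token, "demo-secret"));   // true
        System.out.println(verifySignature(token, "wrong-secret"));  // false
    }
}
```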
Section 2
System Architecture
Full system topology from internet-facing endpoints through the DMZ, web and application tiers, to data stores and observability. Each layer enforces security boundaries separating the public internet from internal platform services.
API Gateway — REST + SOAP API management, versioning, monitoring, and self-service API catalog.
Observability target (v6.1): OpenTelemetry-based distributed tracing across all services — end-to-end request correlation from API Gateway through dataZap → dataZen → dataZense. Current release uses structured correlation IDs at the service layer with centralised log aggregation.
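The structured correlation-ID scheme used in the current release can be sketched as follows: reuse an incoming ID if present, otherwise mint one, and propagate it on outbound calls so log lines across services can be joined. The header name is an assumption for illustration; the platform's actual convention is not stated here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of correlation-ID propagation for cross-service log joining.
public class CorrelationId {
    public static final String HEADER = "X-Correlation-ID"; // assumed header name

    // Reuse the caller's ID when present; mint one at the edge otherwise.
    public static String resolve(Map<String, String> inboundHeaders) {
        String id = inboundHeaders.get(HEADER);
        return (id == null || id.isEmpty()) ? UUID.randomUUID().toString() : id;
    }

    // Headers for a downstream call carry the same ID unchanged.
    public static Map<String, String> outbound(Map<String, String> inboundHeaders) {
        Map<String, String> headers = new HashMap<>();
        headers.put(HEADER, resolve(inboundHeaders));
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> in = new HashMap<>();
        in.put(HEADER, "req-123");
        System.out.println(outbound(in).get(HEADER)); // req-123 (reused, not regenerated)
    }
}
```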
Section 3
Service Architecture
Each service follows a consistent horizontal layout: data sources and endpoint connectors on the left, processing engines in the centre, data stores on the right. The AI Gateway (amber, bottom-left) connects to LLM Providers (amber, bottom-right). The Infrastructure bar (gray, bottom) shows the runtime stack per service.
3.1 dataZap — Data Migration & Integration
dataZap is the platform's data movement engine, responsible for all extract, transform, and load operations. It connects to 50+ endpoint types via JDBC, REST, SOAP, OData, SAP JCo, FTP, and native connectors. Pipelines are composed visually and executed via a distributed execution controller supporting real-time (CDC), batch, and scheduled modes.
Target loading with pre/post hooks; full reconciliation and audit
Execution Controller (Migration Flow, Process Flow, Data Exchange, Scheduler, Versioning) — pipeline orchestration; dependency management; scheduling; pipeline version control
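The dependency-management step of the Execution Controller can be illustrated with a topological ordering (Kahn's algorithm): given pipeline stages and their upstream dependencies, compute an order in which every stage runs after everything it depends on. The stage names are hypothetical; the controller's real scheduling model is not detailed here.

```java
import java.util.*;

// Sketch: derive a runnable execution order from stage dependencies.
public class PipelineOrder {
    public static List<String> order(Map<String, List<String>> deps) {
        Map<String, Integer> indegree = new TreeMap<>();      // TreeMap for a deterministic tie-break
        Map<String, List<String>> downstream = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            indegree.putIfAbsent(e.getKey(), 0);
            for (String up : e.getValue()) {
                indegree.putIfAbsent(up, 0);
                indegree.merge(e.getKey(), 1, Integer::sum);  // one incoming edge per dependency
                downstream.computeIfAbsent(up, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        indegree.forEach((stage, d) -> { if (d == 0) ready.add(stage); });
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String stage = ready.poll();
            result.add(stage);
            for (String next : downstream.getOrDefault(stage, List.of()))
                if (indegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
        }
        if (result.size() != indegree.size())
            throw new IllegalStateException("cycle detected in pipeline graph");
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "extract", List.of(),
                "transform", List.of("extract"),
                "load", List.of("transform"),
                "reconcile", List.of("load"));
        System.out.println(order(deps)); // [extract, transform, load, reconcile]
    }
}
```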
Endpoint Category · Examples
Relational Databases — PostgreSQL, Oracle, SQL Server, MySQL, DB2, SAP HANA, Sybase
Enterprise Applications — SAP ECC/S4HANA, Oracle EBS/JDE/PeopleSoft, Microsoft Dynamics, IBM Maximo
Cloud Applications — Oracle ERP Cloud, Salesforce, Workday, SAP SuccessFactors, MS Dynamics 365, Concur
Big Data — Hive, Snowflake, Amazon Redshift, HBase
NoSQL & Storage — MongoDB, Apache Solr, CouchDB, OneDrive, Box, FTP
Message Brokers — IBM MQ, Apache ActiveMQ
AI Gateway integration: field mapping recommendations, transformation logic generation, anomaly detection in data pipelines, and reconciliation insight narration.
3.2 dataZen — Master Data & Quality
dataZen provides MDM and DQM capabilities built on top of dataZap for endpoint connectivity. It maintains authoritative golden records across three hub stores — Request Hub, Quality Hub, and Master Data Hub — with a full governance and approval workflow layer.
Diagram: dataZen — Master Data & Quality Service
Engine · Function
Integration Engine — dataZap handler (inbound/mapping/outbound), scheduling handler, API publisher handler. Routes inbound data through quality checks before writing to master stores.
Data Quality Engine — Rule/Profiling, Cleansing, Harmonization, Standardization engines. Applies configurable quality rules; surfaces failing records for remediation.
Data Governance Engine — Process Flow, Validation, and Approval engines. Enforces data stewardship workflows with full audit trail.
Master Hub Engine — Hub Design, Layout, Domain Template, Augmentation, and Reporting engines. Maintains the golden record across Request Hub, Quality Hub, and Master Data Hub.
AI Gateway integration: AI-assisted deduplication and entity resolution, enrichment suggestions, merge recommendations, and data quality rule generation from profile statistics.
3.3 dataZense — Data Catalog & Governance
dataZense Catalog provides enterprise data discovery, lineage, PII classification, business glossary management, and data governance workflows. Built on Apache Solr for full-text search across metadata assets, with structured and unstructured profiling engines feeding a centralised catalog store.
Diagram: dataZense — Data Catalog & Governance
Capability · Description
Structured Profiler — Metadata capture, column statistics, sampling, and relationship discovery across relational data stores.
Unstructured Profiler — OCR-based document scanning and form-based extraction for PDFs, images, and scanned documents.
Catalog Engine (Solr) — Full-text search, data registration, business glossary, PII tag engine, data lineage engine, data protection engine.
Data Lineage — End-to-end lineage from source endpoint through transformation into target, stored and queryable via the catalog.
AI Gateway integration: metadata auto-tagging from profile statistics, PII classification enhancement, business glossary term suggestion, and lineage description narration.
3.4 dataZense — Data Analytics
dataZense Analytics provides an end-to-end analytical processing pipeline from raw data through OLAP and machine learning to visualised dashboards. The Learning Engine supports supervised, unsupervised, and reinforcement learning workloads with NLP capabilities, backed by R and Python analytical runtimes.
Diagram: dataZense — Data Analytics Service
Engine · Capability
Foundation Engine — Dataset management, data access layer, real-time data streaming for live dashboards.
Analytics Engine — OLAP Cube, Dimension management, Query Engine for ad-hoc analytical queries.
Learning Engine — Supervised learning (classification, regression), unsupervised learning (clustering), reinforcement learning, NLP — backed by R and Python runtimes.
Visualization — View Engine, Dashboard Engine, Snapshot Engine, Formatting Engine, Report Scheduler — powered by Dimple.js and R ggplot2.
Query API — REST API for programmatic access to analytical results and embedding in external applications.
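The kind of ad-hoc aggregation the Analytics Engine performs can be illustrated with a one-dimension roll-up: sum a measure grouped by a dimension. The column names ("region", "amount") are hypothetical, and this is a toy stand-in for the OLAP Cube, not its implementation.

```java
import java.util.*;

// Sketch: roll a list of fact rows up along one dimension.
public class RollUp {
    public static Map<String, Double> sumByDimension(
            List<Map<String, Object>> facts, String dimension, String measure) {
        Map<String, Double> cube = new TreeMap<>();   // sorted keys for stable output
        for (Map<String, Object> fact : facts)
            cube.merge((String) fact.get(dimension),
                       ((Number) fact.get(measure)).doubleValue(), Double::sum);
        return cube;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> sales = List.of(
                Map.of("region", "EMEA", "amount", 120.0),
                Map.of("region", "APAC", "amount", 80.0),
                Map.of("region", "EMEA", "amount", 30.0));
        System.out.println(sumByDimension(sales, "region", "amount"));
        // {APAC=80.0, EMEA=150.0}
    }
}
```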
AI Gateway integration: insight narration, anomaly explanation, forecast commentary, and KPI narration for dashboard widgets.
3.5 Smart App Builder — Application Development & Agentic Workflows
Smart App Builder (SAB) is a low-code platform for building data-backed web and mobile applications via a visual design studio. SAB Autonomous extends the platform with multi-agent orchestration for complex enterprise tasks driven by natural language instructions.
Design Studio — Visual drag-and-drop object, layout, process, and integration design backed by dataZap connectors.
Web Build Engine — Angular + Platform Component Engine. Generates deployable Angular applications from visual model definitions.
Mobile Build Engine — Ionic v4 + Platform Component Engine. Cross-platform mobile applications from the same visual model definitions.
SAB Autonomous Runtime — Agentic workflow canvas (design-time) + Execution Engine (runtime). Agents assigned roles, tools, and goals; orchestrated via the AI Gateway. Supports multi-agent collaboration, memory management, human-in-the-loop approvals, and full audit trails.
Runtime — Node.js deployment server for web applications. CouchDB for app data storage. dataZap connectors for backend data access.
Node.js Runtime: Current runtime is Node.js 12.16 (end-of-life). Upgrade to Node.js 22 LTS is in the platform modernization pipeline — see Section 8.
Section 4
AI Gateway
The Chainsys AI Gateway is the centralised control plane for all LLM interactions across the platform. Every AI call — from dataZen quality rules, dataZap field mapping, dataZense catalog tagging, or SAB Autonomous agent orchestration — routes through the gateway for governance, dispatch, and audit.
Architectural Role
Built on Quarkus (backend) and ReactJS (management console), the AI Gateway is the first Chainsys component on the next-generation microservices stack. It brokers between platform services and external or self-hosted LLM providers, enforcing governance at the process level through versioned System Prompts.
AI Capability Category · Description · Examples
Deterministic Agents — Rule-based automation within service engines. No LLM calls. Fast, predictable, auditable. Examples: CDC change detection, validation rule evaluation, reconciliation checking, field type inference.
Generative AI (via Gateway) — LLM-backed functions via named Process definitions. Governed by System Prompts. All calls logged. Examples: field mapping suggestions, deduplication scoring, metadata auto-tagging, insight narration, SAB Autonomous agent reasoning.
Core Constructs
Construct · Purpose
LLM Providers — Registered AI model providers: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Google Vertex AI, Ollama (self-hosted).
Models — Individual LLM models under providers with model ID, context window, capability flags (text/embeddings/vision/function-calling), and rate limits.
Functions & Processes — AI capabilities exposed via named Function → Process abstractions. Model assignment is centrally managed per Process — application logic is decoupled from model selection.
System Prompts — Per-Process behavioral constraints, scope, output format, safety rules, and tone. Versioned and governed via the Workflow and Approval Engine before production deployment.
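The Function → Process indirection can be sketched as a small registry: callers ask for a function by name, and the gateway resolves which model and system-prompt version currently serve it. The function name, process name, model ID, and prompt version below are hypothetical.

```java
import java.util.*;

// Sketch: decouple application logic from model selection. Swapping the
// model behind a Process is a registry change only; callers are untouched.
public class ProcessRegistry {
    public record Process(String model, String systemPromptVersion) {}

    private final Map<String, String> functionToProcess = new HashMap<>();
    private final Map<String, Process> processes = new HashMap<>();

    public void register(String function, String process, Process def) {
        functionToProcess.put(function, process);
        processes.put(process, def);
    }

    public Process resolve(String function) {
        String process = functionToProcess.get(function);
        if (process == null) throw new NoSuchElementException("unknown function: " + function);
        return processes.get(process);
    }

    public static void main(String[] args) {
        ProcessRegistry registry = new ProcessRegistry();
        registry.register("field-mapping-suggest", "fm-process",
                new Process("example-model-id", "v3"));
        System.out.println(registry.resolve("field-mapping-suggest").model());
        // example-model-id
    }
}
```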
Request Flow
Platform Application / SAB Autonomous
→ Request Router (resolve Function → Process)
→ Process Orchestrator (retrieve Model + System Prompt)
→ System Prompt Engine (construct governed context)
→ Model Dispatch Engine (dispatch to LLM Provider, apply rate limits)
→ LLM Provider
→ Response Handler (validate · format · log)
→ Calling Application
Security & Auditability
All gateway calls authenticated via platform JWT tokens. Provider API keys stored AES-256 encrypted — never exposed to calling applications. Every request and response logged with: calling user, calling process, model used, token count, latency, and a hash of the System Prompt version applied. Rate limiting and quota management enforced per tenant and per model. Audit logs are immutable and feed platform compliance reporting.
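The per-tenant rate limiting described above can be sketched as a token bucket. The capacity and refill numbers are illustrative, and time is passed in explicitly so the behaviour is deterministic; the gateway's actual quota mechanism is not specified in this document.

```java
// Sketch of per-tenant rate limiting with a token bucket: a tenant may
// burst up to `capacity` requests, then is throttled to `refillPerSecond`.
public class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillMillis;

    public TokenBucket(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;              // start full: the initial burst allowance
        this.lastRefillMillis = nowMillis;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        double refilled = (nowMillis - lastRefillMillis) / 1000.0 * refillPerSecond;
        tokens = Math.min(capacity, tokens + refilled);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) { tokens -= 1.0; return true; }
        return false;                        // over quota: caller gets a 429-style rejection
    }

    public static void main(String[] args) {
        TokenBucket perTenant = new TokenBucket(2, 1.0, 0);  // burst 2, refill 1 req/s
        System.out.println(perTenant.tryAcquire(0));    // true
        System.out.println(perTenant.tryAcquire(0));    // true
        System.out.println(perTenant.tryAcquire(0));    // false (bucket empty)
        System.out.println(perTenant.tryAcquire(1000)); // true (1 token refilled)
    }
}
```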
Section 5
Deployment & Tenancy
Four deployment models — on-premise single node, on-premise multi-node, pure cloud, and hybrid — with a shared infrastructure, isolated data multi-tenancy model across all configurations.
On-Premise Single Node — Small/medium data volumes, low concurrency, pilot deployments
On-Premise Multi-Node — Independent VM clusters per tier (DMZ, Web, Foundation, App Platform, DB, Solr, CouchDB) with HAProxy LB — ≥99.9% — Enterprise production, high concurrency, SLA-governed workloads
Pure Cloud (AWS/Azure/GCP) — Public: shared infra, isolated subnets per tenant. Private: fully dedicated per tenant. IPsec for on-premise connectivity. — ≥99.9% — Cloud-first customers, managed service deployments
Hybrid — Cloud: Web Nodes, Foundation, Storage. On-premise: dataZap Agent, dataZense Agent, Smart BOT Agent at client data centres. — ≥99.9% (cloud) — Data sovereignty requirements or on-premise source systems
Multi-Tenancy Architecture
The platform implements a shared infrastructure, isolated data tenancy model. Isolation is enforced at four layers:
Isolation Layer · Mechanism
Identity Isolation — Each tenant maps to a dedicated Keycloak Realm. Authentication, SSO configuration, MFA policy, and IdP federation are fully independent per realm. No cross-realm identity bleed.
Data Isolation — Dedicated database schemas or separate database instances per tenant (configurable). Metadata, Datamart, and CouchDB partitioned per tenant. Authorization Engine enforces tenant-scoped access at query execution time.
Application Isolation — License Authorization controls which applications and features are available per tenant. Node quotas enforced per subscription tier.
Network Isolation (Cloud) — Dedicated subnets per tenant within the virtual network. Independent site-to-site IPsec tunnels for on-premise connectivity.
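The "dedicated schema per tenant" option under Data Isolation can be sketched as a resolver: every query runs against the schema bound to the authenticated tenant, so a request cannot address another tenant's data. The tenant IDs and schema names below are hypothetical.

```java
import java.util.Map;

// Sketch: bind each authenticated tenant to its own database schema.
public class TenantSchemaResolver {
    private final Map<String, String> tenantToSchema;

    public TenantSchemaResolver(Map<String, String> tenantToSchema) {
        this.tenantToSchema = tenantToSchema;
    }

    // Fail closed: an unknown tenant gets an error, never a default schema.
    public String schemaFor(String tenantId) {
        String schema = tenantToSchema.get(tenantId);
        if (schema == null)
            throw new IllegalArgumentException("unknown tenant: " + tenantId);
        return schema;
    }

    public static void main(String[] args) {
        TenantSchemaResolver resolver = new TenantSchemaResolver(
                Map.of("acme", "tenant_acme", "globex", "tenant_globex"));
        System.out.println(resolver.schemaFor("acme")); // tenant_acme
    }
}
```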
Service Level Objectives
SLI · Multi-Node · Single Node
Platform Availability — Multi-Node: 99.9% · Single Node: 90%
API Response Time (p95) — Multi-Node: <500 ms · Single Node: <1 s
Disaster Recovery RTO — Multi-Node: ~1 hour · Single Node: ~2 hours
RPO — Configurable per application / database
Section 6
Scalability Architecture
The platform is designed for horizontal scale-out at every tier. Each layer — from DMZ through the application platform to the data tier — can be scaled independently by adding nodes to the relevant cluster, without changes to adjacent tiers.
DMZ Tier — Additional Apache HTTPD nodes behind DNS load balancing — Stateless; scales linearly with request volume
Tomcat / Web Tier — HAProxy distributes across the Tomcat cluster; new nodes are added to the HAProxy backend pool — Session affinity configurable per deployment
Application Platform — Each service (dataZap, dataZen, dataZense, BOTS, SAB) scales independently; additional nodes are added to the per-service cluster — Services are stateless; shared state lives in Redis and PostgreSQL
Database Tier — PostgreSQL primary/replica for read scale-out; write scaling via vertical node sizing or partitioning — CouchDB and Solr scale via additional cluster nodes
Cache / Messaging — Redis cluster mode for horizontal cache scaling; ActiveMQ broker clustering for message throughput — Cache scale reduces DB pressure and improves API response times
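The claim that cache scale reduces DB pressure follows the cache-aside pattern: look in the cache first, fall back to the database on a miss, then populate the cache so repeat reads never touch the database. In this sketch a ConcurrentHashMap stands in for Redis purely for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch of cache-aside: only cache misses reach the database loader.
public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger dbReadCount = new AtomicInteger();

    public String get(String key, Function<String, String> dbLoader) {
        return cache.computeIfAbsent(key, k -> {
            dbReadCount.incrementAndGet();   // counts actual database reads
            return dbLoader.apply(k);
        });
    }

    public int dbReads() { return dbReadCount.get(); }

    public static void main(String[] args) {
        CacheAside cache = new CacheAside();
        Function<String, String> db = k -> "row-for-" + k;
        cache.get("user:1", db);
        cache.get("user:1", db);             // second read served from cache
        System.out.println(cache.dbReads()); // 1
    }
}
```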
Each service module (dataZap / dataZen / dataZense / BOTS / SAB) maintains its own node cluster and can be independently scaled. A heavy dataZap migration workload scales extraction nodes without affecting the dataZense Analytics tier.
Section 7
Technology Stack
Full reference technology stack across all platform tiers. Components marked ⚠ have lifecycle considerations addressed in Section 8.
Section 8
Platform Modernization
Strategic runtime modernization covering the Quarkus migration, Node.js runtime upgrade, distributed tracing, and vector store introduction for semantic AI features.
Quarkus Migration
The migration from Spring Boot to Quarkus follows a strangler-fig pattern — existing services remain fully operational during transition. The AI Gateway, already on Quarkus, validates the target stack in production.
Component · Current Stack · Target Stack · Status
AI Gateway — Quarkus + ReactJS → Quarkus + ReactJS — ✅ Complete
dataZap — Spring Boot 3.1.6 → Quarkus — 🔄 In Pipeline
dataZen — Spring Boot 3.1.6 → Quarkus — 🔄 In Pipeline
dataZense — Spring Boot 3.1.6 → Quarkus — 🔄 In Pipeline
Smart BOTS — Spring Boot 3.1.6 → Quarkus — 🔄 Planned
Smart App Builder — Spring Boot 3.1.6 + Node.js 12 → Quarkus + Node.js 22 LTS — 🔄 Planned
Platform Foundation — Spring Boot 3.1.6 → Quarkus — 🔄 Planned
Quarkus benefits at scale: Native image compilation (GraalVM) reduces container image sizes by ~60–70% and startup from seconds to milliseconds. Reactive programming model (Mutiny) supports non-blocking I/O — beneficial for AI Gateway and high-throughput dataZap pipelines. Higher service density per node reduces infrastructure footprint.
Planned Enhancements (v6.1+)
Enhancement · Target · Description
Distributed Tracing (v6.1) — OpenTelemetry-based trace correlation across all services. End-to-end request tracing from API Gateway through dataZap → dataZen → dataZense, exporting to Jaeger or Zipkin. Current release uses structured correlation IDs at the service layer.
Milvus Vector Store (v6.1) — Vector database for semantic search, embedding-based similarity, and RAG patterns in AI Gateway functions. Enables semantic data catalog search and document similarity in the unstructured profiler.
Node.js Runtime Upgrade (v6.1) — Node.js 12.16 (EOL) upgraded to Node.js 22 LTS across the Smart App Builder runtime and SAB Autonomous deployment environments.
AI Gateway — Streaming (v6.1) — Token-streaming support for long-running LLM generation tasks. Enables real-time progressive output for SAB Autonomous agent reasoning steps.
The modernisation items above are the architecturally relevant subset of the full 2026 delivery roadmap. The complete roadmap covers Q2–Q4 2026 across all platform workstreams, including commercial feature releases, AI capability expansions, and Data Product Hub delivery.