Technical architecture reference covering system topology, service architecture, AI Gateway, deployment models, scalability, technology stack, and platform modernization.
Version 6.0.1 · March 2026 · Architecture & Deployment Teams
Section 1
Platform Overview
The Chainsys Smart Data Platform is a modular enterprise data management suite covering data movement, quality, cataloguing, analytics, and application development. Eleven solution domains are composed across six service modules.
Service Modules
Service Module · Product · Core Responsibility · Platform Family
Data Migration & Integration · dataZap · ETL, CDC, batch and real-time data movement, 50+ endpoint connectors · Smart Data Platform
Master Data & Quality · dataZen · MDM, DQM, Data Governance, golden record management · Smart Data Platform
Data Catalog & Governance · dataZense · Data discovery, lineage, PII tagging, business glossary · Smart Data Platform
Data Analytics · dataZense · OLAP, ML/NLP, dashboards, R runtimes · Smart Data Platform
Process & Test Automation · Smart BOTS · Record-and-playback test and business process automation · Smart Business Platform
Application Builder · Smart App Builder · Low-code web/mobile apps, agentic AI workflows (SAB Autonomous) · Smart Business Platform
Solutioning Model — 11 Solutions across 6 Modules
Each solution domain is fulfilled through a composition of service modules. ● Primary — leads delivery. ◎ Supporting — contributes significant capability.
Modularity
Distinct layers and independently deployable modules. Applications share Foundation services without duplicating them.
Scalability
Horizontal scaling via HAProxy at every tier. Stateless application services support cluster expansion without downtime.
Zero Trust Security
Authentication and authorization enforced at every layer. JWT tokens validated on every request. No implicit trust between internal services.
Flexibility
Spring Boot (transitioning to Quarkus) auto-configuration supports rapid adaptation and independent service deployment.
Maintainability
Clean API contracts, standardised versioning, and centralised governance via the AI Gateway ensure long-term platform maintainability.
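The "JWT tokens validated on every request" rule under Zero Trust can be sketched as a minimal HS256 signature check. This is an illustration only: the class, token values, and secret handling here are hypothetical, and the platform's actual filter chain, claim validation, and key management are not described in this document.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of per-request JWT signature verification (HS256).
public class JwtCheck {

    // Compute the base64url-encoded HMAC-SHA256 of "header.payload".
    public static String sign(String headerAndPayload, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(mac.doFinal(headerAndPayload.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // A real filter would also check exp/aud claims and use a
    // constant-time comparison; both are omitted here for brevity.
    public static boolean verifySignature(String jwt, String secret) {
        String[] parts = jwt.split("\\.");          // header.payload.signature
        if (parts.length != 3) return false;
        return sign(parts[0] + "." + parts[1], secret).equals(parts[2]);
    }

    public static void main(String[] args) {
        String hp = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJkZW1vIn0";
        String token = hp + "." + sign(hp, "demo-secret");
        System.out.println(verifySignature(token, "demo-secret"));   // true
        System.out.println(verifySignature(token, "wrong-secret"));  // false
    }
}
```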
Section 2
System Architecture
Full system topology from internet-facing endpoints through the DMZ, web and application tiers, to data stores and observability. Each layer enforces security boundaries separating the public internet from internal platform services.
API Gateway — REST + SOAP API management, versioning, monitoring, and self-service API catalog.
Observability target (v6.1): OpenTelemetry-based distributed tracing across all services — end-to-end request correlation from API Gateway through dataZap → dataZen → dataZense. Current release uses structured correlation IDs at the service layer with centralised log aggregation.
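The structured correlation-ID scheme used in the current release can be sketched as follows: reuse an incoming ID if present, otherwise mint one, and propagate it on outbound calls so log lines across services can be joined. The header name is an assumption for illustration; the platform's actual convention is not stated here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of correlation-ID propagation for cross-service log joining.
public class CorrelationId {
    public static final String HEADER = "X-Correlation-ID"; // assumed header name

    // Reuse the caller's ID when present; mint one at the edge otherwise.
    public static String resolve(Map<String, String> inboundHeaders) {
        String id = inboundHeaders.get(HEADER);
        return (id == null || id.isEmpty()) ? UUID.randomUUID().toString() : id;
    }

    // Headers for a downstream call carry the same ID unchanged.
    public static Map<String, String> outbound(Map<String, String> inboundHeaders) {
        Map<String, String> headers = new HashMap<>();
        headers.put(HEADER, resolve(inboundHeaders));
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> in = new HashMap<>();
        in.put(HEADER, "req-123");
        System.out.println(outbound(in).get(HEADER)); // req-123 (reused, not regenerated)
    }
}
```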
Section 3
Service Architecture
Each service follows a consistent horizontal layout: data sources and endpoint connectors on the left, processing engines in the centre, data stores on the right. The AI Gateway (amber, bottom-left) connects to LLM Providers (amber, bottom-right). The Infrastructure bar (gray, bottom) shows the runtime stack per service.
3.1 dataZap — Data Migration & Integration
dataZap is the platform's data movement engine, responsible for all extract, transform, and load operations. It connects to 50+ endpoint types via JDBC, REST, SOAP, OData, SAP JCo, FTP, and native connectors. Pipelines are composed visually and executed via a distributed execution controller supporting real-time (CDC), batch, and scheduled modes.
Target loading with pre/post hooks; full reconciliation and audit
Execution Controller (Migration Flow, Process Flow, Data Exchange, Scheduler, Versioning) — pipeline orchestration; dependency management; scheduling; pipeline version control
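The dependency-management step of the Execution Controller can be illustrated with a topological ordering (Kahn's algorithm): given pipeline stages and their upstream dependencies, compute an order in which every stage runs after everything it depends on. The stage names are hypothetical; the controller's real scheduling model is not detailed here.

```java
import java.util.*;

// Sketch: derive a runnable execution order from stage dependencies.
public class PipelineOrder {
    public static List<String> order(Map<String, List<String>> deps) {
        Map<String, Integer> indegree = new TreeMap<>();      // TreeMap for a deterministic tie-break
        Map<String, List<String>> downstream = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            indegree.putIfAbsent(e.getKey(), 0);
            for (String up : e.getValue()) {
                indegree.putIfAbsent(up, 0);
                indegree.merge(e.getKey(), 1, Integer::sum);  // one incoming edge per dependency
                downstream.computeIfAbsent(up, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        indegree.forEach((stage, d) -> { if (d == 0) ready.add(stage); });
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String stage = ready.poll();
            result.add(stage);
            for (String next : downstream.getOrDefault(stage, List.of()))
                if (indegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
        }
        if (result.size() != indegree.size())
            throw new IllegalStateException("cycle detected in pipeline graph");
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "extract", List.of(),
                "transform", List.of("extract"),
                "load", List.of("transform"),
                "reconcile", List.of("load"));
        System.out.println(order(deps)); // [extract, transform, load, reconcile]
    }
}
```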
Endpoint Category · Examples
Relational Databases — PostgreSQL, Oracle, SQL Server, MySQL, DB2, SAP HANA, Sybase
Enterprise Applications — SAP ECC/S4HANA, Oracle EBS/JDE/PeopleSoft, Microsoft Dynamics, IBM Maximo
Cloud Applications — Oracle ERP Cloud, Salesforce, Workday, SAP SuccessFactors, MS Dynamics 365, Concur
Big Data — Hive, Snowflake, Amazon Redshift, HBase
NoSQL & Storage — MongoDB, Apache Solr, CouchDB, OneDrive, Box, FTP
Message Brokers — IBM MQ, Apache ActiveMQ
AI Gateway integration: field mapping recommendations, transformation logic generation, anomaly detection in data pipelines, and reconciliation insight narration.
3.2 dataZen — Master Data & Quality
dataZen provides MDM and DQM capabilities built on top of dataZap for endpoint connectivity. It maintains authoritative golden records across three hub stores — Request Hub, Quality Hub, and Master Data Hub — with a full governance and approval workflow layer.
Diagram: dataZen — Master Data & Quality Service
Engine · Function
Integration Engine — dataZap handler (inbound/mapping/outbound), scheduling handler, API publisher handler. Routes inbound data through quality checks before writing to master stores.
Data Quality Engine — Rule/Profiling, Cleansing, Harmonization, Standardization engines. Applies configurable quality rules; surfaces failing records for remediation.
Data Governance Engine — Process Flow, Validation, and Approval engines. Enforces data stewardship workflows with full audit trail.
Master Hub Engine — Hub Design, Layout, Domain Template, Augmentation, and Reporting engines. Maintains the golden record across Request Hub, Quality Hub, and Master Data Hub.
AI Gateway integration: AI-assisted deduplication and entity resolution, enrichment suggestions, merge recommendations, and data quality rule generation from profile statistics.
3.3 dataZense — Data Catalog & Governance
dataZense Catalog provides enterprise data discovery, lineage, PII classification, business glossary management, and data governance workflows. Built on Apache Solr for full-text search across metadata assets, with structured and unstructured profiling engines feeding a centralised catalog store.
Diagram: dataZense — Data Catalog & Governance
Capability · Description
Structured Profiler — Metadata capture, column statistics, sampling, and relationship discovery across relational data stores.
Unstructured Profiler — OCR-based document scanning and form-based extraction for PDFs, images, and scanned documents.
Catalog Engine (Solr) — Full-text search, data registration, business glossary, PII tag engine, data lineage engine, data protection engine.
Data Lineage — End-to-end lineage from source endpoint through transformation into target, stored and queryable via the catalog.
AI Gateway integration: metadata auto-tagging from profile statistics, PII classification enhancement, business glossary term suggestion, and lineage description narration.
3.4 dataZense — Data Analytics
dataZense Analytics provides an end-to-end analytical processing pipeline from raw data through OLAP and machine learning to visualised dashboards. The Learning Engine supports supervised, unsupervised, and reinforcement learning workloads with NLP capabilities, backed by R and Python analytical runtimes.
Diagram: dataZense — Data Analytics Service
Engine · Capability
Foundation Engine — Dataset management, data access layer, real-time data streaming for live dashboards.
Analytics Engine — OLAP Cube, Dimension management, Query Engine for ad-hoc analytical queries.
Learning Engine — Supervised learning (classification, regression), unsupervised learning (clustering), reinforcement learning, NLP — backed by R and Python runtimes.
Visualization — View Engine, Dashboard Engine, Snapshot Engine, Formatting Engine, Report Scheduler — powered by Dimple.js and R ggplot2.
Query API — REST API for programmatic access to analytical results and embedding in external applications.
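The kind of ad-hoc aggregation the Analytics Engine performs can be illustrated with a one-dimension roll-up: sum a measure grouped by a dimension. The column names ("region", "amount") are hypothetical, and this is a toy stand-in for the OLAP Cube, not its implementation.

```java
import java.util.*;

// Sketch: roll a list of fact rows up along one dimension.
public class RollUp {
    public static Map<String, Double> sumByDimension(
            List<Map<String, Object>> facts, String dimension, String measure) {
        Map<String, Double> cube = new TreeMap<>();   // sorted keys for stable output
        for (Map<String, Object> fact : facts)
            cube.merge((String) fact.get(dimension),
                       ((Number) fact.get(measure)).doubleValue(), Double::sum);
        return cube;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> sales = List.of(
                Map.of("region", "EMEA", "amount", 120.0),
                Map.of("region", "APAC", "amount", 80.0),
                Map.of("region", "EMEA", "amount", 30.0));
        System.out.println(sumByDimension(sales, "region", "amount"));
        // {APAC=80.0, EMEA=150.0}
    }
}
```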
AI Gateway integration: insight narration, anomaly explanation, forecast commentary, and KPI narration for dashboard widgets.
3.5 Smart App Builder — Application Development & Agentic Workflows
Smart App Builder (SAB) is a low-code platform for building data-backed web and mobile applications via a visual design studio. SAB Autonomous extends the platform with multi-agent orchestration for complex enterprise tasks driven by natural language instructions.
Design Studio — Visual drag-and-drop object, layout, process, and integration design backed by dataZap connectors.
Web Build Engine — Angular + Platform Component Engine. Generates deployable Angular applications from visual model definitions.
Mobile Build Engine — Ionic v4 + Platform Component Engine. Cross-platform mobile applications from the same visual model definitions.
SAB Autonomous Runtime — Agentic workflow canvas (design-time) + Execution Engine (runtime). Agents assigned roles, tools, and goals; orchestrated via the AI Gateway. Supports multi-agent collaboration, memory management, human-in-the-loop approvals, and full audit trails.
Runtime — Node.js deployment server for web applications. CouchDB for app data storage. dataZap connectors for backend data access.
Node.js Runtime: Current runtime is Node.js 12.16 (end-of-life). Upgrade to Node.js 22 LTS is in the platform modernization pipeline — see Section 8.
Section 4
AI Gateway
The Chainsys AI Gateway is the centralised control plane for all LLM interactions across the platform. Every AI call — from dataZen quality rules, dataZap field mapping, dataZense catalog tagging, or SAB Autonomous agent orchestration — routes through the gateway for governance, dispatch, and audit.
Architectural Role
Built on Quarkus (backend) and ReactJS (management console), the AI Gateway is the first Chainsys component on the next-generation microservices stack. It brokers between platform services and external or self-hosted LLM providers, enforcing governance at the process level through versioned System Prompts.
AI Capability Category · Description · Examples
Deterministic Agents — Rule-based automation within service engines. No LLM calls. Fast, predictable, auditable. Examples: CDC change detection, validation rule evaluation, reconciliation checking, field type inference.
Generative AI (via Gateway) — LLM-backed functions via named Process definitions. Governed by System Prompts. All calls logged. Examples: field mapping suggestions, deduplication scoring, metadata auto-tagging, insight narration, SAB Autonomous agent reasoning.
Core Constructs
Construct · Purpose
LLM Providers — Registered AI model providers: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Google Vertex AI, Ollama (self-hosted).
Models — Individual LLM models under providers with model ID, context window, capability flags (text/embeddings/vision/function-calling), and rate limits.
Functions & Processes — AI capabilities exposed via named Function → Process abstractions. Model assignment is centrally managed per Process — application logic is decoupled from model selection.
System Prompts — Per-Process behavioral constraints, scope, output format, safety rules, and tone. Versioned and governed via the Workflow and Approval Engine before production deployment.
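The Function → Process indirection can be sketched as a small registry: callers ask for a function by name, and the gateway resolves which model and system-prompt version currently serve it. The function name, process name, model ID, and prompt version below are hypothetical.

```java
import java.util.*;

// Sketch: decouple application logic from model selection. Swapping the
// model behind a Process is a registry change only; callers are untouched.
public class ProcessRegistry {
    public record Process(String model, String systemPromptVersion) {}

    private final Map<String, String> functionToProcess = new HashMap<>();
    private final Map<String, Process> processes = new HashMap<>();

    public void register(String function, String process, Process def) {
        functionToProcess.put(function, process);
        processes.put(process, def);
    }

    public Process resolve(String function) {
        String process = functionToProcess.get(function);
        if (process == null) throw new NoSuchElementException("unknown function: " + function);
        return processes.get(process);
    }

    public static void main(String[] args) {
        ProcessRegistry registry = new ProcessRegistry();
        registry.register("field-mapping-suggest", "fm-process",
                new Process("example-model-id", "v3"));
        System.out.println(registry.resolve("field-mapping-suggest").model());
        // example-model-id
    }
}
```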
Request Flow
Platform Application / SAB Autonomous
→ Request Router (resolve Function → Process)
→ Process Orchestrator (retrieve Model + System Prompt)
→ System Prompt Engine (construct governed context)
→ Model Dispatch Engine (dispatch to LLM Provider, apply rate limits)
→ LLM Provider
→ Response Handler (validate · format · log)
→ Calling Application
Security & Auditability
All gateway calls authenticated via platform JWT tokens. Provider API keys stored AES-256 encrypted — never exposed to calling applications. Every request and response logged with: calling user, calling process, model used, token count, latency, and a hash of the System Prompt version applied. Rate limiting and quota management enforced per tenant and per model. Audit logs are immutable and feed platform compliance reporting.
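The per-tenant rate limiting described above can be sketched as a token bucket. The capacity and refill numbers are illustrative, and time is passed in explicitly so the behaviour is deterministic; the gateway's actual quota mechanism is not specified in this document.

```java
// Sketch of per-tenant rate limiting with a token bucket: a tenant may
// burst up to `capacity` requests, then is throttled to `refillPerSecond`.
public class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillMillis;

    public TokenBucket(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;              // start full: the initial burst allowance
        this.lastRefillMillis = nowMillis;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        double refilled = (nowMillis - lastRefillMillis) / 1000.0 * refillPerSecond;
        tokens = Math.min(capacity, tokens + refilled);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) { tokens -= 1.0; return true; }
        return false;                        // over quota: caller gets a 429-style rejection
    }

    public static void main(String[] args) {
        TokenBucket perTenant = new TokenBucket(2, 1.0, 0);  // burst 2, refill 1 req/s
        System.out.println(perTenant.tryAcquire(0));    // true
        System.out.println(perTenant.tryAcquire(0));    // true
        System.out.println(perTenant.tryAcquire(0));    // false (bucket empty)
        System.out.println(perTenant.tryAcquire(1000)); // true (1 token refilled)
    }
}
```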
Section 5
Deployment & Tenancy
Four deployment models — on-premise single node, on-premise multi-node, pure cloud, and hybrid — with a shared infrastructure, isolated data multi-tenancy model across all configurations.
On-Premise Single Node — Small/medium data volumes, low concurrency, pilot deployments
On-Premise Multi-Node — Independent VM clusters per tier (DMZ, Web, Foundation, App Platform, DB, Solr, CouchDB) with HAProxy LB — ≥99.9% — Enterprise production, high concurrency, SLA-governed workloads
Pure Cloud (AWS/Azure/GCP) — Public: shared infra, isolated subnets per tenant. Private: fully dedicated per tenant. IPsec for on-premise connectivity. — ≥99.9% — Cloud-first customers, managed service deployments
Hybrid — Cloud: Web Nodes, Foundation, Storage. On-premise: dataZap Agent, dataZense Agent, Smart BOT Agent at client data centres. — ≥99.9% (cloud) — Data sovereignty requirements or on-premise source systems
Multi-Tenancy Architecture
The platform implements a shared infrastructure, isolated data tenancy model. Isolation is enforced at four layers:
Isolation Layer · Mechanism
Identity Isolation — Each tenant maps to a dedicated Keycloak Realm. Authentication, SSO configuration, MFA policy, and IdP federation are fully independent per realm. No cross-realm identity bleed.
Data Isolation — Dedicated database schemas or separate database instances per tenant (configurable). Metadata, Datamart, and CouchDB partitioned per tenant. Authorization Engine enforces tenant-scoped access at query execution time.
Application Isolation — License Authorization controls which applications and features are available per tenant. Node quotas enforced per subscription tier.
Network Isolation (Cloud) — Dedicated subnets per tenant within the virtual network. Independent site-to-site IPsec tunnels for on-premise connectivity.
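The "dedicated schema per tenant" option under Data Isolation can be sketched as a resolver: every query runs against the schema bound to the authenticated tenant, so a request cannot address another tenant's data. The tenant IDs and schema names below are hypothetical.

```java
import java.util.Map;

// Sketch: bind each authenticated tenant to its own database schema.
public class TenantSchemaResolver {
    private final Map<String, String> tenantToSchema;

    public TenantSchemaResolver(Map<String, String> tenantToSchema) {
        this.tenantToSchema = tenantToSchema;
    }

    // Fail closed: an unknown tenant gets an error, never a default schema.
    public String schemaFor(String tenantId) {
        String schema = tenantToSchema.get(tenantId);
        if (schema == null)
            throw new IllegalArgumentException("unknown tenant: " + tenantId);
        return schema;
    }

    public static void main(String[] args) {
        TenantSchemaResolver resolver = new TenantSchemaResolver(
                Map.of("acme", "tenant_acme", "globex", "tenant_globex"));
        System.out.println(resolver.schemaFor("acme")); // tenant_acme
    }
}
```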
Service Level Objectives
SLI · Multi-Node · Single Node
Platform Availability — Multi-Node: 99.9% · Single Node: 90%
API Response Time (p95) — Multi-Node: <500 ms · Single Node: <1 s
Disaster Recovery RTO — Multi-Node: ~1 hour · Single Node: ~2 hours
RPO — Configurable per application / database
Section 6
Scalability Architecture
The platform is designed for horizontal scale-out at every tier. Each layer — from DMZ through the application platform to the data tier — can be scaled independently by adding nodes to the relevant cluster, without changes to adjacent tiers.
DMZ Tier — Additional Apache HTTPD nodes behind DNS load balancing — Stateless; scales linearly with request volume
Tomcat / Web Tier — HAProxy distributes across the Tomcat cluster; new nodes are added to the HAProxy backend pool — Session affinity configurable per deployment
Application Platform — Each service (dataZap, dataZen, dataZense, BOTS, SAB) scales independently; additional nodes are added to the per-service cluster — Services are stateless; shared state lives in Redis and PostgreSQL
Database Tier — PostgreSQL primary/replica for read scale-out; write scaling via vertical node sizing or partitioning — CouchDB and Solr scale via additional cluster nodes
Cache / Messaging — Redis cluster mode for horizontal cache scaling; ActiveMQ broker clustering for message throughput — Cache scale reduces DB pressure and improves API response times
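The claim that cache scale reduces DB pressure follows the cache-aside pattern: look in the cache first, fall back to the database on a miss, then populate the cache so repeat reads never touch the database. In this sketch a ConcurrentHashMap stands in for Redis purely for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch of cache-aside: only cache misses reach the database loader.
public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger dbReadCount = new AtomicInteger();

    public String get(String key, Function<String, String> dbLoader) {
        return cache.computeIfAbsent(key, k -> {
            dbReadCount.incrementAndGet();   // counts actual database reads
            return dbLoader.apply(k);
        });
    }

    public int dbReads() { return dbReadCount.get(); }

    public static void main(String[] args) {
        CacheAside cache = new CacheAside();
        Function<String, String> db = k -> "row-for-" + k;
        cache.get("user:1", db);
        cache.get("user:1", db);             // second read served from cache
        System.out.println(cache.dbReads()); // 1
    }
}
```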
Each service module (dataZap / dataZen / dataZense / BOTS / SAB) maintains its own node cluster and can be independently scaled. A heavy dataZap migration workload scales extraction nodes without affecting the dataZense Analytics tier.
Section 7
Technology Stack
Full reference technology stack across all platform tiers. Components marked ⚠ have lifecycle considerations addressed in Section 8.
Section 8
Platform Modernization
Strategic runtime modernization covering the Quarkus migration, Node.js runtime upgrade, distributed tracing, and vector store introduction for semantic AI features.
Quarkus Migration
The migration from Spring Boot to Quarkus follows a strangler-fig pattern — existing services remain fully operational during transition. The AI Gateway, already on Quarkus, validates the target stack in production.
Component · Current Stack · Target Stack · Status
AI Gateway — Quarkus + ReactJS → Quarkus + ReactJS — ✅ Complete
dataZap — Spring Boot 3.1.6 → Quarkus — 🔄 In Pipeline
dataZen — Spring Boot 3.1.6 → Quarkus — 🔄 In Pipeline
dataZense — Spring Boot 3.1.6 → Quarkus — 🔄 In Pipeline
Smart BOTS — Spring Boot 3.1.6 → Quarkus — 🔄 Planned
Smart App Builder — Spring Boot 3.1.6 + Node.js 12 → Quarkus + Node.js 22 LTS — 🔄 Planned
Platform Foundation — Spring Boot 3.1.6 → Quarkus — 🔄 Planned
Quarkus benefits at scale: Native image compilation (GraalVM) reduces container image sizes by ~60–70% and startup from seconds to milliseconds. Reactive programming model (Mutiny) supports non-blocking I/O — beneficial for AI Gateway and high-throughput dataZap pipelines. Higher service density per node reduces infrastructure footprint.
Planned Enhancements (v6.1+)
Enhancement · Target · Description
Distributed Tracing (v6.1) — OpenTelemetry-based trace correlation across all services. End-to-end request tracing from API Gateway through dataZap → dataZen → dataZense, exporting to Jaeger or Zipkin. Current release uses structured correlation IDs at the service layer.
Milvus Vector Store (v6.1) — Vector database for semantic search, embedding-based similarity, and RAG patterns in AI Gateway functions. Enables semantic data catalog search and document similarity in the unstructured profiler.
Node.js Runtime Upgrade (v6.1) — Node.js 12.16 (EOL) upgraded to Node.js 22 LTS across the Smart App Builder runtime and SAB Autonomous deployment environments.
AI Gateway — Streaming (v6.1) — Token-streaming support for long-running LLM generation tasks. Enables real-time progressive output for SAB Autonomous agent reasoning steps.
The modernisation items above are the architecturally relevant subset of the full 2026 delivery roadmap. The complete roadmap covers Q2–Q4 2026 across all platform workstreams, including commercial feature releases, AI capability expansions, and Data Product Hub delivery.