| title | created | updated | status | tags |
|---|---|---|---|---|
| EVOLV Architecture Review | 2026-03-01 | 2026-04-07 | evolving | |
EVOLV Architecture Review
Purpose
This document captures:
- the architecture implemented in this repository today
- the broader edge/site/central architecture shown in the drawings under temp/
- the key strengths and weaknesses of that direction
- the currently preferred target stack based on owner decisions from this review
It is the local staging document for a later wiki update.
Evidence Used
Implemented stack evidence:
- docker-compose.yml
- docker/settings.js
- docker/grafana/provisioning/datasources/influxdb.yaml
- package.json
- nodes/*
Target-state evidence:
- temp/fullStack.pdf
- temp/edge.pdf
- temp/CoreSync.drawio.pdf
- temp/cloud.yml
Owner decisions from this review:
- local InfluxDB is required for operational resilience
- central acts as the advisory/intelligence and API-entry layer, not as a direct field caller
- intended configuration authority is the database-backed tagcodering model
- architecture wiki pages should be visual, not text-only
1. What Exists Today
1.1 Product/runtime layer
The codebase is currently a modular Node-RED package for wastewater/process automation:
- EVOLV ships custom Node-RED nodes for plant assets and process logic
- nodes emit both process/control messages and telemetry-oriented outputs
- shared helper logic lives in nodes/generalFunctions/
- Grafana-facing integration exists through dashboardAPI and Influx-oriented outputs
1.2 Implemented development stack
The concrete development stack in this repository is:
- Node-RED
- InfluxDB 2.x
- Grafana
That gives a clear local flow:
- EVOLV logic runs in Node-RED.
- Telemetry is emitted in a time-series-oriented shape.
- InfluxDB stores the telemetry.
- Grafana renders operational dashboards.
1.3 Existing runtime pattern in the nodes
A recurring EVOLV pattern is:
- output 0: process/control message
- output 1: Influx/telemetry message
- output 2: registration/control plumbing where relevant
So even in its current implemented form, EVOLV is not only a Node-RED project. It is already a control-plus-observability platform, with Node-RED as orchestration/runtime and InfluxDB/Grafana as telemetry and visualization services.
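The three-output convention can be illustrated with a minimal sketch. It is written in Python for brevity rather than as Node-RED JavaScript, and the function name `build_outputs` and every message field name are illustrative assumptions, not the actual EVOLV message shapes. A `None` entry stands for "send nothing on this output", mirroring how a Node-RED function node skips an output whose array entry is null.

```python
from typing import Any, Dict, List, Optional


def build_outputs(asset_id: str, value: float, timestamp_ms: int,
                  register: bool = False) -> List[Optional[Dict[str, Any]]]:
    """Return one message per output port, in port order (0, 1, 2)."""
    # Output 0: process/control message (field names are hypothetical).
    process_msg = {"topic": asset_id, "payload": value}
    # Output 1: telemetry message in a time-series-oriented shape (hypothetical).
    telemetry_msg = {
        "measurement": asset_id,
        "fields": {"value": value},
        "timestamp": timestamp_ms,
    }
    # Output 2: registration/control plumbing, only where relevant.
    registration_msg = (
        {"topic": "register", "payload": {"asset": asset_id}} if register else None
    )
    return [process_msg, telemetry_msg, registration_msg]


outputs = build_outputs("pump-01", 12.5, 1_700_000_000_000)
```

Keeping the telemetry message separate from the control message is what lets the InfluxDB/Grafana side evolve without touching the control path.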
2. What The Drawings Describe
Across temp/fullStack.pdf and temp/CoreSync.drawio.pdf, the intended platform is broader and layered.
2.1 Edge / OT layer
The drawings consistently place these capabilities at the edge:
- PLC / OPC UA connectivity
- Node-RED container as protocol translator and logic runtime
- local broker in some variants
- local InfluxDB / Prometheus style storage in some variants
- local Grafana/SCADA in some variants
This is the plant-side operational layer.
2.2 Site / local server layer
The CoreSync drawings also show a site aggregation layer:
- RWZI-local server
- Node-RED / CoreSync services
- site-local broker
- site-local database
- upward API-based synchronization
This layer decouples field assets from central services and absorbs plant-specific complexity.
2.3 Central / cloud layer
The broader stack drawings and temp/cloud.yml show a central platform layer with:
- Gitea
- Jenkins
- reverse proxy / ingress
- Grafana
- InfluxDB
- Node-RED
- RabbitMQ / messaging
- VPN / tunnel concepts
- Keycloak in the drawing
- Portainer in the drawing
This is a platform-services layer, not just an application runtime.
3. Architecture Decisions From This Review
These decisions now shape the preferred EVOLV target architecture.
3.1 Local telemetry is mandatory for resilience
Local InfluxDB is not optional. It is required so that:
- operations continue when central SCADA or central services are down
- local dashboards and advanced digital-twin workflows can still consume recent and relevant process history
- local edge/site layers can make smarter decisions without depending on round-trips to central
3.2 Multi-level InfluxDB is part of the architecture
InfluxDB should exist on multiple levels where it adds operational value:
- edge/local for resilience and near-real-time replay
- site for plant-level history, diagnostics, and resilience
- central for fleet-wide analytics, benchmarking, and advisory intelligence
This is not just copy-paste storage at each level. The design intent is event-driven and selective.
3.3 Storage should be smart, not only deadband-driven
The target is neither a simple "store every point" policy nor a single fixed deadband rule such as 1%.
The desired storage approach is:
- observe signal slope and change behavior
- preserve points where state is changing materially
- store fewer points where the signal can be reconstructed downstream with sufficient fidelity
- carry enough metadata or conventions so reconstruction quality is auditable
This implies EVOLV should evolve toward smart storage and signal-aware retention rather than naive event dumping.
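One way to make "store fewer points where the signal can be reconstructed" concrete is a greedy linear-interpolation filter. The sketch below is an assumption about how such a rule could look, not the EVOLV algorithm; `select_points` and the `tolerance` parameter are hypothetical names. A point is dropped only if a straight line between its kept neighbours reproduces it within the tolerance.

```python
def select_points(samples, tolerance):
    """Keep only points that linear interpolation between kept neighbours cannot reproduce.

    samples: list of (time, value) tuples with strictly increasing times.
    tolerance: maximum accepted reconstruction error, in the signal's own unit.
    """
    if len(samples) <= 2:
        return list(samples)
    kept = [samples[0]]
    anchor = 0  # index of the most recently kept point
    for end in range(2, len(samples)):
        t0, v0 = samples[anchor]
        t1, v1 = samples[end]
        # Can every point between anchor and end be rebuilt from the line anchor -> end?
        for t, v in samples[anchor + 1:end]:
            estimate = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
            if abs(estimate - v) > tolerance:
                # No: the previous point marks a material change, so keep it.
                anchor = end - 1
                kept.append(samples[anchor])
                break
    kept.append(samples[-1])
    return kept


# Flat at 10, a ramp to 20, then flat again.
signal = [(0, 10), (1, 10), (2, 10), (3, 10), (4, 10),
          (5, 15), (6, 20), (7, 20), (8, 20), (9, 20)]
kept = select_points(signal, tolerance=0.5)
# kept == [(0, 10), (4, 10), (6, 20), (9, 20)]
```

The two flat spans collapse to their end points while the start and end of the ramp survive. The tolerance doubles as the auditable reconstruction bound, which is why it would need to be defined per signal class.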
3.4 Central is the intelligence and API-entry layer
Central may advise and coordinate edge/site layers, but external API requests should not hit field-edge systems directly.
The intended pattern is:
- external and enterprise integrations terminate centrally
- central evaluates, aggregates, authorizes, and advises
- site/edge layers receive mediated requests, policies, or setpoints
- field-edge remains protected behind an intermediate layer
This aligns with the stated security direction.
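The mediation rule can be expressed as an adjacent-hop check over the layer chain. This is a minimal sketch under the assumption that the layers form a strict linear chain; the layer labels and the function names `is_allowed_hop` and `mediated_path` are hypothetical.

```python
# Ordered from the outside in; the labels are illustrative.
LAYERS = ["external", "central", "site", "edge", "field"]


def is_allowed_hop(source: str, target: str) -> bool:
    """A request may only move between directly adjacent layers."""
    return abs(LAYERS.index(source) - LAYERS.index(target)) == 1


def mediated_path(source: str, target: str) -> list:
    """List every layer a request must pass through, including both ends."""
    i, j = LAYERS.index(source), LAYERS.index(target)
    step = 1 if j >= i else -1
    return [LAYERS[k] for k in range(i, j + step, step)]


direct_to_edge = is_allowed_hop("external", "edge")
path_to_field = mediated_path("external", "field")
```

Here `direct_to_edge` is `False`, and `path_to_field` lists all five layers in order, which is the protected route described above.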
3.5 Configuration source of truth should be database-backed
The intended configuration authority is the database-backed tagcodering model, which already exists but is not yet complete enough to serve as the fully realized source of truth.
That means the architecture should assume:
- asset and machine metadata belong in tagcodering
- Node-RED flows should consume configuration rather than silently becoming the only configuration store
- more work is still needed before this behaves as the intended central configuration backbone
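While tagcodering is still incomplete, a runtime layer could resolve configuration with an explicit precedence and record where each value came from. The sketch below is an assumption about such a transition pattern: the function `resolve_config`, the dictionary standing in for the database, and all keys are hypothetical, and the real tagcodering schema is not shown in this document.

```python
def resolve_config(asset_id, registry, flow_defaults):
    """Prefer registry values; fall back to flow-local values and record provenance."""
    registry_entry = registry.get(asset_id, {})
    config, provenance = {}, {}
    for key in sorted(set(registry_entry) | set(flow_defaults)):
        if key in registry_entry:
            config[key] = registry_entry[key]
            provenance[key] = "tagcodering"
        else:
            # Flow-local fallback: visible debt, not a silent second source of truth.
            config[key] = flow_defaults[key]
            provenance[key] = "flow-local"
    return config, provenance


# Stand-in for database-held metadata (hypothetical keys and values).
registry = {"pump-01": {"max_flow_m3h": 120, "site": "rwzi-a"}}
flow_defaults = {"max_flow_m3h": 100, "ramp_seconds": 30}
config, provenance = resolve_config("pump-01", registry, flow_defaults)
```

Every key still marked `flow-local` is a candidate for migration into the database, which makes the remaining duplication measurable.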
4. Visual Model
4.1 Platform topology
flowchart LR
subgraph OT["OT / Field"]
PLC["PLC / IO"]
DEV["Sensors / Machines"]
end
subgraph EDGE["Edge Layer"]
ENR["Edge Node-RED"]
EDB["Local InfluxDB"]
EUI["Local Grafana / Local Monitoring"]
EBR["Optional Local Broker"]
end
subgraph SITE["Site Layer"]
SNR["Site Node-RED / CoreSync"]
SDB["Site InfluxDB"]
SUI["Site Grafana / SCADA Support"]
SBR["Site Broker"]
end
subgraph CENTRAL["Central Layer"]
API["API / Integration Gateway"]
INTEL["Overview Intelligence / Advisory Logic"]
CDB["Central InfluxDB"]
CGR["Central Grafana"]
CFG["Tagcodering Config Model"]
GIT["Gitea"]
CI["CI/CD"]
IAM["IAM / Keycloak"]
end
DEV --> PLC
PLC --> ENR
ENR --> EDB
ENR --> EUI
ENR --> EBR
ENR <--> SNR
EDB <--> SDB
SNR --> SDB
SNR --> SUI
SNR --> SBR
SNR <--> API
API --> INTEL
API <--> CFG
SDB <--> CDB
INTEL --> SNR
CGR --> CDB
CI --> GIT
IAM --> API
IAM --> CGR
4.2 Command and access boundary
flowchart TD
EXT["External APIs / Enterprise Requests"] --> API["Central API Gateway"]
API --> AUTH["AuthN/AuthZ / Policy Checks"]
AUTH --> INTEL["Central Advisory / Decision Support"]
INTEL --> SITE["Site Integration Layer"]
SITE --> EDGE["Edge Runtime"]
EDGE --> PLC["PLC / Field Assets"]
EXT -. no direct access .-> EDGE
EXT -. no direct access .-> PLC
4.3 Smart telemetry flow
flowchart LR
RAW["Raw Signal"] --> EDGELOGIC["Edge Signal Evaluation"]
EDGELOGIC --> KEEP["Keep Critical Change Points"]
EDGELOGIC --> SKIP["Skip Reconstructable Flat Points"]
EDGELOGIC --> LOCAL["Local InfluxDB"]
LOCAL --> SITE["Site InfluxDB"]
SITE --> CENTRAL["Central InfluxDB"]
KEEP --> LOCAL
SKIP -. reconstruction assumptions / metadata .-> SITE
CENTRAL --> DASH["Fleet Dashboards / Analytics"]
5. Upsides Of This Direction
5.1 Strong separation between control and observability
Node-RED for runtime/orchestration and InfluxDB/Grafana for telemetry is still the right structural split:
- control stays close to the process
- telemetry storage/querying stays in time-series-native tooling
- dashboards do not need to overload Node-RED itself
5.2 Edge-first matches operational reality
For wastewater/process systems, edge-first remains correct:
- lower latency
- better degraded-mode behavior
- less dependence on WAN or central platform uptime
- clearer OT trust boundary
5.3 Site mediation improves safety and security
Using central as the enterprise/API entry point and site as the mediator improves posture:
- field systems are less exposed
- policy decisions can be centralized
- external integrations do not probe the edge directly
- site can continue operating even when upstream is degraded
5.4 Multi-level storage enables better analytics
Multiple Influx layers can support:
- local resilience
- site diagnostics
- fleet benchmarking
- smarter retention and reconstruction strategies
That is substantially more capable than a single central historian model.
5.5 tagcodering is the right long-term direction
A database-backed configuration authority is stronger than embedding configuration only in flows because it supports:
- machine metadata management
- controlled rollout of configuration changes
- clearer versioning and provenance
- future API-driven configuration services
6. Downsides And Risks
6.1 Smart storage raises algorithmic and governance complexity
Signal-aware storage and reconstruction is promising, but it creates architectural obligations:
- reconstruction rules must be explicit
- acceptable reconstruction error must be defined per signal type
- operators must know whether they see raw or reconstructed history
- compliance-relevant data may need stricter retention than operational convenience data
Without those rules, smart storage can become opaque and hard to trust.
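One way to keep reconstructed history transparent is to label every returned value with its origin. The sketch below assumes linear interpolation as the reconstruction rule; the function `reconstruct` and the `source` field are hypothetical names, not an existing EVOLV API.

```python
from bisect import bisect_left


def reconstruct(kept, query_times):
    """Return a value for each query time, labelled as raw or reconstructed.

    kept: stored (time, value) tuples with strictly increasing times.
    """
    times = [t for t, _ in kept]
    result = []
    for t in query_times:
        if t < times[0] or t > times[-1]:
            raise ValueError(f"time {t} is outside the stored range")
        i = bisect_left(times, t)
        if times[i] == t:
            # A stored point exists at exactly this time.
            result.append({"time": t, "value": kept[i][1], "source": "raw"})
        else:
            # Interpolate between the stored neighbours on either side.
            (t0, v0), (t1, v1) = kept[i - 1], kept[i]
            value = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
            result.append({"time": t, "value": value, "source": "reconstructed"})
    return result


stored = [(0, 10), (4, 10), (6, 20), (9, 20)]
history = reconstruct(stored, [4, 5])
```

A dashboard could then style `reconstructed` values differently, so operators can always tell which part of a trend was measured and which was inferred.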
6.2 Multi-level databases can create ownership confusion
If edge, site, and central all store telemetry, you must define:
- which layer is authoritative for which time horizon
- when backfill is allowed
- when data is summarized vs copied
- how duplicates or gaps are detected
Otherwise operations will argue over which trend is "the real one."
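An explicit ownership rule can be as small as a lookup by data age. The horizons below (7 and 90 days) are placeholders chosen only for illustration, and `authoritative_layer` is a hypothetical name; the real values are exactly what this section says still needs to be decided.

```python
# (layer, maximum data age in days) -- placeholder horizons, not agreed values.
HORIZONS = [("edge", 7), ("site", 90)]


def authoritative_layer(age_days: float) -> str:
    """Return the layer whose copy counts as authoritative for data of this age."""
    for layer, max_age_days in HORIZONS:
        if age_days <= max_age_days:
            return layer
    return "central"


recent = authoritative_layer(2)
older = authoritative_layer(30)
oldest = authoritative_layer(400)
```

With these placeholder horizons, `recent`, `older`, and `oldest` resolve to `edge`, `site`, and `central` respectively.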
6.3 Central intelligence must remain advisory-first
Central guidance can become valuable, but direct closed-loop dependency on central would be risky.
The architecture should therefore preserve:
- local control authority at edge/site
- bounded and explicit central advice
- safe behavior if central recommendations stop arriving
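These three properties can be sketched as a guard that the edge or site layer applies before using any central recommendation. The function `effective_setpoint`, the advice fields, and the choice to reject rather than clamp out-of-range advice are all assumptions made for illustration.

```python
def effective_setpoint(local_setpoint, advice, now_s, lower, upper, max_age_s):
    """Use central advice only when it is fresh and within local bounds.

    advice: None, or a dict with 'value' and 'issued_at_s' (hypothetical fields).
    Returns (setpoint, reason) so the decision is auditable.
    """
    if advice is None:
        return local_setpoint, "local: no advice"
    if now_s - advice["issued_at_s"] > max_age_s:
        # Central has gone quiet: fall back to the locally owned value.
        return local_setpoint, "local: advice expired"
    if not (lower <= advice["value"] <= upper):
        # Advice outside locally defined limits is rejected, not clamped.
        return local_setpoint, "local: advice out of bounds"
    return advice["value"], "central advice"


advice = {"value": 55.0, "issued_at_s": 1000}
fresh = effective_setpoint(50.0, advice, now_s=1100, lower=40.0, upper=60.0, max_age_s=300)
stale = effective_setpoint(50.0, advice, now_s=2000, lower=40.0, upper=60.0, max_age_s=300)
```

Because the bounds and the maximum age live at the edge or site, control authority stays local even when the advice itself comes from central.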
6.4 tagcodering is not yet complete enough to lean on blindly
It is the right target, but its current partial state means there is still architecture debt:
- incomplete config workflows
- likely mismatch between desired and implemented schema behavior
- temporary duplication between flows, node config, and database-held metadata
This should be treated as a core platform workstream, not a side issue.
6.5 Broker responsibilities are still not crisp enough
The materials still reference MQTT/AMQP/RabbitMQ/brokers without one stable responsibility split. That needs to be resolved before large-scale deployment.
Questions still open:
- command bus or event bus?
- site-only or cross-site?
- telemetry transport or only synchronization/eventing?
- durability expectations and replay behavior?
7. Security And Regulatory Positioning
7.1 Purdue-style layering is a good fit
EVOLV's preferred structure aligns well with a Purdue-style OT/IT layering approach:
- PLCs and field assets stay at the operational edge
- edge runtimes stay close to the process
- site systems mediate between OT and broader enterprise concerns
- central services host APIs, identity, analytics, and engineering workflows
That is important because it supports segmented trust boundaries instead of direct enterprise-to-field reach-through.
7.2 NIS2 alignment
Directive (EU) 2022/2555 (NIS2) requires cybersecurity risk-management measures, incident handling, and stronger governance for covered entities.
This architecture supports that by:
- limiting direct exposure of field systems
- separating operational layers
- enabling central policy and oversight
- preserving local operation during upstream failure
7.3 CER alignment
Directive (EU) 2022/2557 (Critical Entities Resilience Directive) focuses on resilience of essential services.
The edge-plus-site approach supports that direction because:
- local/site layers can continue during central disruption
- essential service continuity does not depend on one central runtime
- degraded-mode behavior can be explicitly designed per layer
7.4 Cyber Resilience Act alignment
Regulation (EU) 2024/2847 (Cyber Resilience Act) creates cybersecurity requirements for products with digital elements.
For EVOLV, that means the platform should keep strengthening:
- secure configuration handling
- vulnerability and update management
- release traceability
- lifecycle ownership of components and dependencies
7.5 GDPR alignment where personal data is present
Regulation (EU) 2016/679 (GDPR) applies whenever EVOLV processes personal data.
The architecture helps by:
- centralizing ingress
- reducing unnecessary propagation of data to field layers
- making access, retention, and audit boundaries easier to define
7.6 What can and cannot be claimed
The defensible claim is that EVOLV can be deployed in a way that supports compliance with strict European cybersecurity and resilience expectations.
The non-defensible claim is that EVOLV is automatically compliant purely because of the architecture diagram.
Actual compliance still depends on implementation and operations, including:
- access control
- patch and vulnerability management
- incident response
- logging and audit evidence
- retention policy
- data classification
8. Recommended Ideal Stack
The ideal EVOLV stack should be layered around operational boundaries, not around tools.
8.1 Layer A: Edge execution
Purpose:
- connect to PLCs and field assets
- execute time-sensitive local logic
- preserve operation during WAN/central loss
- provide local telemetry access for resilience and digital-twin use cases
Recommended components:
- Node-RED runtime for EVOLV edge flows
- OPC UA and protocol adapters
- local InfluxDB
- optional local Grafana for local engineering/monitoring
- optional local broker only when multiple participants need decoupling
Principle:
- edge remains safe and useful when disconnected
8.2 Layer B: Site integration
Purpose:
- aggregate multiple edge systems at plant/site level
- host plant-local dashboards and diagnostics
- mediate between raw OT detail and central standardization
- serve as the protected step between field systems and central requests
Recommended components:
- site Node-RED / CoreSync services
- site InfluxDB
- site Grafana / SCADA-supporting dashboards
- site broker where asynchronous eventing is justified
Principle:
- site absorbs plant complexity and protects field assets
8.3 Layer C: Central platform
Purpose:
- fleet-wide analytics
- shared dashboards
- engineering lifecycle
- enterprise/API entry point
- overview intelligence and advisory logic
Recommended components:
- Gitea
- CI/CD
- central InfluxDB
- central Grafana
- API/integration gateway
- IAM
- VPN/private connectivity
- tagcodering-backed configuration services
Principle:
- central coordinates, advises, and governs; it is not the direct field caller
8.4 Cross-cutting platform services
These should be explicit architecture elements:
- secrets management
- certificate management
- backup/restore
- audit logging
- monitoring/alerting of the platform itself
- versioned configuration and schema management
- rollout/rollback strategy
9. Recommended Opinionated Choices
9.1 Keep Node-RED as the orchestration layer, not the whole platform
Node-RED should own:
- process orchestration
- protocol mediation
- edge/site logic
- KPI production
It should not become the sole owner of:
- identity
- long-term configuration authority
- secret management
- compliance/audit authority
9.2 Use InfluxDB by function and horizon
Recommended split:
- edge: resilience, local replay, digital-twin input
- site: plant diagnostics and local continuity
- central: fleet analytics, advisory intelligence, benchmarking, and long-term cross-site views
9.3 Prefer smart telemetry retention over naive point dumping
Recommended rule:
- keep information-rich points
- reduce information-poor flat spans
- document reconstruction assumptions
- define signal-class-specific fidelity expectations
This needs design discipline, but it is a real differentiator if executed well.
9.4 Put enterprise/API ingress at central, not at edge
This should become a hard architectural rule:
- external requests land centrally
- central authenticates and authorizes
- central or site mediates downward
- edge never becomes the exposed public integration surface
9.5 Make tagcodering the target configuration backbone
The architecture should be designed so that tagcodering can mature into:
- machine and asset registry
- configuration source of truth
- site/central configuration exchange point
- API-served configuration source for runtime layers
10. Suggested Phasing
Phase 1: Stabilize contracts
- define topic and payload contracts
- define telemetry classes and reconstruction policy
- define asset, machine, and site identity model
- define tagcodering scope and schema ownership
Phase 2: Harden local/site resilience
- formalize edge and site runtime patterns
- define local telemetry retention and replay behavior
- define central-loss behavior
- define dashboard behavior during isolation
Phase 3: Harden central platform
- IAM
- API gateway
- central observability
- CI/CD
- backup and disaster recovery
- config services over tagcodering
Phase 4: Introduce selective synchronization and intelligence
- event-driven telemetry propagation rules
- smart-storage promotion/backfill policies
- advisory services from central
- auditability of downward recommendations and configuration changes
11. Immediate Open Questions Before Wiki Finalization
- Which signals are allowed to use reconstruction-aware smart storage, and which must remain raw or near-raw for audit/compliance reasons?
- How should tagcodering be exposed to runtime layers: direct database access, a dedicated API, or both?
- What exact responsibility split should EVOLV use between API synchronization and broker-based eventing?
12. Recommended Wiki Structure
The wiki should not be one long page. It should be split into:
- platform overview with the main topology diagram
- edge-site-central runtime model
- telemetry and smart storage model
- security and access-boundary model
- configuration architecture centered on tagcodering
13. Next Step
Use this document as the architecture baseline. The companion markdown page in architecture/ can then be shaped into a wiki-ready visual overview page with Mermaid diagrams and shorter human-readable sections.