Compare commits

3 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| root | 91a298960c | Prepare reactor, diffuser, and settler updates for mainline merge | 2026-03-31 14:26:33 +02:00 |
| znetsixe | 1c4a3f9685 | Add deployment blueprint | 2026-03-23 11:54:24 +01:00 |
| znetsixe | 9ca32dddfb | Extend architecture review with security positioning | 2026-03-23 11:35:40 +01:00 |
9 changed files with 1402 additions and 103 deletions


@@ -0,0 +1,43 @@
## Context
The single demo bioreactor did not reflect the intended EVOLV biological treatment concept. The owner requested:
- four reactor zones in series
- staged aeration based on effluent NH4
- local visualization per zone for NH4, NO3, O2, and other relevant state variables
- improved PFR numerical stability by increasing reactor resolution
The localhost deployment also needed to remain usable for E2E debugging with Node-RED, InfluxDB, and Grafana.
## Options Considered
1. Keep one large PFR and add more internal profile visualization only.
2. Split the biology into four explicit reactor zones in the flow and control aeration at zone level.
3. Replace the PFR demo with a simpler CSTR train for faster visual response.
## Decision
Choose option 2.
The demo flow now uses four explicit PFR zones in series with:
- equal-zone sizing (`4 x 500 m3`, total `2000 m3`)
- explicit `Fluent` forwarding between zones
- common clocking for all zones
- external `OTR` control instead of fixed `kla`
- staged NH4-based aeration escalation with 30-minute hold logic
- per-zone telemetry to InfluxDB and Node-RED dashboard charts
For runtime stability on localhost, the demo uses a higher spatial resolution per zone while keeping compute load moderate, replacing the earlier single-reactor setup.
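The staged escalation described above can be sketched as a plain function. This is an illustrative sketch only: the thresholds, stage count, and names below are hypothetical, not the demo flow's actual values.

```javascript
// Sketch of staged, NH4-driven aeration escalation with a 30-minute hold
// before the stage may change again. Thresholds are illustrative.
const HOLD_MS = 30 * 60 * 1000;
// NH4 (mg/L) thresholds: one stage step per threshold crossed
const NH4_STEPS = [0.5, 1.5, 3.0];

function requestedStage(nh4) {
  let stage = 0;
  for (const threshold of NH4_STEPS) {
    if (nh4 >= threshold) stage += 1;
  }
  return stage; // 0..NH4_STEPS.length
}

// state = { stage, lastChangeMs }; returns the (possibly unchanged) state
function nextAerationState(state, nh4, nowMs) {
  const target = requestedStage(nh4);
  if (target === state.stage) return state;
  if (nowMs - state.lastChangeMs < HOLD_MS) return state; // hold still active
  return { stage: target, lastChangeMs: nowMs };
}
```

In the Node-RED flow this kind of logic would sit in a function node, with the resulting stage mapped to an external `OTR` setpoint per zone.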
## Consequences
- The flow is easier to reason about operationally because each aeration zone is explicit.
- Zone-level telemetry is available for dashboarding and debugging.
- PFR outlet response remains residence-time dependent, so zone outlet composition will not change instantly after startup or inflow changes.
- Grafana datasource query round-trip remains valid, but dashboard auto-generation still needs separate follow-up if strict dashboard creation is required in E2E checks.
## Rollback / Migration Notes
- Rolling back to the earlier demo means restoring the single `demo_reactor` topology in `docker/demo-flow.json`.
- Existing E2E checks and dashboards should prefer the explicit zone measurements (`reactor_demo_reactor_z1` ... `reactor_demo_reactor_z4`) going forward.
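For example, a check or dashboard panel reading the final zone's dissolved oxygen could use a Flux query of this shape (the bucket and field names follow the dev defaults and may differ per deployment):

```flux
from(bucket: "telemetry")
  |> range(start: -15m)
  |> filter(fn: (r) => r._measurement == "reactor_demo_reactor_z4" and r._field == "S_O")
  |> last()
```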


@@ -0,0 +1,270 @@
# EVOLV Deployment Blueprint
## Purpose
This document turns the current EVOLV architecture into a concrete deployment model.
It focuses on:
- target infrastructure layout
- container/service topology
- environment and secret boundaries
- rollout order from edge to site to central
It is the local source document behind the wiki deployment pages.
## 1. Deployment Principles
- edge-first operation: plant logic must continue when central is unavailable
- site mediation: site services protect field systems and absorb plant-specific complexity
- central governance: external APIs, analytics, IAM, CI/CD, and shared dashboards terminate centrally
- layered telemetry: InfluxDB exists where operationally justified at edge, site, and central
- configuration authority: `tagcodering` should become the source of truth for configuration
- secrets hygiene: tracked manifests contain variables only; secrets live in server-side env or secret stores
## 2. Layered Deployment Model
### 2.1 Edge node
Purpose:
- interface with PLCs and field assets
- execute local Node-RED logic
- retain local telemetry for resilience and digital-twin use cases
Recommended services:
- `evolv-edge-nodered`
- `evolv-edge-influxdb`
- optional `evolv-edge-grafana`
- optional `evolv-edge-broker`
Should not host:
- public API ingress
- central IAM
- source control or CI/CD
### 2.2 Site node
Purpose:
- aggregate one or more edge nodes
- host plant-local dashboards and engineering visibility
- mediate traffic between edge and central
Recommended services:
- `evolv-site-nodered` or `coresync-site`
- `evolv-site-influxdb`
- `evolv-site-grafana`
- optional `evolv-site-broker`
### 2.3 Central platform
Purpose:
- fleet-wide analytics
- API and integration ingress
- engineering lifecycle and releases
- identity and governance
Recommended services:
- reverse proxy / ingress
- API gateway
- IAM
- central InfluxDB
- central Grafana
- Gitea
- CI/CD runner/controller
- optional broker for asynchronous site/central workflows
- configuration services over `tagcodering`
## 3. Target Container Topology
### 3.1 Edge host
Minimum viable edge stack:
```text
edge-host-01
  - Node-RED
  - InfluxDB
  - optional Grafana
```
Preferred production edge stack:
```text
edge-host-01
  - Node-RED
  - InfluxDB
  - local health/export service
  - optional local broker
  - optional local dashboard service
```
### 3.2 Site host
Minimum viable site stack:
```text
site-host-01
  - Site Node-RED / CoreSync
  - Site InfluxDB
  - Site Grafana
```
Preferred production site stack:
```text
site-host-01
  - Site Node-RED / CoreSync
  - Site InfluxDB
  - Site Grafana
  - API relay / sync service
  - optional site broker
```
### 3.3 Central host group
Central should not be one giant undifferentiated host forever. It should trend toward at least these responsibility groups:
```text
central-ingress
  - reverse proxy
  - API gateway
  - IAM

central-observability
  - central InfluxDB
  - Grafana

central-engineering
  - Gitea
  - CI/CD
  - deployment orchestration

central-config
  - tagcodering-backed config services
```
For early rollout these may be colocated, but the responsibility split should remain clear.
## 4. Compose Strategy
The current repository contains:
- `docker-compose.yml` as a development stack
- `temp/cloud.yml` as a broad central-stack example
For production, EVOLV should not rely on one flat compose file for every layer.
Recommended split:
- `compose.edge.yml`
- `compose.site.yml`
- `compose.central.yml`
- optional overlay files for site-specific differences
Benefits:
- clearer ownership per layer
- smaller blast radius during updates
- easier secret and env separation
- easier rollout per site
## 5. Environment And Secrets Strategy
### 5.1 Current baseline
`temp/cloud.yml` now uses environment variables instead of inline credentials. That is the minimum acceptable baseline.
### 5.2 Recommended production rule
- tracked compose files contain `${VARIABLE}` placeholders only
- real secrets live in server-local `.env` files or a managed secret store
- no shared default production passwords in git
- separate env files per layer and per environment
Suggested structure:
```text
/opt/evolv/
  compose.edge.yml
  compose.site.yml
  compose.central.yml
  env/
    edge.env
    site.env
    central.env
```
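Under these conventions, a minimal `compose.edge.yml` might look like the sketch below. Image tags, ports, volume names, and the `INFLUX_TOKEN` variable are illustrative assumptions, not the repository's actual values; real secrets come from `env/edge.env` on the host.

```yaml
# Sketch only: edge-layer compose file with ${VARIABLE} placeholders.
services:
  evolv-edge-nodered:
    image: nodered/node-red:latest
    ports:
      - "1880:1880"
    env_file:
      - env/edge.env
    volumes:
      - nodered-data:/data
  evolv-edge-influxdb:
    image: influxdb:2.7
    ports:
      - "8086:8086"
    environment:
      DOCKER_INFLUXDB_INIT_ADMIN_TOKEN: ${INFLUX_TOKEN}
    volumes:
      - influxdb-data:/var/lib/influxdb2
volumes:
  nodered-data:
  influxdb-data:
```

The tracked file carries only the placeholder; the token itself never enters git.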
## 6. Recommended Network Flow
### 6.1 Northbound
- edge publishes or syncs upward to site
- site aggregates and forwards selected data to central
- central exposes APIs and dashboards to approved consumers
### 6.2 Southbound
- central issues advice, approved config, or mediated requests
- site validates and relays to edge where appropriate
- edge remains the execution point near PLCs
### 6.3 Forbidden direct path
- enterprise or internet clients should not directly query PLC-connected edge runtimes
## 7. Rollout Order
### Phase 1: Edge baseline
- deploy edge Node-RED
- deploy local InfluxDB
- validate PLC connectivity
- validate local telemetry and resilience
### Phase 2: Site mediation
- deploy site Node-RED / CoreSync
- connect one or more edge nodes
- validate site-local dashboards and outage behavior
### Phase 3: Central services
- deploy ingress, IAM, API, Grafana, central InfluxDB
- deploy Gitea and CI/CD services
- validate controlled northbound access
### Phase 4: Configuration backbone
- connect runtime layers to `tagcodering`
- reduce config duplication in flows
- formalize config promotion and rollback
### Phase 5: Smart telemetry policy
- classify signals
- define reconstruction rules
- define authoritative layer per horizon
- validate analytics and auditability
## 8. Immediate Technical Recommendations
- treat `docker/settings.js` as development-only and create hardened production settings separately
- split deployment manifests by layer
- define env files per layer and environment
- formalize healthchecks and backup procedures for every persistent service
- define whether broker usage is required at edge, site, central, or only selectively
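Healthchecks can be declared directly in the per-layer compose files. A hedged example for an InfluxDB service follows; the endpoint is InfluxDB's standard `/health`, while the interval and retry values are typical starting points, not mandated settings:

```yaml
services:
  evolv-site-influxdb:
    image: influxdb:2.7
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8086/health"]
      interval: 30s
      timeout: 5s
      retries: 5
```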
## 9. Next Technical Work Items
1. create draft `compose.edge.yml`, `compose.site.yml`, and `compose.central.yml`
2. define server directory layout and env-file conventions
3. define production Node-RED settings profile
4. define site-to-central sync path
5. define deployment and rollback runbook


@@ -364,7 +364,77 @@ Questions still open:
- telemetry transport or only synchronization/eventing?
- durability expectations and replay behavior?
-## 7. Recommended Ideal Stack
+## 7. Security And Regulatory Positioning
### 7.1 Purdue-style layering is a good fit
EVOLV's preferred structure aligns well with a Purdue-style OT/IT layering approach:
- PLCs and field assets stay at the operational edge
- edge runtimes stay close to the process
- site systems mediate between OT and broader enterprise concerns
- central services host APIs, identity, analytics, and engineering workflows
That is important because it supports segmented trust boundaries instead of direct enterprise-to-field reach-through.
### 7.2 NIS2 alignment
Directive (EU) 2022/2555 (NIS2) requires cybersecurity risk-management measures, incident handling, and stronger governance for covered entities.
This architecture supports that by:
- limiting direct exposure of field systems
- separating operational layers
- enabling central policy and oversight
- preserving local operation during upstream failure
### 7.3 CER alignment
Directive (EU) 2022/2557 (Critical Entities Resilience Directive) focuses on resilience of essential services.
The edge-plus-site approach supports that direction because:
- local/site layers can continue during central disruption
- essential service continuity does not depend on one central runtime
- degraded-mode behavior can be explicitly designed per layer
### 7.4 Cyber Resilience Act alignment
Regulation (EU) 2024/2847 (Cyber Resilience Act) creates cybersecurity requirements for products with digital elements.
For EVOLV, that means the platform should keep strengthening:
- secure configuration handling
- vulnerability and update management
- release traceability
- lifecycle ownership of components and dependencies
### 7.5 GDPR alignment where personal data is present
Regulation (EU) 2016/679 (GDPR) applies whenever EVOLV processes personal data.
The architecture helps by:
- centralizing ingress
- reducing unnecessary propagation of data to field layers
- making access, retention, and audit boundaries easier to define
### 7.6 What can and cannot be claimed
The defensible claim is that EVOLV can be deployed in a way that supports compliance with strict European cybersecurity and resilience expectations.
The non-defensible claim is that EVOLV is automatically compliant purely because of the architecture diagram.
Actual compliance still depends on implementation and operations, including:
- access control
- patch and vulnerability management
- incident response
- logging and audit evidence
- retention policy
- data classification
## 8. Recommended Ideal Stack
The ideal EVOLV stack should be layered around operational boundaries, not around tools.
@@ -446,7 +516,7 @@ These should be explicit architecture elements:
- versioned configuration and schema management
- rollout/rollback strategy
-## 8. Recommended Opinionated Choices
+## 9. Recommended Opinionated Choices
### 8.1 Keep Node-RED as the orchestration layer, not the whole platform
@@ -501,7 +571,7 @@ The architecture should be designed so that `tagcodering` can mature into:
- site/central configuration exchange point
- API-served configuration source for runtime layers
-## 9. Suggested Phasing
+## 10. Suggested Phasing
### Phase 1: Stabilize contracts
@@ -533,13 +603,13 @@ The architecture should be designed so that `tagcodering` can mature into:
- advisory services from central
- auditability of downward recommendations and configuration changes
-## 10. Immediate Open Questions Before Wiki Finalization
+## 11. Immediate Open Questions Before Wiki Finalization
1. Which signals are allowed to use reconstruction-aware smart storage, and which must remain raw or near-raw for audit/compliance reasons?
2. How should `tagcodering` be exposed to runtime layers: direct database access, a dedicated API, or both?
3. What exact responsibility split should EVOLV use between API synchronization and broker-based eventing?
-## 11. Recommended Wiki Structure
+## 12. Recommended Wiki Structure
The wiki should not be one long page. It should be split into:
@@ -549,6 +619,6 @@ The wiki should not be one long page. It should be split into:
4. security and access-boundary model
5. configuration architecture centered on `tagcodering`
-## 12. Next Step
+## 13. Next Step
Use this document as the architecture baseline. The companion markdown page in `architecture/` can then be shaped into a wiki-ready visual overview page with Mermaid diagrams and shorter human-readable sections.

File diff suppressed because it is too large.


@@ -10,10 +10,11 @@
     "wastewater"
   ],
   "node-red": {
-    "nodes": {
-      "dashboardapi": "nodes/dashboardAPI/dashboardapi.js",
-      "machineGroupControl": "nodes/machineGroupControl/mgc.js",
-      "measurement": "nodes/measurement/measurement.js",
+    "nodes": {
+      "dashboardapi": "nodes/dashboardAPI/dashboardapi.js",
+      "diffuser": "nodes/diffuser/diffuser.js",
+      "machineGroupControl": "nodes/machineGroupControl/mgc.js",
+      "measurement": "nodes/measurement/measurement.js",
       "monster": "nodes/monster/monster.js",
       "reactor": "nodes/reactor/reactor.js",
       "rotatingMachine": "nodes/rotatingMachine/rotatingMachine.js",
@@ -30,11 +31,12 @@
     "docker:logs": "docker compose logs -f nodered",
     "docker:shell": "docker compose exec nodered sh",
     "docker:test": "docker compose exec nodered sh /data/evolv/scripts/test-all.sh",
-    "docker:test:basic": "docker compose exec nodered sh /data/evolv/scripts/test-all.sh basic",
-    "docker:test:integration": "docker compose exec nodered sh /data/evolv/scripts/test-all.sh integration",
-    "docker:test:edge": "docker compose exec nodered sh /data/evolv/scripts/test-all.sh edge",
-    "docker:test:gf": "docker compose exec nodered sh /data/evolv/scripts/test-all.sh gf",
-    "docker:validate": "docker compose exec nodered sh /data/evolv/scripts/validate-nodes.sh",
+    "docker:test:basic": "docker compose exec nodered sh /data/evolv/scripts/test-all.sh basic",
+    "docker:test:integration": "docker compose exec nodered sh /data/evolv/scripts/test-all.sh integration",
+    "docker:test:edge": "docker compose exec nodered sh /data/evolv/scripts/test-all.sh edge",
+    "docker:test:gf": "docker compose exec nodered sh /data/evolv/scripts/test-all.sh gf",
+    "test:e2e:reactor": "node scripts/e2e-reactor-roundtrip.js",
+    "docker:validate": "docker compose exec nodered sh /data/evolv/scripts/validate-nodes.sh",
     "docker:deploy": "docker compose exec nodered sh /data/evolv/scripts/deploy-flow.sh",
     "docker:reset": "docker compose down -v && docker compose up -d --build"
   },


@@ -0,0 +1,269 @@
#!/usr/bin/env node
/**
 * E2E reactor round-trip test:
 * Node-RED -> InfluxDB -> Grafana proxy query
 */
const fs = require('node:fs');
const path = require('node:path');

// Service endpoints and credentials; all overridable via environment variables.
const NR_URL = process.env.NR_URL || 'http://localhost:1880';
const INFLUX_URL = process.env.INFLUX_URL || 'http://localhost:8086';
const GRAFANA_URL = process.env.GRAFANA_URL || 'http://localhost:3000';
const GRAFANA_USER = process.env.GRAFANA_USER || 'admin';
const GRAFANA_PASSWORD = process.env.GRAFANA_PASSWORD || 'evolv';
const INFLUX_ORG = process.env.INFLUX_ORG || 'evolv';
const INFLUX_BUCKET = process.env.INFLUX_BUCKET || 'telemetry';
const INFLUX_TOKEN = process.env.INFLUX_TOKEN || 'evolv-dev-token';
const GRAFANA_DS_UID = process.env.GRAFANA_DS_UID || 'cdzg44tv250jkd';
const FLOW_FILE = path.join(__dirname, '..', 'docker', 'demo-flow.json');
const REQUIRE_GRAFANA_DASHBOARDS = process.env.REQUIRE_GRAFANA_DASHBOARDS === '1';

const REACTOR_MEASUREMENTS = [
  'reactor_demo_reactor_z1',
  'reactor_demo_reactor_z2',
  'reactor_demo_reactor_z3',
  'reactor_demo_reactor_z4',
];
const REACTOR_MEASUREMENT = REACTOR_MEASUREMENTS[3];
const QUERY_TIMEOUT_MS = 90000;
const POLL_INTERVAL_MS = 3000;
const REQUIRED_DASHBOARD_TITLES = ['Bioreactor Z1', 'Bioreactor Z2', 'Bioreactor Z3', 'Bioreactor Z4', 'Settler S1'];

async function wait(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Fetch a URL and parse the body as JSON when possible, falling back to raw text.
async function fetchJson(url, options = {}) {
  const response = await fetch(url, options);
  const text = await response.text();
  let body = null;
  if (text) {
    try {
      body = JSON.parse(text);
    } catch {
      body = text;
    }
  }
  return { response, body, text };
}

async function assertReachable() {
  const checks = [
    [`${NR_URL}/settings`, 'Node-RED'],
    [`${INFLUX_URL}/health`, 'InfluxDB'],
    [`${GRAFANA_URL}/api/health`, 'Grafana'],
  ];
  for (const [url, label] of checks) {
    const { response, text } = await fetchJson(url, {
      headers: label === 'Grafana'
        ? { Authorization: `Basic ${Buffer.from(`${GRAFANA_USER}:${GRAFANA_PASSWORD}`).toString('base64')}` }
        : undefined,
    });
    if (!response.ok) {
      throw new Error(`${label} not reachable at ${url} (${response.status}): ${text}`);
    }
    console.log(`PASS: ${label} reachable`);
  }
}

async function deployDemoFlow() {
  const flow = JSON.parse(fs.readFileSync(FLOW_FILE, 'utf8'));
  const { response, text } = await fetchJson(`${NR_URL}/flows`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Node-RED-Deployment-Type': 'full',
    },
    body: JSON.stringify(flow),
  });
  if (!(response.status === 200 || response.status === 204)) {
    throw new Error(`Flow deploy failed (${response.status}): ${text}`);
  }
  console.log(`PASS: Demo flow deployed (${response.status})`);
}

async function queryInfluxCsv(query) {
  const response = await fetch(`${INFLUX_URL}/api/v2/query?org=${encodeURIComponent(INFLUX_ORG)}`, {
    method: 'POST',
    headers: {
      Authorization: `Token ${INFLUX_TOKEN}`,
      'Content-Type': 'application/json',
      Accept: 'application/csv',
    },
    body: JSON.stringify({ query }),
  });
  const text = await response.text();
  if (!response.ok) {
    throw new Error(`Influx query failed (${response.status}): ${text}`);
  }
  return text;
}

// Count non-empty, non-annotation CSV lines. Column-header rows also match,
// so this is a presence heuristic rather than an exact point count; an empty
// result yields zero.
function countCsvDataRows(csvText) {
  return csvText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith('#') && line.includes(','))
    .length;
}

async function waitForReactorTelemetry() {
  const deadline = Date.now() + QUERY_TIMEOUT_MS;
  while (Date.now() < deadline) {
    const counts = {};
    for (const measurement of REACTOR_MEASUREMENTS) {
      const query = `
        from(bucket: "${INFLUX_BUCKET}")
          |> range(start: -15m)
          |> filter(fn: (r) => r._measurement == "${measurement}")
          |> limit(n: 20)
      `.trim();
      counts[measurement] = countCsvDataRows(await queryInfluxCsv(query));
    }
    const missing = Object.entries(counts)
      .filter(([, rows]) => rows === 0)
      .map(([measurement]) => measurement);
    if (missing.length === 0) {
      const summary = Object.entries(counts)
        .map(([measurement, rows]) => `${measurement}=${rows}`)
        .join(', ');
      console.log(`PASS: Reactor telemetry reached InfluxDB (${summary})`);
      return;
    }
    console.log(`WAIT: reactor telemetry not yet present in InfluxDB for ${missing.join(', ')}`);
    await wait(POLL_INTERVAL_MS);
  }
  throw new Error(`Timed out waiting for reactor telemetry measurements ${REACTOR_MEASUREMENTS.join(', ')}`);
}

async function assertGrafanaDatasource() {
  const auth = `Basic ${Buffer.from(`${GRAFANA_USER}:${GRAFANA_PASSWORD}`).toString('base64')}`;
  const { response, body, text } = await fetchJson(`${GRAFANA_URL}/api/datasources/uid/${GRAFANA_DS_UID}`, {
    headers: { Authorization: auth },
  });
  if (!response.ok) {
    throw new Error(`Grafana datasource lookup failed (${response.status}): ${text}`);
  }
  if (body?.uid !== GRAFANA_DS_UID) {
    throw new Error(`Grafana datasource UID mismatch: expected ${GRAFANA_DS_UID}, got ${body?.uid}`);
  }
  console.log(`PASS: Grafana datasource ${GRAFANA_DS_UID} is present`);
}

async function queryGrafanaDatasource() {
  const auth = `Basic ${Buffer.from(`${GRAFANA_USER}:${GRAFANA_PASSWORD}`).toString('base64')}`;
  const response = await fetch(`${GRAFANA_URL}/api/ds/query`, {
    method: 'POST',
    headers: {
      Authorization: auth,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      from: 'now-15m',
      to: 'now',
      queries: [
        {
          refId: 'A',
          datasource: { uid: GRAFANA_DS_UID, type: 'influxdb' },
          query: `
            from(bucket: "${INFLUX_BUCKET}")
              |> range(start: -15m)
              |> filter(fn: (r) => r._measurement == "${REACTOR_MEASUREMENT}" and r._field == "S_O")
              |> last()
          `.trim(),
          rawQuery: true,
          intervalMs: 1000,
          maxDataPoints: 100,
        }
      ],
    }),
  });
  const text = await response.text();
  if (!response.ok) {
    throw new Error(`Grafana datasource query failed (${response.status}): ${text}`);
  }
  const body = JSON.parse(text);
  const frames = body?.results?.A?.frames || [];
  if (frames.length === 0) {
    throw new Error('Grafana datasource query returned no reactor frames');
  }
  console.log(`PASS: Grafana can query reactor telemetry through datasource (${frames.length} frame(s))`);
}

async function waitForGrafanaDashboards(timeoutMs = QUERY_TIMEOUT_MS) {
  const deadline = Date.now() + timeoutMs;
  const auth = `Basic ${Buffer.from(`${GRAFANA_USER}:${GRAFANA_PASSWORD}`).toString('base64')}`;
  while (Date.now() < deadline) {
    const response = await fetch(`${GRAFANA_URL}/api/search?query=`, {
      headers: { Authorization: auth },
    });
    const text = await response.text();
    if (!response.ok) {
      throw new Error(`Grafana dashboard search failed (${response.status}): ${text}`);
    }
    const results = JSON.parse(text);
    const titles = new Set(results.map((item) => item.title));
    const missing = REQUIRED_DASHBOARD_TITLES.filter((title) => !titles.has(title));
    const pumpingStationCount = results.filter((item) => item.title === 'pumpingStation').length;
    if (missing.length === 0 && pumpingStationCount >= 3) {
      console.log(`PASS: Grafana dashboards created (${REQUIRED_DASHBOARD_TITLES.join(', ')} + ${pumpingStationCount} pumpingStation dashboards)`);
      return;
    }
    const missingParts = [];
    if (missing.length > 0) {
      missingParts.push(`missing titled dashboards: ${missing.join(', ')}`);
    }
    if (pumpingStationCount < 3) {
      missingParts.push(`pumpingStation dashboards=${pumpingStationCount}`);
    }
    console.log(`WAIT: Grafana dashboards not ready: ${missingParts.join(' | ')}`);
    await wait(POLL_INTERVAL_MS);
  }
  throw new Error(`Timed out waiting for Grafana dashboards: ${REQUIRED_DASHBOARD_TITLES.join(', ')} and >=3 pumpingStation dashboards`);
}

async function main() {
  console.log('=== EVOLV Reactor E2E Round Trip ===');
  await assertReachable();
  await deployDemoFlow();
  console.log('WAIT: allowing Node-RED inject/tick loops to populate telemetry');
  await wait(12000);
  await waitForReactorTelemetry();
  await assertGrafanaDatasource();
  await queryGrafanaDatasource();
  if (REQUIRE_GRAFANA_DASHBOARDS) {
    await waitForGrafanaDashboards();
    console.log('PASS: Node-RED -> InfluxDB -> Grafana round trip is working for reactor telemetry and dashboard generation');
    return;
  }
  try {
    await waitForGrafanaDashboards(15000);
    console.log('PASS: Node-RED -> InfluxDB -> Grafana round trip is working for reactor telemetry and dashboard generation');
  } catch (error) {
    console.warn(`WARN: Grafana dashboard auto-generation is not ready yet: ${error.message}`);
    console.log('PASS: Node-RED -> InfluxDB -> Grafana round trip is working for live reactor telemetry');
  }
}

main().catch((error) => {
  console.error(`FAIL: ${error.message}`);
  process.exit(1);
});