Bumps machineGroupControl (e1e1977) and pumpingStation (ef07f2a) — example dashboard JSON tweaks committed on each submodule's development branch. Adds docs/research/ and docs/prd/ for the dashboardAPI v2 graph-aware Grafana generator workflow (Gitea issues #32-#43). Ignores .prototypes/ — throwaway spike code lives there per the /prototype skill.
dashboardAPI v2 — graph-aware Grafana dashboard generator
Date: 2026-05-26 · Owner: R&D · Predecessors: /grill-me (in-conversation), docs/research/dashboardapi-graph-aware-grafana-generator.md
One dashboardAPI node in a Node-RED flow auto-generates one Grafana dashboard by walking its child-registration graph, composing per-node-type panel templates, and pushing the result to Grafana via HTTP on every Node-RED deploy.
Problem
Every EVOLV example flow today carries a hand-authored Node-RED Dashboard tab — the active pumpingstation-complete-example flow has 73 ui-* nodes (charts, gauges, text widgets, fan-out function nodes) consuming roughly a third of the flow. Every new example replicates this work, and each one diverges in axis ranges, chart configs, and fan-out logic — so the output side is inconsistent across the 10+ example flows we maintain. The same telemetry already lands in InfluxDB via Port 1 of every node, so Grafana could render it natively, but today each Grafana dashboard is hand-authored JSON (docker/grafana/provisioning/dashboards/pumping-station.json is the only one that exists, frozen at one node type). Result: R&D spends disproportionate time on dashboard plumbing, examples drift, and Grafana — the better readout — is underused.
Goals
- Dropping a dashboardAPI node into a flow and deploying produces a complete Grafana dashboard with no hand-authored JSON.
- Adding a new EVOLV node instance (e.g. a new measurement child) to a flow adds its panels on the next deploy with zero Grafana edits.
- Adding a new EVOLV node type requires only a panel template fragment under nodes/dashboardAPI/src/templates/<softwareType>.json, with no changes to the layout engine.
- Cross-example consistency: every example flow's Grafana dashboard uses the same panel set, axis conventions, and dashed-bounds rendering for the same node type.
- Node-RED Dashboard tab in example flows shrinks to control-only widgets (mode select, operator demand, calibration, signal injection). Target: ≤15 ui-* nodes per example flow.
Non-goals
- Sub-second feedback latency from operator action → Grafana visible state. End-to-end ≤15s is acceptable; faster is not pursued.
- Preserving manual Grafana edits across regenerations. Dashboards are single-source-of-truth from dashboardAPI; manual edits are clobbered on next deploy.
- Per-instance dashboard customization through the Grafana UI. Templates are centralized and code-owned.
- Supporting non-EVOLV (third-party) Node-RED node types as panel sources.
- Live runtime regeneration (no deploy). Regen fires on Node-RED deploy events only.
- Operator (plant-staff) UX. Sole user is R&D until further notice.
- Replacing the InfluxDB write path. dashboardAPI v2 reuses the existing outputUtils.formatForInflux + influxdbFormatter plumbing unchanged.
Users & scenarios
Sole user: EVOLV R&D team (Rene, Pim, Janneke, Sjoerd, Dieke, Pieter).
- New example flow from scratch. When R&D builds a new example for rotatingMachine-complete, they assemble the node graph (pumpingStation + 3 pumps + measurements), drop in one dashboardAPI, connect each top-level parent to it, and deploy. A Grafana dashboard at the dashboardAPI's UID appears within seconds, with rows per parent and panels per child following the centralized templates.
- Adding a measurement to an existing flow. When R&D wires a new measurement node as a child of an existing pumpingStation in pumpingstation-complete-example and redeploys, the corresponding pump panel gains a measured series next to its predicted series. No Grafana edit.
- Adding a new EVOLV node type. When R&D ships a new node type mixer, they author nodes/dashboardAPI/src/templates/mixer.json (Grafana panel fragment with ${nodeName} substitution tokens) and bump dashboardAPI's package version. Existing dashboardAPI instances pick up mixer-typed children on next deploy.
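The token substitution described in the third scenario can be sketched as follows. This is a minimal illustration, not the shipped implementation: the mixerTemplate object stands in for a hypothetical mixer.json, and the function name substituteTokens is an assumption. Only the ${...} token names come from this document.

```javascript
// Hypothetical stand-in for nodes/dashboardAPI/src/templates/mixer.json.
// Single quotes keep the ${...} tokens literal instead of JS interpolation.
const mixerTemplate = {
  panels: [
    {
      type: 'timeseries',
      title: '${nodeName} speed',
      description: 'Child of ${parentName} (${nodeId})',
    },
  ],
};

// Recursively copy a parsed template, replacing ${token} in every string.
// Unknown tokens are left untouched so a missing value is visible in Grafana.
function substituteTokens(value, tokens) {
  if (typeof value === 'string') {
    return value.replace(/\$\{(\w+)\}/g, (match, name) =>
      Object.prototype.hasOwnProperty.call(tokens, name) ? String(tokens[name]) : match);
  }
  if (Array.isArray(value)) {
    return value.map((item) => substituteTokens(item, tokens));
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, child]) => [key, substituteTokens(child, tokens)]));
  }
  return value;
}

const rendered = substituteTokens(mixerTemplate, {
  nodeName: 'mixer-1',
  nodeId: 'abc123',
  parentName: 'station-A',
});
console.log(rendered.panels[0].title);
```

Substituting on the parsed object rather than on the raw JSON text means a node name containing a quote or backslash cannot produce invalid JSON.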
Requirements
Functional
- F-1. dashboardAPI shall subscribe to RED.events.on('flows:started') and, on each event, inspect payload.diff to determine whether any of its own subtree (the dashboardAPI node, its registered children, their registered grandchildren) was affected. If yes, regenerate the dashboard. If no, no-op.
- F-2. On regenerate, dashboardAPI shall walk its registered children via ChildRegistrationUtils.getAllChildren(), recurse one level per registered child to discover grandchildren, and produce an ordered list [{softwareType, nodeName, position, children: [...]}, ...].
- F-3. For each node in the graph, dashboardAPI shall load the matching template at nodes/dashboardAPI/src/templates/${softwareType}.json and substitute the placeholders ${nodeName}, ${nodeId}, ${parentName}, ${dashboardUid} and any child-list placeholders into the panel JSON.
- F-4. The layout engine shall compose templates into a single Grafana dashboard JSON with: one row per top-level child of dashboardAPI; nested rows for grandchildren; sequential gridPos.y offsets so panels don't overlap.
- F-5. Parent panels shall not repeat metrics that any of their children's templates already emit. The template format declares each panel's emittedFields so the composer can filter duplicates from the parent's panel set.
- F-6. For each child node of type rotatingMachine, the panel set shall include: %control, flow, delta P, any registered measurement child's measured values, and efficiency. Where the node config exposes operating bounds (e.g. min/max flow), those bounds shall be rendered as dashed reference lines (fieldConfig.custom.lineStyle = {fill: "dash", dash: [10,10]} via a byName override) on the same panel as the act value.
- F-7. For each child of type measurement registered to a parent that also emits a predicted series for the same quantity, the dashboard shall render two panels side by side (predicted left, measured right). If only predicted exists, render the predicted panel only. If only measured exists, render the measured panel only.
- F-8. dashboardAPI shall POST the assembled dashboard to POST {grafanaUrl}/api/dashboards/db with body {dashboard: <json>, overwrite: true, folderUid: <configured>}, using the configured bearer token in Authorization: Bearer <token>. The dashboard.uid shall be deterministic from the dashboardAPI node's Node-RED id.
- F-9. On a successful upsert (HTTP 200), dashboardAPI shall log the dashboard URL at info level. On failure (non-2xx, timeout, network error), it shall log at error level with the response body and shall not retry; the next deploy is the retry mechanism.
- F-10. Each node emitting a value with operating bounds shall write the bounds as additional Influx fields named <field>.min and <field>.max alongside <field> itself. The dashed-line override matches these by suffix.
- F-11. The bearer token shall be stored as a Node-RED encrypted credential, not as a plain defaults field. On node startup, if the legacy plain field exists, it is migrated to the credential store and the plain field is cleared, with one info-level log line per migrated instance.
- F-12. dashboardAPI shall expose msg.topic == "regenerate-dashboard" as a manual trigger that bypasses the diff check and forces a regenerate.
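To make F-4 and F-5 concrete, the sketch below composes already-substituted panels into a flat Grafana panel list with sequential gridPos.y offsets. It filters out parent panels whose emittedFields are all covered by a child. The function names, the two-column layout, the fixed panel height, and the "parent / child" row titles are assumptions for illustration. Grandchild rows are emitted as ordinary rows after their parent's panels, which is one possible reading of "nested rows".

```javascript
const GRID_WIDTH = 24;  // Grafana's dashboard grid is 24 columns wide.
const PANEL_WIDTH = 12; // Assumption: two panels per line.
const PANEL_HEIGHT = 8; // Assumption: fixed height for every panel.

// F-5: drop a parent panel when every field it emits is already emitted by a child.
function dropDuplicatedPanels(parentPanels, childNodes) {
  const childFields = new Set(
    childNodes.flatMap((child) => child.panels.flatMap((panel) => panel.emittedFields)));
  const isDuplicate = (panel) =>
    panel.emittedFields.length > 0 &&
    panel.emittedFields.every((field) => childFields.has(field));
  return parentPanels.filter((panel) => !isDuplicate(panel));
}

// F-4: one row per node, panels placed left to right, y advancing per line.
function composePanels(topLevelNodes) {
  const out = [];
  let y = 0;
  let nextId = 1;

  const placeNode = (node, titlePrefix) => {
    const children = node.children || [];
    out.push({
      id: nextId++,
      type: 'row',
      title: titlePrefix + node.nodeName,
      collapsed: false,
      gridPos: { x: 0, y, w: GRID_WIDTH, h: 1 },
    });
    y += 1;

    const panels = dropDuplicatedPanels(node.panels, children);
    panels.forEach((panel, index) => {
      const column = index % 2;
      if (index > 0 && column === 0) y += PANEL_HEIGHT; // wrap to the next line
      // emittedFields is template metadata, not part of the Grafana panel schema.
      const { emittedFields, ...grafanaPanel } = panel;
      out.push({
        ...grafanaPanel,
        id: nextId++,
        gridPos: { x: column * PANEL_WIDTH, y, w: PANEL_WIDTH, h: PANEL_HEIGHT },
      });
    });
    if (panels.length > 0) y += PANEL_HEIGHT;

    children.forEach((child) => placeNode(child, titlePrefix + node.nodeName + ' / '));
  };

  topLevelNodes.forEach((node) => placeNode(node, ''));
  return out;
}

const composed = composePanels([
  {
    nodeName: 'station',
    panels: [
      { type: 'timeseries', title: 'flow', emittedFields: ['flow'] },
      { type: 'timeseries', title: 'level', emittedFields: ['level'] },
    ],
    children: [
      {
        nodeName: 'pump1',
        panels: [
          { type: 'timeseries', title: 'pump flow', emittedFields: ['flow'] },
          { type: 'timeseries', title: 'efficiency', emittedFields: ['efficiency'] },
        ],
      },
    ],
  },
]);
console.log(composed.map((panel) => `${panel.title} @ y=${panel.gridPos.y}`));
```

In this example the station's own flow panel is dropped because pump1 already emits flow, so the station row shows only level.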
Non-functional
- N-1. Performance. Dashboard composition (graph walk + template merge + JSON build, excluding HTTP roundtrip) shall complete in <500ms for a flow with up to 50 registered children.
- N-2. Idempotency. Running the regenerate path twice in a row with no intervening graph change produces a byte-identical dashboard JSON.
- N-3. Security. The bearer token shall never appear in any log line, status update, debug output, or admin endpoint response. Token-bearing HTTP requests shall set TLS verification on when the configured Grafana URL is https://.
- N-4. Observability. Every regenerate emits a structured log line via the logger shared utility with fields: dashboardUid, childCount, grandchildCount, compositionDurationMs, httpStatus, outcome ∈ {success, http-error, network-error, no-diff}.
- N-5. Backward compatibility. Existing dashboardAPI instances continue to write to InfluxDB exactly as before. The Grafana-push path is additive and disabled if no grafanaUrl is configured.
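One way to approach the byte-identical output required by N-2 is to serialize the dashboard with a fixed key order, so the result does not depend on the order in which templates or merges happened to insert properties. The helper names below are hypothetical. The sketch also assumes the composer puts no timestamps or random ids into the dashboard, which would defeat idempotency regardless of serialization.

```javascript
// Return a deep copy with object keys sorted; array order is preserved
// because panel order is meaningful in a dashboard.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return value.map(canonicalize);
  }
  if (value !== null && typeof value === 'object') {
    return Object.keys(value).sort().reduce((sorted, key) => {
      sorted[key] = canonicalize(value[key]);
      return sorted;
    }, {});
  }
  return value;
}

// Serialize a dashboard so that equal content always yields equal bytes.
function toStableJson(dashboard) {
  return JSON.stringify(canonicalize(dashboard));
}

const first = toStableJson({ title: 'demo', panels: [{ id: 1, type: 'row' }] });
const second = toStableJson({ panels: [{ type: 'row', id: 1 }], title: 'demo' });
console.log(first === second);
```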
Constraints & dependencies
- Grafana version pinned. docker-compose.yml shall pin to grafana/grafana:11.3.0 (or whatever specific minor exists at first-issue time) instead of latest. The legacy POST /api/dashboards/db endpoint is the target; the Grafana 12 Kubernetes-style API is out of scope. This resolves research O-3.
- Node-RED runtime events. Depends on RED.events.on('flows:started') firing with a payload.diff shape (added/changed/removed arrays), which is undocumented but stable in current Node-RED versions. Verified by prototype before first issue ships.
- InfluxDB write path unchanged. Reuses existing outputUtils.formatForInflux + influxdbFormatter. No schema migration to existing telemetry.
- Tag schema. Every Influx field used by a panel must be in the existing emission convention (_measurement = nodeName, _field = type.variant.position.childId).
- Scaffolding to reuse: ChildRegistrationUtils.getAllChildren() (nodes/generalFunctions/src/helper/childRegistrationUtils.js:104-106), extractChildren() (nodes/dashboardAPI/src/specificClass.js:151-163), grafanaUpsertUrl() (:107-110, URL builder exists, HTTP send missing), BaseNodeAdapter lifecycle pattern.
- No new npm dependencies for the HTTP path. Use Node's built-in https / http modules.
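The HTTP send that is still missing could be built on the built-in modules roughly as follows. This is a sketch under assumptions: buildUpsertRequest and sendUpsert are hypothetical names, the 10 s timeout is an arbitrary choice, and the existing grafanaUpsertUrl() helper is not used because its signature is not shown here. The endpoint, body shape, and Authorization header follow F-8.

```javascript
const http = require('http');
const https = require('https');

// Build the request pieces separately from sending, so they can be unit-tested
// without a Grafana instance. Never log the returned headers (N-3).
function buildUpsertRequest({ grafanaUrl, token, folderUid, dashboard }) {
  // Append to the configured URL so a sub-path such as /grafana is preserved.
  const url = new URL(grafanaUrl.replace(/\/+$/, '') + '/api/dashboards/db');
  const body = JSON.stringify({ dashboard, overwrite: true, folderUid });
  return {
    transport: url.protocol === 'https:' ? https : http,
    options: {
      method: 'POST',
      hostname: url.hostname,
      port: url.port || undefined, // empty string means the protocol default
      path: url.pathname + url.search,
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
        Authorization: `Bearer ${token}`,
      },
      timeout: 10000, // Assumption: 10 s socket timeout.
    },
    body,
  };
}

// Send once and resolve with status and body; no retry, per F-9.
function sendUpsert(request) {
  return new Promise((resolve, reject) => {
    const req = request.transport.request(request.options, (res) => {
      let text = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { text += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, body: text }));
    });
    req.on('timeout', () => req.destroy(new Error('Grafana request timed out')));
    req.on('error', reject);
    req.end(request.body);
  });
}

const upsert = buildUpsertRequest({
  grafanaUrl: 'https://grafana.example/grafana/',
  token: 'example-token',
  folderUid: 'evolv',
  dashboard: { uid: 'evolv-abc123', title: 'demo' },
});
console.log(upsert.options.path);
```

Node's https module verifies server certificates by default, so leaving rejectUnauthorized unset keeps TLS verification on for https:// URLs.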
Success metrics
- Hand-authored Grafana JSON in repo = 0. Measured by counting JSON files in docker/grafana/provisioning/dashboards/ minus the dynamically-uploaded ones. Current: 2 (pumping-station.json, coresync-frost-demo.json). Target after rollout: 0 file-based, N dynamic.
- ui-* node count per example flow ≤ 15 (down from 73 in the current pumpingstation-complete-example). Measured by grepping examples/*.flow.json after migration.
- Time-to-first-dashboard for a new example flow ≤ 1 minute of human work (drop in dashboardAPI, configure URL + token, deploy). Measured by stopwatch on the next example flow that gets built.
- Regression coverage: every example flow's dashboard URL returns HTTP 200 and renders without panel errors. Measured by an integration test that hits the Grafana API after deploying each example.
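The ui-* count can also be taken from the parsed flow rather than by grep, which avoids counting matches inside function-node source or comments. A small sketch, with hypothetical function names, relying only on the fact that a Node-RED flow file is a JSON array of node objects with a type property:

```javascript
const fs = require('fs');

// Count nodes whose type starts with "ui-" in a parsed flow array.
function countUiNodes(flowNodes) {
  return flowNodes.filter(
    (node) => typeof node.type === 'string' && node.type.startsWith('ui-')).length;
}

// Convenience wrapper for a flow file on disk, e.g. one of examples/*.flow.json.
function countUiNodesInFile(path) {
  return countUiNodes(JSON.parse(fs.readFileSync(path, 'utf8')));
}

const sampleFlow = [
  { id: 'a', type: 'tab' },
  { id: 'b', type: 'ui-chart' },
  { id: 'c', type: 'ui-gauge' },
  { id: 'd', type: 'function' },
];
console.log(countUiNodes(sampleFlow));
```

Note that this counts dashboard config nodes with a ui- prefix as well, so the result may be higher than a count of visible widgets only.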
Open questions
- O-1. flows:started + diff reliability across deploy modes. Source-readable but needs a spike to confirm diff cleanly distinguishes "this dashboardAPI's subtree changed" from "an unrelated flow changed", across full / nodes / flows deploy types. → Resolved by /prototype before issue I-3 (the lifecycle hook issue) starts.
- O-2. Dashed-line custom.lineStyle rendering against real Influx series. Open Grafana bugs #75259 and #86546 may affect us. → Resolved by /prototype before issue I-5 (rotatingMachine template) starts.
- O-5 (new). Folder UID handling: does dashboardAPI assume a single Grafana folder for all generated dashboards (configured per-instance), or create per-flow folders? Default: per-instance configured folder UID, optional. If empty, dashboards land in the General folder. → Owner: R&D, deadline: before I-4.
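For the O-2 prototype, the dashed-bounds overrides from F-6 and F-10 might be generated as below. The function name is hypothetical. The sketch assumes the series name Grafana displays equals the raw Influx field name, which is exactly what the spike needs to confirm. It emits one byName override per bound field, selected by the .min / .max suffix.

```javascript
// Build fieldConfig.overrides entries that render <field>.min and <field>.max
// as dashed lines, leaving the value series itself solid.
function dashedBoundOverrides(fieldNames) {
  return fieldNames
    .filter((name) => name.endsWith('.min') || name.endsWith('.max'))
    .map((name) => ({
      matcher: { id: 'byName', options: name },
      properties: [
        { id: 'custom.lineStyle', value: { fill: 'dash', dash: [10, 10] } },
      ],
    }));
}

const overrides = dashedBoundOverrides(['flow', 'flow.min', 'flow.max']);
console.log(JSON.stringify(overrides, null, 2));
```

The returned array would be placed under a panel's fieldConfig.overrides. If display names turn out to carry extra labels, a byRegexp matcher on the suffix is a possible fallback.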
Out of scope (v2 candidates)
- Per-instance panel customization through the Grafana UI with merge-on-regen.
- Operator-facing UX (Grafana role/permission management, embedded dashboards in Node-RED).
- Auto-discovery of measurement units / axis ranges from node config schemas.
- Multi-Grafana-instance fanout (push the same dashboard to staging + prod).
- Grafana alerts / notification policies generated from EVOLV alarm definitions.
- Dashboard versioning / rollback inside Grafana.
- Template fragments living next to their owning node (decentralized template discovery).