EVOLV/docs/prd/dashboardapi-graph-aware-grafana-generator.md
znetsixe 14140725bc chore: workflow artifacts — research brief + dashboardAPI v2 PRD + submodule bumps
Bumps machineGroupControl (e1e1977) and pumpingStation (ef07f2a) — example
dashboard JSON tweaks committed on each submodule's development branch.

Adds docs/research/ and docs/prd/ for the dashboardAPI v2 graph-aware Grafana
generator workflow (Gitea issues #32-#43). Ignores .prototypes/ — throwaway
spike code lives there per the /prototype skill.
2026-05-26 17:32:20 +02:00

dashboardAPI v2 — graph-aware Grafana dashboard generator

Date: 2026-05-26 · Owner: R&D · Predecessors: /grill-me (in-conversation), docs/research/dashboardapi-graph-aware-grafana-generator.md

One dashboardAPI node in a Node-RED flow auto-generates one Grafana dashboard by walking its child-registration graph, composing per-node-type panel templates, and pushing the result to Grafana via HTTP on every Node-RED deploy.

Problem

Every EVOLV example flow today carries a hand-authored Node-RED Dashboard tab — the active pumpingstation-complete-example flow has 73 ui-* nodes (charts, gauges, text widgets, fan-out function nodes) consuming roughly a third of the flow. Every new example replicates this work, and each one diverges in axis ranges, chart configs, and fan-out logic — so the output side is inconsistent across the 10+ example flows we maintain. The same telemetry already lands in InfluxDB via Port 1 of every node, so Grafana could render it natively, but today each Grafana dashboard is hand-authored JSON (docker/grafana/provisioning/dashboards/pumping-station.json is the only one that exists, frozen at one node type). Result: R&D spends disproportionate time on dashboard plumbing, examples drift, and Grafana — the better readout — is underused.

Goals

  1. Dropping a dashboardAPI node into a flow and deploying produces a complete Grafana dashboard with no hand-authored JSON.
  2. Adding a new EVOLV node instance (e.g. a new measurement child) to a flow adds its panels on the next deploy with zero Grafana edits.
  3. Adding a new EVOLV node type requires only a panel template fragment under nodes/dashboardAPI/src/templates/<softwareType>.json — no changes to the layout engine.
  4. Cross-example consistency: every example flow's Grafana dashboard uses the same panel set, axis conventions, and dashed-bounds rendering for the same node type.
  5. Node-RED Dashboard tab in example flows shrinks to control-only widgets (mode select, operator demand, calibration, signal injection). Target: ≤15 ui-* nodes per example flow.

Non-goals

  • Sub-second feedback latency from operator action → Grafana visible state. End-to-end ≤15s is acceptable; faster is not pursued.
  • Preserving manual Grafana edits across regenerations. Dashboards are single-source-of-truth from dashboardAPI; manual edits are clobbered on next deploy.
  • Per-instance dashboard customization through the Grafana UI. Templates are centralized and code-owned.
  • Supporting non-EVOLV (third-party) Node-RED node types as panel sources.
  • Live runtime regeneration (no deploy). Regen fires on Node-RED deploy events only.
  • Operator (plant-staff) UX. Sole user is R&D until further notice.
  • Replacing the InfluxDB write path. dashboardAPI v2 reuses the existing outputUtils.formatForInflux + influxdbFormatter plumbing unchanged.

Users & scenarios

Sole user: EVOLV R&D team (Rene, Pim, Janneke, Sjoerd, Dieke, Pieter).

  1. New example flow from scratch. When R&D builds a new example for rotatingMachine-complete, they assemble the node graph (pumpingStation + 3 pumps + measurements), drop in one dashboardAPI, connect each top-level parent to it, and deploy. A Grafana dashboard at the dashboardAPI's UID appears within seconds, with rows per parent and panels per child following the centralized templates.
  2. Adding a measurement to an existing flow. When R&D wires a new measurement node as a child of an existing pumpingStation in pumpingstation-complete-example and redeploys, the parent's row gains a measured panel next to the predicted panel for the same quantity (see F-7). No Grafana edit.
  3. Adding a new EVOLV node type. When R&D ships a new node type mixer, they author nodes/dashboardAPI/src/templates/mixer.json (Grafana panel fragment with ${nodeName} substitution tokens) and bump dashboardAPI's package version. Existing dashboardAPI instances pick up mixer-typed children on next deploy.
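
The token substitution in scenario 3 could look roughly like the sketch below. It is a minimal illustration, not the actual template schema: the `mixerTemplate` fragment, its `query` string, the `telemetry` bucket name, and the `substitute` helper are all hypothetical. Only the `${nodeName}` and `${dashboardUid}` tokens and the `emittedFields` property come from this PRD.

```javascript
// Hypothetical template fragment for a "mixer" node type. The panel shape and
// the Flux query are illustrative assumptions, not the real EVOLV schema.
const mixerTemplate = {
  panels: [
    {
      title: '${nodeName} speed',
      type: 'timeseries',
      emittedFields: ['speed'],
      targets: [
        {
          query:
            'from(bucket: "telemetry") |> range(start: -1h) ' +
            '|> filter(fn: (r) => r._measurement == "${nodeName}")',
        },
      ],
    },
  ],
};

// Recursively replace ${token} placeholders in every string of a JSON value.
// Unknown tokens are left untouched so a missing value is easy to spot.
function substitute(value, tokens) {
  if (typeof value === 'string') {
    return value.replace(/\$\{(\w+)\}/g, (match, key) =>
      Object.prototype.hasOwnProperty.call(tokens, key) ? String(tokens[key]) : match
    );
  }
  if (Array.isArray(value)) return value.map((item) => substitute(item, tokens));
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, substitute(v, tokens)])
    );
  }
  return value; // numbers, booleans and null pass through unchanged
}

const rendered = substitute(mixerTemplate, { nodeName: 'mixer-1', dashboardUid: 'evolv-abc' });
console.log(rendered.panels[0].title); // prints "mixer-1 speed"
```

The helper returns a new object and leaves the template untouched, so one loaded template can be rendered once per node instance.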

Requirements

Functional

  1. F-1. dashboardAPI shall subscribe to RED.events.on('flows:started') and, on each event, inspect payload.diff to determine whether any node in its own subtree (the dashboardAPI node, its registered children, their registered grandchildren) was affected. If yes, regenerate the dashboard. If no, no-op.
  2. F-2. On regenerate, dashboardAPI shall walk its registered children via ChildRegistrationUtils.getAllChildren(), recurse one level per registered child to discover grandchildren, and produce an ordered list [{softwareType, nodeName, position, children: [...]}, ...].
  3. F-3. For each node in the graph, dashboardAPI shall load the matching template at nodes/dashboardAPI/src/templates/${softwareType}.json and substitute the placeholders ${nodeName}, ${nodeId}, ${parentName}, ${dashboardUid} and any child-list placeholders into the panel JSON.
  4. F-4. The layout engine shall compose templates into a single Grafana dashboard JSON with: one row per top-level child of dashboardAPI; nested rows for grandchildren; sequential gridPos.y offsets so panels don't overlap.
  5. F-5. Parent panels shall not repeat metrics that any of their children's templates already emit. The template format declares each panel's emittedFields so the composer can filter duplicates from the parent's panel set.
  6. F-6. For each child node of type rotatingMachine, the panel set shall include: %control, flow, delta P, any registered measurement child's measured values, and efficiency. Where the node config exposes operating bounds (e.g. min/max flow), those bounds shall be rendered as dashed reference lines on the same panel as the actual value, using a byName field override that sets custom.lineStyle to {fill: "dash", dash: [10,10]}.
  7. F-7. For each child of type measurement registered to a parent that also emits a predicted series for the same quantity, the dashboard shall render two panels side by side (predicted left, measured right). If only predicted exists, render the predicted panel only. If only measured exists, render the measured panel only.
  8. F-8. dashboardAPI shall send the assembled dashboard as POST {grafanaUrl}/api/dashboards/db with body {dashboard: <json>, overwrite: true, folderUid: <configured>}, using the configured bearer token in Authorization: Bearer <token>. The dashboard.uid shall be deterministic from the dashboardAPI node's Node-RED id.
  9. F-9. On a successful upsert (HTTP 200), dashboardAPI shall log the dashboard URL at info level. On failure (non-2xx, timeout, network error), it shall log at error level with the response body and shall not retry; the next deploy is the retry mechanism.
  10. F-10. Each node emitting a value with operating bounds shall write the bounds as additional Influx fields named <field>.min and <field>.max alongside <field> itself. The dashed-line override matches these by suffix.
  11. F-11. The bearer token shall be stored as a Node-RED encrypted credential, not as a plain defaults field. On node startup, if the legacy plain field exists, it is migrated to the credential store and the plain field is cleared, with one info-level log line per migrated instance.
  12. F-12. dashboardAPI shall expose msg.topic == "regenerate-dashboard" as a manual trigger that bypasses the diff check and forces a regenerate.
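
The composition step in F-2 and F-4 can be sketched as follows. This is an illustration under stated assumptions, not the layout engine itself: `placePanels`, `composeLayout`, and the `panelsFor` callback are hypothetical names, and the panel sizes are made up. Because Grafana row panels form a flat sequence, the sketch places each grandchild row directly after its parent's panels; how "nested rows" should map onto Grafana's row model is left open here.

```javascript
const GRID_WIDTH = 24; // Grafana's dashboard grid is 24 columns wide

// Place panels left to right, wrapping to a new line when the current one is
// full. Returns the placed panels and the y offset just below them.
function placePanels(panels, startY) {
  let x = 0;
  let y = startY;
  let lineHeight = 0;
  const placed = panels.map((panel) => {
    if (x + panel.w > GRID_WIDTH) {
      x = 0;
      y += lineHeight;
      lineHeight = 0;
    }
    const out = { title: panel.title, type: panel.type, gridPos: { x, y, w: panel.w, h: panel.h } };
    x += panel.w;
    lineHeight = Math.max(lineHeight, panel.h);
    return out;
  });
  return { placed, nextY: y + lineHeight };
}

// Emit one row header per node, then its panels, then its children's rows.
function composeLayout(nodes, panelsFor, startY = 0) {
  let y = startY;
  const panels = [];
  for (const node of nodes) {
    panels.push({ type: 'row', title: node.nodeName, gridPos: { x: 0, y, w: GRID_WIDTH, h: 1 } });
    y += 1;
    const own = placePanels(panelsFor(node), y);
    panels.push(...own.placed);
    const nested = composeLayout(node.children || [], panelsFor, own.nextY);
    panels.push(...nested.panels);
    y = nested.nextY;
  }
  return { panels, nextY: y };
}

// Example graph in the shape F-2 describes (values are illustrative).
const graph = [
  {
    softwareType: 'pumpingStation',
    nodeName: 'station',
    children: [{ softwareType: 'rotatingMachine', nodeName: 'pump-1', children: [] }],
  },
];
const twoHalfWidthPanels = (node) => [
  { title: `${node.nodeName} flow`, type: 'timeseries', w: 12, h: 8 },
  { title: `${node.nodeName} efficiency`, type: 'timeseries', w: 12, h: 8 },
];
const layout = composeLayout(graph, twoHalfWidthPanels);
console.log(layout.panels.map((p) => p.gridPos.y)); // prints [ 0, 1, 1, 9, 10, 10 ]
```

Every y offset is derived from the previous one, so the same graph always yields the same layout, which helps with the idempotency requirement.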
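
For the dashed bounds in F-6 and F-10, the field overrides might be generated as in the sketch below. The helper names `boundsOverrides` and `suffixOverride` are hypothetical, and the bare field name `flow` is a simplification of the real field naming convention. F-6 names a byName override while F-10 speaks of matching by suffix, so both variants are shown; whether the series name Grafana sees actually matches the Influx field name is part of what O-2 has to confirm.

```javascript
// The override property F-6 asks for: a dashed line style.
function dashedLineStyle() {
  return { id: 'custom.lineStyle', value: { fill: 'dash', dash: [10, 10] } };
}

// Variant 1: one exact-name (byName) override per bound field,
// e.g. "flow.min" and "flow.max" for the field "flow".
function boundsOverrides(field) {
  return ['min', 'max'].map((suffix) => ({
    matcher: { id: 'byName', options: `${field}.${suffix}` },
    properties: [dashedLineStyle()],
  }));
}

// Variant 2: a single regex (byRegexp) override covering every field whose
// name ends in .min or .max. The pattern string is an assumption.
function suffixOverride() {
  return {
    matcher: { id: 'byRegexp', options: '/\\.(min|max)$/' },
    properties: [dashedLineStyle()],
  };
}

// Where the overrides would sit inside a panel fragment.
const flowPanel = {
  title: 'pump-1 flow',
  type: 'timeseries',
  fieldConfig: { defaults: {}, overrides: boundsOverrides('flow') },
};
console.log(flowPanel.fieldConfig.overrides.map((o) => o.matcher.options));
// prints [ 'flow.min', 'flow.max' ]
```

The byName variant needs one override per bounded field, while the regex variant is a single entry that could live in every template unchanged.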

Non-functional

  • N-1. Performance. Dashboard composition (graph walk + template merge + JSON build, excluding HTTP roundtrip) shall complete in <500ms for a flow with up to 50 registered children.
  • N-2. Idempotency. Running the regenerate path twice in a row with no intervening graph change produces a byte-identical dashboard JSON.
  • N-3. Security. The bearer token shall never appear in any log line, status update, debug output, or admin endpoint response. Token-bearing HTTP requests shall keep TLS certificate verification enabled whenever the configured Grafana URL uses https://.
  • N-4. Observability. Every regenerate emits a structured log line via the logger shared utility with fields: dashboardUid, childCount, grandchildCount, compositionDurationMs, httpStatus, outcome ∈ {success, http-error, network-error, no-diff}.
  • N-5. Backward compatibility. Existing dashboardAPI instances continue to write to InfluxDB exactly as before. The Grafana-push path is additive and disabled if no grafanaUrl is configured.
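
One way to approach the byte-identical output that N-2 asks for is to serialize with sorted object keys, so the result does not depend on the order in which templates happened to add properties. The `stableStringify` helper below is a hypothetical sketch, not existing EVOLV code, and it assumes the dashboard is plain JSON data (no functions, dates, or cyclic references).

```javascript
// Serialize a JSON value with object keys sorted at every level, so two
// structurally equal dashboards always produce the same string.
function stableStringify(value) {
  if (Array.isArray(value)) {
    // JSON.stringify turns undefined array items into null; mirror that.
    return '[' + value.map((v) => (v === undefined ? 'null' : stableStringify(v))).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    // Drop undefined properties, as JSON.stringify does, then sort the keys.
    const keys = Object.keys(value).filter((k) => value[k] !== undefined).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + stableStringify(value[k])).join(',') + '}';
  }
  return JSON.stringify(value);
}

const first = stableStringify({ uid: 'x', panels: [{ title: 'flow', id: 1 }] });
const second = stableStringify({ panels: [{ id: 1, title: 'flow' }], uid: 'x' });
console.log(first === second); // prints true
```

Key ordering is only half of the requirement. The composer would also have to keep timestamps, random ids, and other run-dependent values out of the dashboard, and array order (panel order) must come from a deterministic graph walk.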

Constraints & dependencies

  • Grafana version pinned. docker-compose.yml shall pin a specific release such as grafana/grafana:11.3.0 (or whichever exact version is current when the first issue starts) instead of latest. The legacy POST /api/dashboards/db endpoint is the target; the Grafana 12 Kubernetes-style API is out of scope. This resolves research O-3.
  • Node-RED runtime events. Depends on RED.events.on('flows:started') firing with a payload.diff shape (added/changed/removed arrays), which is undocumented but stable in current Node-RED versions. To be verified by prototype before the first issue ships (see O-1).
  • InfluxDB write path unchanged. Reuses existing outputUtils.formatForInflux + influxdbFormatter. No schema migration to existing telemetry.
  • Tag schema. Every Influx field used by a panel must be in the existing emission convention (_measurement = nodeName, _field = type.variant.position.childId).
  • Scaffolding to reuse: ChildRegistrationUtils.getAllChildren() (nodes/generalFunctions/src/helper/childRegistrationUtils.js:104-106), extractChildren() (nodes/dashboardAPI/src/specificClass.js:151-163), grafanaUpsertUrl() (:107-110, URL builder exists, HTTP send missing), BaseNodeAdapter lifecycle pattern.
  • No new npm dependencies for the HTTP path. Use Node's built-in https/http modules.
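
A dependency-free upsert along the lines of F-8 and the last constraint could be sketched as below. Everything named here is an assumption for illustration: `dashboardUidFor`, `buildUpsertRequest`, and `sendUpsert` are hypothetical helpers (not the existing grafanaUpsertUrl()), the `evolv-` prefix and SHA-1 hashing are one possible deterministic UID scheme, and the 10-second timeout is arbitrary.

```javascript
const http = require('node:http');
const https = require('node:https');
const crypto = require('node:crypto');

// Derive a stable, short dashboard UID from the Node-RED node id.
function dashboardUidFor(nodeId) {
  const digest = crypto.createHash('sha1').update(String(nodeId)).digest('hex');
  return `evolv-${digest.slice(0, 16)}`;
}

// Build everything needed for the request without touching the network.
function buildUpsertRequest({ grafanaUrl, token, folderUid, dashboard }) {
  // Keep any base path in grafanaUrl (e.g. a /grafana reverse-proxy prefix).
  const url = new URL(grafanaUrl.replace(/\/+$/, '') + '/api/dashboards/db');
  const payload = { dashboard, overwrite: true };
  if (folderUid) payload.folderUid = folderUid; // omitted when not configured
  const body = JSON.stringify(payload);
  return {
    url,
    transport: url.protocol === 'https:' ? https : http,
    options: {
      method: 'POST',
      timeout: 10000, // assumed value
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
        Authorization: `Bearer ${token}`,
      },
    },
    body,
  };
}

// Send the request once. No retry: per F-9 the next deploy is the retry.
function sendUpsert(request) {
  return new Promise((resolve, reject) => {
    const req = request.transport.request(request.url, request.options, (res) => {
      let text = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { text += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, body: text }));
    });
    req.on('timeout', () => req.destroy(new Error('Grafana request timed out')));
    req.on('error', reject);
    req.end(request.body);
  });
}
```

The sketch never sets `rejectUnauthorized`, so Node's default certificate verification stays on for https URLs, in line with N-3. Callers should log only the status and response body, never `request.options.headers`, since that object carries the token.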

Success metrics

  1. Hand-authored Grafana JSON in repo = 0. Measured by counting JSON files in docker/grafana/provisioning/dashboards/ (dashboards pushed by dashboardAPI are not files and do not count). Current: 2 (pumping-station.json, coresync-frost-demo.json). Target after rollout: 0 file-based, N dynamic.
  2. ui-* node count per example flow ≤ 15 (down from 73 in the current pumpingstation-complete-example). Measured by grepping examples/*.flow.json after migration.
  3. Time-to-first-dashboard for a new example flow ≤ 1 minute of human work (drop in dashboardAPI, configure URL + token, deploy). Measured by stopwatch on the next example flow that gets built.
  4. Regression coverage: every example flow's dashboard URL returns HTTP 200 and renders without panel errors. Measured by an integration test that hits the Grafana API after deploying each example.
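
The ui-* count in metric 2 can be checked with a few lines of Node instead of a manual grep. The sketch below relies only on the fact that a Node-RED flow file is a JSON array of node objects with a `type` property; the helper names are hypothetical. It counts every type starting with `ui-`, so whether dashboard config nodes should count toward the limit of 15 is a choice the metric still has to make.

```javascript
const fs = require('node:fs');

// Count nodes whose type starts with "ui-" in a parsed Node-RED flow.
function countUiNodes(flow) {
  return flow.filter(
    (node) => typeof node.type === 'string' && node.type.startsWith('ui-')
  ).length;
}

// Convenience wrapper for a flow file on disk.
function countUiNodesInFile(path) {
  return countUiNodes(JSON.parse(fs.readFileSync(path, 'utf8')));
}

const sampleFlow = [
  { id: 'a', type: 'tab' },
  { id: 'b', type: 'ui-chart' },
  { id: 'c', type: 'function' },
  { id: 'd', type: 'ui-gauge' },
];
console.log(countUiNodes(sampleFlow)); // prints 2
```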

Open questions

  • O-1. flows:started + diff reliability across deploy modes. Source-readable but needs a spike to confirm diff cleanly distinguishes "this dashboardAPI's subtree changed" from "an unrelated flow changed", across full / nodes / flows deploy types. → Resolved by /prototype before issue I-3 (the lifecycle hook issue) starts.
  • O-2. Dashed-line custom.lineStyle rendering against real Influx series. Open Grafana bugs #75259 and #86546 may affect us. → Resolved by /prototype before issue I-5 (rotatingMachine template) starts.
  • O-5 (new). Folder UID handling — does dashboardAPI assume a single Grafana folder for all generated dashboards (configured per-instance), or create per-flow folders? Default: per-instance configured folder UID, optional. If empty, dashboards land in the General folder. → Owner: R&D, deadline: before I-4.
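
A starting point for the O-1 spike might be the subtree check sketched below. It assumes the payload.diff shape described under Constraints (added/changed/removed arrays of node ids) and ignores any further keys the runtime may include. `subtreeAffected` and `attach` are hypothetical names, and treating a missing diff as "affected" is an assumption about how a full deploy should behave.

```javascript
const { EventEmitter } = require('node:events');

// Return true when any id listed in the deploy diff belongs to the subtree.
function subtreeAffected(diff, subtreeIds) {
  if (!diff) return true; // assumed: no diff means we cannot rule out a change
  const ids = new Set(subtreeIds);
  return ['added', 'changed', 'removed'].some((key) =>
    (diff[key] || []).some((id) => ids.has(id))
  );
}

// Subscribe to flows:started on any EventEmitter-like object (RED.events in
// the real node) and call regenerate() only when the subtree was touched.
// Returns a function that removes the listener again, e.g. on node close.
function attach(events, getSubtreeIds, regenerate) {
  const handler = (payload) => {
    if (subtreeAffected(payload && payload.diff, getSubtreeIds())) regenerate();
  };
  events.on('flows:started', handler);
  return () => events.removeListener('flows:started', handler);
}

// Usage inside the node would look roughly like:
//   const detach = attach(RED.events, () => currentSubtreeIds, regenerate);
//   node.on('close', detach);
```

One case the spike should cover: a removed child is no longer part of the current subtree, so the id set probably needs to include the ids from the previous generation as well, otherwise a removal would be treated as unrelated.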

Out of scope (v2 candidates)

  • Per-instance panel customization through the Grafana UI with merge-on-regen.
  • Operator-facing UX (Grafana role/permission management, embedded dashboards in Node-RED).
  • Auto-discovery of measurement units / axis ranges from node config schemas.
  • Multi-Grafana-instance fanout (push the same dashboard to staging + prod).
  • Grafana alerts / notification policies generated from EVOLV alarm definitions.
  • Dashboard versioning / rollback inside Grafana.
  • Template fragments living next to their owning node (decentralized template discovery).