chore: workflow artifacts — research brief + dashboardAPI v2 PRD + submodule bumps
Bumps machineGroupControl (e1e1977) and pumpingStation (ef07f2a) — example dashboard JSON tweaks committed on each submodule's development branch. Adds docs/research/ and docs/prd/ for the dashboardAPI v2 graph-aware Grafana generator workflow (Gitea issues #32-#43). Ignores .prototypes/ — throwaway spike code lives there per the /prototype skill.
1
.gitignore
vendored
@@ -20,3 +20,4 @@ tools/.env
 .repo-mem/
 .codex
 CLAUDE.local.md
+.prototypes/
82
docs/prd/dashboardapi-graph-aware-grafana-generator.md
Normal file
@@ -0,0 +1,82 @@
# dashboardAPI v2 — graph-aware Grafana dashboard generator

_Date: 2026-05-26 · Owner: R&D · Predecessors: `/grill-me` (in-conversation), [`docs/research/dashboardapi-graph-aware-grafana-generator.md`](../research/dashboardapi-graph-aware-grafana-generator.md)_

One `dashboardAPI` node in a Node-RED flow auto-generates one Grafana dashboard by walking its child-registration graph, composing per-node-type panel templates, and pushing the result to Grafana via HTTP on every Node-RED deploy.

## Problem

Every EVOLV example flow today carries a hand-authored Node-RED Dashboard tab — the active `pumpingstation-complete-example` flow has 73 `ui-*` nodes (charts, gauges, text widgets, fan-out function nodes) consuming roughly a third of the flow. Every new example replicates this work, and each one diverges in axis ranges, chart configs, and fan-out logic — so the output side is inconsistent across the 10+ example flows we maintain. The same telemetry already lands in InfluxDB via Port 1 of every node, so Grafana could render it natively, but today each Grafana dashboard is hand-authored JSON (`docker/grafana/provisioning/dashboards/pumping-station.json` is the only one that exists, frozen at one node type). Result: R&D spends disproportionate time on dashboard plumbing, examples drift, and Grafana — the better readout — is underused.

## Goals

1. Dropping a `dashboardAPI` node into a flow and deploying produces a complete Grafana dashboard with no hand-authored JSON.
2. Adding a new EVOLV node *instance* (e.g. a new measurement child) to a flow adds its panels on the next deploy with zero Grafana edits.
3. Adding a new EVOLV node *type* requires only a panel template fragment under `nodes/dashboardAPI/src/templates/<softwareType>.json` — no changes to the layout engine.
4. Cross-example consistency: every example flow's Grafana dashboard uses the same panel set, axis conventions, and dashed-bounds rendering for the same node type.
5. Node-RED Dashboard tab in example flows shrinks to control-only widgets (mode select, operator demand, calibration, signal injection). Target: ≤15 `ui-*` nodes per example flow.

## Non-goals

- Sub-second feedback latency from operator action → Grafana visible state. End-to-end ≤15s is acceptable; faster is not pursued.
- Preserving manual Grafana edits across regenerations. Dashboards are single-source-of-truth from dashboardAPI; manual edits are clobbered on next deploy.
- Per-instance dashboard customization through the Grafana UI. Templates are centralized and code-owned.
- Supporting non-EVOLV (third-party) Node-RED node types as panel sources.
- Live runtime regeneration (no deploy). Regen fires on Node-RED deploy events only.
- Operator (plant-staff) UX. Sole user is R&D until further notice.
- Replacing the InfluxDB write path. dashboardAPI v2 reuses the existing `outputUtils.formatForInflux` + `influxdbFormatter` plumbing unchanged.

## Users & scenarios

Sole user: EVOLV R&D team (Rene, Pim, Janneke, Sjoerd, Dieke, Pieter).

1. **New example flow from scratch.** When R&D builds a new example for `rotatingMachine-complete`, they assemble the node graph (pumpingStation + 3 pumps + measurements), drop in one dashboardAPI, connect each top-level parent to it, and deploy. A Grafana dashboard at the dashboardAPI's UID appears within seconds, with rows per parent and panels per child following the centralized templates.
2. **Adding a measurement to an existing flow.** When R&D wires a new measurement node as a child of an existing pumpingStation in `pumpingstation-complete-example` and redeploys, the dashboard gains a `measured` panel next to the corresponding `predicted` panel (see F-7). No Grafana edit.
3. **Adding a new EVOLV node type.** When R&D ships a new node type `mixer`, they author `nodes/dashboardAPI/src/templates/mixer.json` (Grafana panel fragment with `${nodeName}` substitution tokens) and bump dashboardAPI's package version. Existing dashboardAPI instances pick up mixer-typed children on next deploy.
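
The token substitution in scenario 3 can be sketched as a recursive walk over the parsed template. This is a minimal illustration, not the planned implementation: `fillTemplate`, the sample fragment, and the `evolv` bucket name are hypothetical, and the real `mixer.json` layout is not defined in this PRD.

```javascript
// Hypothetical helper: replaces ${token} placeholders in a parsed panel fragment.
// Unknown tokens are left untouched so a missing context value stays visible.
function fillTemplate(fragment, context) {
  if (typeof fragment === 'string') {
    return fragment.replace(/\$\{(\w+)\}/g, (match, key) =>
      Object.prototype.hasOwnProperty.call(context, key) ? String(context[key]) : match);
  }
  if (Array.isArray(fragment)) {
    return fragment.map((item) => fillTemplate(item, context));
  }
  if (fragment !== null && typeof fragment === 'object') {
    return Object.fromEntries(
      Object.entries(fragment).map(([key, value]) => [key, fillTemplate(value, context)]));
  }
  return fragment; // numbers, booleans and null pass through unchanged
}

// Illustrative fragment only (assumed shape and bucket name).
const mixerPanel = {
  title: '${nodeName} speed',
  type: 'timeseries',
  targets: [{ query: 'from(bucket: "evolv") |> filter(fn: (r) => r._measurement == "${nodeName}")' }],
};

const panel = fillTemplate(mixerPanel, { nodeName: 'mixer-1' });
console.log(panel.title); // "mixer-1 speed"
```

Leaving unknown tokens in place also avoids clobbering Grafana's own `${variable}` syntax, should a template ever use dashboard variables.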

## Requirements

### Functional

1. **F-1.** `dashboardAPI` shall subscribe to the `flows:started` runtime event via `RED.events.on('flows:started', ...)` and, on each event, inspect `payload.diff` to determine whether any of its own subtree (the dashboardAPI node, its registered children, their registered grandchildren) was affected. If yes, regenerate the dashboard. If no, no-op.
2. **F-2.** On regenerate, `dashboardAPI` shall walk its registered children via `ChildRegistrationUtils.getAllChildren()`, recurse one level per registered child to discover grandchildren, and produce an ordered list `[{softwareType, nodeName, position, children: [...]}, ...]`.
3. **F-3.** For each node in the graph, `dashboardAPI` shall load the matching template at `nodes/dashboardAPI/src/templates/${softwareType}.json` and substitute the placeholders `${nodeName}`, `${nodeId}`, `${parentName}`, `${dashboardUid}` and any child-list placeholders into the panel JSON.
4. **F-4.** The layout engine shall compose templates into a single Grafana dashboard JSON with: one row per top-level child of dashboardAPI; nested rows for grandchildren; sequential `gridPos.y` offsets so panels don't overlap.
5. **F-5.** Parent panels shall **not** repeat metrics that any of their children's templates already emit. The template format declares each panel's `emittedFields` so the composer can filter duplicates from the parent's panel set.
6. **F-6.** For each child node of type `rotatingMachine`, the panel set shall include: `%control`, `flow`, `delta P`, any registered measurement child's measured values, and `efficiency`. Where the node config exposes operating bounds (e.g. min/max flow), those bounds shall be rendered as dashed reference lines (`fieldConfig.custom.lineStyle = {fill: "dash", dash: [10,10]}` via a `byName` override) on the same panel as the actual value.
7. **F-7.** For each child of type `measurement` registered to a parent that also emits a `predicted` series for the same quantity, the dashboard shall render two panels side by side (predicted left, measured right). If only `predicted` exists, render the predicted panel only. If only `measured` exists, render the measured panel only.
8. **F-8.** `dashboardAPI` shall send the assembled dashboard as `POST {grafanaUrl}/api/dashboards/db` with body `{dashboard: <json>, overwrite: true, folderUid: <configured>}`, using the configured bearer token in `Authorization: Bearer <token>`. The `dashboard.uid` shall be deterministic from the dashboardAPI node's Node-RED id.
9. **F-9.** On a successful upsert (HTTP 200), `dashboardAPI` shall log the dashboard URL at info level. On failure (non-2xx, timeout, network error), it shall log at error level with the response body and shall **not** retry; the next deploy is the retry mechanism.
10. **F-10.** Each node emitting a value with operating bounds shall write the bounds as additional Influx fields named `<field>.min` and `<field>.max` alongside `<field>` itself. The dashed-line override matches these by suffix.
11. **F-11.** The bearer token shall be stored as a Node-RED encrypted credential, not as a plain `defaults` field. On node startup, if the legacy plain field exists, it is migrated to the credential store and the plain field is cleared, with one info-level log line per migrated instance.
12. **F-12.** `dashboardAPI` shall expose `msg.topic == "regenerate-dashboard"` as a manual trigger that bypasses the diff check and forces a regenerate.
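
As an illustration of the sequential `gridPos.y` offsets in F-4, the sketch below lays out one row header per top-level child and flows its panels across Grafana's 24-column grid. `layoutRows`, the group shape, and the default panel size are assumptions for this example. It handles a single nesting level only, so grandchild rows would need a recursive variant.

```javascript
const GRID_WIDTH = 24; // Grafana's dashboard grid is 24 columns wide

// Places one row header per group and flows its panels left to right,
// wrapping to a new line when the next panel would exceed the grid width.
function layoutRows(groups, panelSize = { w: 12, h: 8 }) {
  const out = [];
  let y = 0;
  for (const group of groups) {
    out.push({ type: 'row', title: group.title, gridPos: { x: 0, y, w: GRID_WIDTH, h: 1 } });
    y += 1;
    let x = 0;
    for (const panel of group.panels) {
      if (x + panelSize.w > GRID_WIDTH) { // wrap to the next panel line
        x = 0;
        y += panelSize.h;
      }
      out.push({ ...panel, gridPos: { x, y, w: panelSize.w, h: panelSize.h } });
      x += panelSize.w;
    }
    if (group.panels.length > 0) y += panelSize.h; // close the last panel line
  }
  return out;
}

// Hypothetical input: two top-level children with three and one panels.
const laidOut = layoutRows([
  { title: 'pumpingStation-1', panels: [{ title: 'flow' }, { title: 'delta P' }, { title: 'efficiency' }] },
  { title: 'pumpingStation-2', panels: [{ title: 'flow' }] },
]);
console.log(laidOut.map((p) => p.gridPos.y)); // [ 0, 1, 1, 9, 17, 18 ]
```

Because `y` only ever increases and every panel line advances it by the full panel height, no two panels can share a grid cell.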

### Non-functional

- **N-1. Performance.** Dashboard composition (graph walk + template merge + JSON build, excluding HTTP roundtrip) shall complete in <500ms for a flow with up to 50 registered children.
- **N-2. Idempotency.** Running the regenerate path twice in a row with no intervening graph change produces a byte-identical dashboard JSON.
- **N-3. Security.** The bearer token shall never appear in any log line, status update, debug output, or admin endpoint response. Token-bearing HTTP requests shall keep TLS certificate verification enabled when the configured Grafana URL is `https://`.
- **N-4. Observability.** Every regenerate emits a structured log line via the `logger` shared utility with fields: `dashboardUid`, `childCount`, `grandchildCount`, `compositionDurationMs`, `httpStatus`, `outcome ∈ {success, http-error, network-error, no-diff}`.
- **N-5. Backward compatibility.** Existing dashboardAPI instances continue to write to InfluxDB exactly as before. The Grafana-push path is additive and disabled if no `grafanaUrl` is configured.
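
One way to approach the byte-identical output in N-2 is to serialize through a canonical key order, so the result does not depend on the order in which templates happened to insert properties. The helper names below are hypothetical. The sketch assumes the dashboard contains no timestamps or random ids, which would defeat idempotency regardless of key order.

```javascript
// Recursively sorts object keys so serialization does not depend on insertion order.
// Array order is preserved because panel order is meaningful.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, canonicalize(value[key])]));
  }
  return value;
}

function stableStringify(dashboard) {
  return JSON.stringify(canonicalize(dashboard));
}

const first = stableStringify({ uid: 'abc', panels: [{ title: 'flow', id: 1 }] });
const second = stableStringify({ panels: [{ id: 1, title: 'flow' }], uid: 'abc' });
console.log(first === second); // true
```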

## Constraints & dependencies

- **Grafana version pinned.** `docker-compose.yml` shall pin to `grafana/grafana:11.3.0` (or whatever specific minor exists at first-issue time) instead of `latest`. The legacy `POST /api/dashboards/db` endpoint is the target; the Grafana 12 Kubernetes-style API is out of scope. This resolves research **O-3**.
- **Node-RED runtime events.** Depends on `RED.events.on('flows:started')` firing with a `payload.diff` shape (added/changed/removed arrays) — undocumented but stable in current Node-RED versions. Verified by prototype before first issue ships.
- **InfluxDB write path unchanged.** Reuses existing `outputUtils.formatForInflux` + `influxdbFormatter`. No schema migration to existing telemetry.
- **Tag schema.** Every Influx field used by a panel must be in the existing emission convention (`_measurement = nodeName`, `_field = type.variant.position.childId`).
- **Scaffolding to reuse:** `ChildRegistrationUtils.getAllChildren()` (`nodes/generalFunctions/src/helper/childRegistrationUtils.js:104-106`), `extractChildren()` (`nodes/dashboardAPI/src/specificClass.js:151-163`), `grafanaUpsertUrl()` (`:107-110`, URL builder exists, HTTP send missing), `BaseNodeAdapter` lifecycle pattern.
- **No new npm dependencies** for the HTTP path. Use Node's built-in `https`/`http` modules.
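
The sketch below shows how the F-8 request could be prepared with only Node's standard library. `dashboardUidFor` and `buildUpsertRequest` are hypothetical names, the `evolv-` prefix and truncated SHA-1 digest are one possible deterministic scheme rather than a decided one, and the existing `grafanaUpsertUrl()` helper is deliberately not used here because its signature is not shown in this document. The URL handling assumes Grafana is served at the root path of `grafanaUrl`.

```javascript
const crypto = require('node:crypto');

// Derives a stable dashboard UID from the Node-RED node id.
// Assumption: truncated SHA-1 hex digest; Grafana limits UIDs to 40 characters.
function dashboardUidFor(nodeId) {
  return 'evolv-' + crypto.createHash('sha1').update(nodeId).digest('hex').slice(0, 16);
}

// Builds transport name, request options and body for the legacy upsert endpoint.
// The options object is shaped for http.request / https.request.
function buildUpsertRequest({ grafanaUrl, token, folderUid, dashboard }) {
  const url = new URL('/api/dashboards/db', grafanaUrl); // drops any sub-path in grafanaUrl
  const payload = { dashboard, overwrite: true };
  if (folderUid) payload.folderUid = folderUid; // omitted when no folder is configured
  const body = JSON.stringify(payload);
  return {
    transport: url.protocol === 'https:' ? 'https' : 'http',
    options: {
      method: 'POST',
      hostname: url.hostname,
      port: url.port || (url.protocol === 'https:' ? 443 : 80),
      path: url.pathname,
      headers: {
        Authorization: `Bearer ${token}`, // never log this header (N-3)
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
      },
    },
    body,
  };
}

const request = buildUpsertRequest({
  grafanaUrl: 'http://grafana:3000',
  token: 'example-token',
  folderUid: '',
  dashboard: { uid: dashboardUidFor('a1b2c3d4.e5f6'), title: 'example', panels: [] },
});
console.log(request.transport, request.options.path);
```

Sending then amounts to `require(request.transport).request(request.options, onResponse)` followed by `req.end(request.body)`, with the response handling from F-9 in the callback.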

## Success metrics

1. **Hand-authored Grafana JSON in repo = 0.** Measured by counting JSON files in `docker/grafana/provisioning/dashboards/` minus the dynamically-uploaded ones. Current: 2 (pumping-station.json, coresync-frost-demo.json). Target after rollout: 0 file-based, N dynamic.
2. **`ui-*` node count per example flow ≤ 15** (down from 73 in the current `pumpingstation-complete-example`). Measured by grepping `examples/*.flow.json` after migration.
3. **Time-to-first-dashboard for a new example flow ≤ 1 minute of human work** (drop in dashboardAPI, configure URL + token, deploy). Measured by stopwatch on the next example flow that gets built.
4. **Regression coverage:** every example flow's dashboard URL returns HTTP 200 and renders without panel errors. Measured by an integration test that hits the Grafana API after deploying each example.
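
Metric 2 could also be checked programmatically instead of by grep. The sketch assumes an exported flow is a JSON array of node objects and that every node whose `type` starts with `ui-` counts toward the limit. Whether dashboard config nodes (groups, pages and similar) should be included is not specified here. `countUiNodes` and the sample flow are hypothetical.

```javascript
// Counts Node-RED Dashboard nodes in an exported flow (an array of node objects).
// Assumption: every node whose `type` starts with "ui-" counts toward the limit.
function countUiNodes(flow) {
  return flow.filter((node) => typeof node.type === 'string' && node.type.startsWith('ui-')).length;
}

// Inline sample instead of reading examples/*.flow.json from disk.
const sampleFlow = [
  { id: 'n1', type: 'tab' },
  { id: 'n2', type: 'ui-chart' },
  { id: 'n3', type: 'ui-gauge' },
  { id: 'n4', type: 'function' },
  { id: 'n5', type: 'dashboardAPI' },
];
console.log(countUiNodes(sampleFlow)); // 2
```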

## Open questions

- **O-1. `flows:started` + `diff` reliability across deploy modes.** Source-readable but needs a spike to confirm `diff` cleanly distinguishes "this dashboardAPI's subtree changed" from "an unrelated flow changed", across `full` / `nodes` / `flows` deploy types. → Resolved by `/prototype` before issue I-3 (the lifecycle hook issue) starts.
- **O-2. Dashed-line `custom.lineStyle` rendering against real Influx series.** Open Grafana bugs [#75259](https://github.com/grafana/grafana/issues/75259) and [#86546](https://github.com/grafana/grafana/issues/86546) may affect us. → Resolved by `/prototype` before issue I-5 (rotatingMachine template) starts.
- **O-5 (new).** Folder UID handling — does dashboardAPI assume a single Grafana folder for all generated dashboards (configured per-instance), or create per-flow folders? Default: per-instance configured folder UID, optional. If empty, dashboards land in the General folder. → Owner: R&D, deadline: before I-4.
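
For O-1, the check that the spike needs to validate could look roughly like this. The `diff` shape (optional `added`, `changed`, `removed` arrays of node ids) is taken from this PRD's own description and remains unverified. Any other properties the runtime may attach are ignored in this sketch, and `subtreeAffected` is a hypothetical name.

```javascript
// Returns true when any id listed in the deploy diff belongs to the subtree.
// Assumption: diff has optional `added`, `changed`, `removed` arrays of node ids.
function subtreeAffected(diff, subtreeIds) {
  if (!diff) return true; // no diff information available: regenerate to be safe
  const touched = [...(diff.added || []), ...(diff.changed || []), ...(diff.removed || [])];
  return touched.some((id) => subtreeIds.has(id));
}

// Ids of the dashboardAPI node, its children and grandchildren (illustrative values).
const subtree = new Set(['dash-1', 'station-1', 'pump-1']);

console.log(subtreeAffected({ added: [], changed: ['pump-1'], removed: [] }, subtree)); // true
console.log(subtreeAffected({ added: ['other-node'], changed: [], removed: [] }, subtree)); // false
```

Two edge cases are worth covering in the spike: a newly added child is not yet in the previously known id set, and a removed child is no longer registered after the deploy, so the set probably has to combine the ids from before and after the deploy.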

## Out of scope (v2 candidates)

- Per-instance panel customization through the Grafana UI with merge-on-regen.
- Operator-facing UX (Grafana role/permission management, embedded dashboards in Node-RED).
- Auto-discovery of measurement units / axis ranges from node config schemas.
- Multi-Grafana-instance fanout (push the same dashboard to staging + prod).
- Grafana alerts / notification policies generated from EVOLV alarm definitions.
- Dashboard versioning / rollback inside Grafana.
- Template fragments living next to their owning node (decentralized template discovery).
56
docs/research/dashboardapi-graph-aware-grafana-generator.md
Normal file
@@ -0,0 +1,56 @@
# Research brief: graph-aware Grafana dashboard generator in dashboardAPI

_Date: 2026-05-26_
_Context: follows `/grill-me` session that locked design constraints; feeds into `/prd`._

## Questions

1. Node-RED lifecycle: how does a custom node reliably detect "deploy complete" across deploy types?
2. Prior art: existing Node-RED → Grafana auto-dashboard generators
3. Grafana HTTP API: idempotent dashboard updates by UID, version conflicts, RBAC
4. Dynamic min/max envelope pattern: dashed reference lines that vary over time
5. EVOLV-internal scaffolding already in place

## Design constraints already settled in `/grill-me`

1. dashboardAPI = dashboard **generator**, not just an InfluxDB writer.
2. One dashboardAPI instance = one Grafana dashboard. Multiple instances coexist.
3. Single source of truth: regen on Node-RED deploy **clobbers** manual Grafana edits.
4. Trigger: HTTP API push from dashboardAPI to Grafana, fired on Node-RED deploy.
5. Auth: per-flow Grafana service-account token.
6. Templates centralized in `nodes/dashboardAPI/src/templates/` per node type.
7. Per-instance `_measurement` = node name (already in `influxdbFormatter`).
8. **No data duplication** between parent and child panels (MGC shows group-level only).
9. Predicted-vs-measured = 2 panels side by side; predicted only when no measured registered.
10. Per-pump panel set: %control / flow / delta P / measured-from-children / efficiency / dashed dynamic bounds.
11. Static config bounds → **dashed reference lines** that follow the live operating envelope (top/bottom dashed + actual value).
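
Constraint 9 boils down to a small selection rule, sketched here under stated assumptions: `panelsFor` is a hypothetical helper, the measured-only branch is included for completeness, and giving a lone panel the full 24-column width is an arbitrary choice for illustration.

```javascript
// Chooses which panels to render for one quantity (for example "flow").
function panelsFor(quantity, { hasPredicted, hasMeasured }) {
  const panels = [];
  if (hasPredicted) panels.push({ title: `${quantity} (predicted)`, series: 'predicted' });
  if (hasMeasured) panels.push({ title: `${quantity} (measured)`, series: 'measured' });
  const width = panels.length === 2 ? 12 : 24; // split Grafana's 24-column grid when paired
  return panels.map((panel, index) => ({ ...panel, gridPos: { x: index * width, w: width } }));
}

const paired = panelsFor('flow', { hasPredicted: true, hasMeasured: true });
console.log(paired.map((p) => `${p.series}@x=${p.gridPos.x}`)); // [ 'predicted@x=0', 'measured@x=12' ]
```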

## What's already in this codebase

- **Child registration is fully graph-aware.** `ChildRegistrationUtils` keeps a `Map<id, {child, softwareType, position, registeredAt}>` with type-aware accessors `getAllChildren()`, `getChildById()`, `getChildrenOfType()`. (`nodes/generalFunctions/src/helper/childRegistrationUtils.js:19-106`)
- **dashboardAPI already iterates its children.** `extractChildren()` reads `nodeSource.childRegistrationUtils.registeredChildren.values()`. (`nodes/dashboardAPI/src/specificClass.js:151-163`)
- **Grafana upsert URL is already constructed but not yet dispatched.** `grafanaUpsertUrl()` builds the target URL — the HTTP send is missing. (`nodes/dashboardAPI/src/specificClass.js:107-110`)
- **InfluxDB schema is `measurement: nodeName`, tags from flattened config** (id, softwareType, role, positionVsParent, uuid, tagCode, geoLocation, category, type, model, unit). (`nodes/generalFunctions/src/helper/outputUtils.js:44,99-117`; `formatters/influxdbFormatter.js:12-20`)
- **Lifecycle hooks: only `node.on('close')` and `node.on('input')` are used.** No EVOLV node currently subscribes to `RED.events.on('flows:started')` or similar — net-new wiring. (`nodes/generalFunctions/src/nodered/BaseNodeAdapter.js:164,184`)
- **dashboardAPI's bearer token is stored as a plain `defaults` field, NOT as a Node-RED `credentials:` block** — so it's not encrypted at rest today. (`nodes/dashboardAPI/dashboardAPI.html:15-16`; `src/nodeClass.js:38-42`) **Contradicts the grilling assumption** that "the existing InfluxDB credentials path" is already in place — it isn't.
- **No outbound external HTTPS pattern exists anywhere in EVOLV nodes.** Net-new code path.
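
Given the registration map described in the first two bullets, a two-level graph walk could be sketched as below. The entry shape follows the `Map<id, {child, softwareType, position, ...}>` description above, while `walkChildren`, the `child.name` property, and the position values are hypothetical stand-ins for whatever the real classes expose.

```javascript
// Walks a registration map down to a fixed depth and returns a plain, ordered tree.
// Assumption: each entry looks like { child, softwareType, position } and each child
// may carry its own `childRegistrationUtils.registeredChildren` Map.
function walkChildren(registeredChildren, depth = 2) {
  if (!registeredChildren || depth === 0) return [];
  return [...registeredChildren.entries()]
    .map(([id, entry]) => ({
      nodeId: id,
      nodeName: entry.child.name, // hypothetical property; the real accessor may differ
      softwareType: entry.softwareType,
      position: entry.position,
      children: walkChildren(
        entry.child.childRegistrationUtils && entry.child.childRegistrationUtils.registeredChildren,
        depth - 1),
    }))
    // Sort by name so the result does not depend on registration order.
    .sort((a, b) => (a.nodeName < b.nodeName ? -1 : a.nodeName > b.nodeName ? 1 : 0));
}

// Illustrative registry: one pump with one measurement child.
const pump = {
  name: 'pump-1',
  childRegistrationUtils: {
    registeredChildren: new Map([
      ['m1', { child: { name: 'flow-sensor' }, softwareType: 'measurement', position: 'downstream' }],
    ]),
  },
};
const topLevel = new Map([['p1', { child: pump, softwareType: 'rotatingMachine', position: 'atEquipment' }]]);

const tree = walkChildren(topLevel);
console.log(tree[0].nodeName, tree[0].children[0].nodeName); // pump-1 flow-sensor
```

Sorting by name rather than registration order keeps the generated dashboard stable across deploys where children register in a different sequence.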

## External options

- **Legacy Grafana API (`POST /api/dashboards/db` with `overwrite: true`).** Skips version + uid-uniqueness checks → idempotent. Returns `412 Precondition Failed` on stale version when `overwrite=false`. Minimum RBAC: `dashboards:write` scoped to a folder. ([docs](https://grafana.com/docs/grafana/latest/developers/http_api/dashboard/))
- **Grafana 12 Kubernetes-style API (`/apis/dashboard.grafana.app/v1/...`).** Returns `409 Conflict` instead of `412`. Newer but couples integration to Grafana 12+.
- **`flows:started` runtime event** fires on every deploy (full / nodes / flows) with `{type, diff}` payload. De-dupe by inspecting `diff.added/changed/removed`. Runtime events are undocumented — must read source. (Node-RED `packages/.../runtime/lib/flows/index.js`)
- **`nodes-started` event is deprecated** — use `flows:started`.
- **Dashed-line dynamic bands:** the *only* path that works today is emitting min/max as separate Influx fields + applying `fieldConfig.overrides[].properties[].id = "custom.lineStyle"` with `{fill: "dash", dash: [10,10]}`. Per-series override via `byName` matcher.
- **Grafana thresholds are static-only** (open issue [grafana/grafana#115398](https://github.com/grafana/grafana/issues/115398) — Needs Prioritisation). Dead end for time-varying bands.
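
The dashed-line override described above can be generated per field. This sketch assumes the bound series are named `<field>.min` and `<field>.max` and that those names are what Grafana's `byName` matcher sees. In practice the display name depends on how the Influx query labels its series, which is one of the things a prototype would need to confirm. `dashedBoundOverrides` is a hypothetical helper.

```javascript
// Builds one fieldConfig override per bound series of a base field.
function dashedBoundOverrides(field) {
  return ['min', 'max'].map((suffix) => ({
    matcher: { id: 'byName', options: `${field}.${suffix}` },
    properties: [{ id: 'custom.lineStyle', value: { fill: 'dash', dash: [10, 10] } }],
  }));
}

// The result slots into a time series panel as fieldConfig.overrides.
const flowPanel = {
  type: 'timeseries',
  title: 'flow',
  fieldConfig: { defaults: {}, overrides: dashedBoundOverrides('flow') },
};
console.log(flowPanel.fieldConfig.overrides.map((o) => o.matcher.options)); // [ 'flow.min', 'flow.max' ]
```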

## Prior art

- **No relevant prior art found.** Every "node-red + grafana" tutorial puts Influx in the middle and hand-builds dashboards. No npm package pushes Grafana dashboards from Node-RED. Greenfield lane.
- **Grafana Foundation SDK / dashboards-as-code** ([docs](https://grafana.com/docs/grafana/latest/as-code/observability-as-code/foundation-sdk/)) — assumes out-of-band CI generation, not a live Node-RED instance.
- **Operating-envelope plotting in Grafana** — [community thread 57225](https://community.grafana.com/t/how-to-plot-graph-using-upper-and-lower-bound/57225) asks the exact question, no accepted answer.
- **Known Grafana bugs around `custom.lineStyle`:** [#75259](https://github.com/grafana/grafana/issues/75259) (transforms) and [#86546](https://github.com/grafana/grafana/issues/86546) (overlapping dashed → solid).

## Open unknowns

- **(O-1) `flows:started` + `diff` reliability.** Does `diff` cleanly distinguish "this dashboardAPI's flow changed" from "an unrelated flow changed" across all three deploy modes? Source-readable but needs an actual spike to verify edge cases (e.g. a `Modified Nodes` deploy that adds a child measurement to a pumpingStation registered to a dashboardAPI in a different tab). → **Candidate for `/prototype`.**
- **(O-2) Dashed-line rendering against real Influx series.** Two open Grafana bugs ([#75259](https://github.com/grafana/grafana/issues/75259), [#86546](https://github.com/grafana/grafana/issues/86546)) affect `custom.lineStyle`. Untested whether either bites with EVOLV's emission pattern. → **Candidate for `/prototype`.**
- **(O-3) Legacy `/api/dashboards/db` vs v12 K8s API.** Which to commit to? Locks integration to a Grafana version family. Local stack uses `grafana/grafana:latest` — version drifts on `docker compose pull`. → PRD-time decision; pin Grafana image.
- **(O-4) Bearer-token storage migration.** Assumption that "follow existing creds pattern" doesn't hold — dashboardAPI stores it as plain config today. Need to migrate to Node-RED `credentials:` block. Risk: token currently sitting in `flow.json` of users' existing flows. → PRD-time decision; migration step in first issue.

## Recommended next step

`/prd` — commit the design, resolve O-3 and O-4 explicitly, and queue O-1 and O-2 for `/prototype` before the first issue ships.
Submodule nodes/machineGroupControl updated: ddf2b07424...e1e1977139
Submodule nodes/pumpingStation updated: 2d68a4f504...ef07f2a5b2