### eval/ (scenario-based evaluation)
Complements the unit tests under test/basic. Scenarios vary inputs
over simulated time, record every tick to JSONL, print a summary
table + event log, and check expectations. They answer "how does the
system respond to this input profile" rather than "is this function
correct".
- eval/run.js: driver; monkey-patches Date.now so the volume integrator ticks at 1 s/iter regardless of wall-clock
- eval/scenarios/: one file per scenario
  - levelbased-steady.js: constant inflow, demand converges
  - levelbased-storm.js: inflow surge, demand saturates
  - safety-dry-run-trip.js: manual mode, empty basin, safety trips
- eval/formatters/table.js: ASCII summary of sampled ticks
- eval/logs/: per-scenario JSONL output (one line per tick)
- eval/README.md: usage + scenario file shape + how to pipe into InfluxDB/Grafana
All three starter scenarios PASS with their expectations.
### wiki/modes/ (tier template pages)
The levelbased page templated Tier-1 modes (static transfer function).
Added worked examples for the other two tiers so all mode pages share
a common skeleton and new modes have something concrete to imitate:
- flowbased.md: Tier 2 (PID on measured outflow)
- powerbased.md: Tier 2 (levelbased curve clipped by grid power budget)
- mpc.md: Tier 3 (optimisation + forecast; block diagram + scenario time-series instead of a fixed curve)
- modes/README.md: updated with the three-tier classification table and diagram-type-per-tier guidance
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Evaluation harness
Scenario-based evaluation for pumpingStation. Each scenario scripts a stream of inputs against a configured station, ticks the simulator at 1 s resolution, records every state, and prints a summary, an event log, and an expectation check. Scenarios are separate from the unit tests (test/): unit tests verify individual pieces of logic in isolation, while scenarios check end-to-end behaviour over time with realistic input trajectories.
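The 1 s tick can be decoupled from wall-clock time by replacing Date.now for the duration of a run. The sketch below only illustrates that idea and is not the code in eval/run.js; `withSimulatedClock`, `advance`, and `startMs` are hypothetical names, and the helper assumes a synchronous callback.

```javascript
// Hypothetical helper: run `fn` while Date.now() returns a simulated time.
// `advance(sec)` moves the simulated clock forward; the real Date.now is
// restored afterwards, even if `fn` throws. Assumes `fn` is synchronous.
function withSimulatedClock(startMs, fn) {
  const realNow = Date.now;
  let nowMs = startMs;
  Date.now = () => nowMs;
  const advance = (sec) => { nowMs += sec * 1000; };
  try {
    return fn(advance);
  } finally {
    Date.now = realNow;
  }
}

// Usage: three 1 s ticks, independent of how long each iteration really takes.
const seen = withSimulatedClock(0, (advance) => {
  const stamps = [];
  for (let t = 0; t < 3; t++) {
    stamps.push(Date.now());
    advance(1);
  }
  return stamps;
});
console.log(seen); // [ 0, 1000, 2000 ]
```

An async driver would need to await the callback before restoring Date.now; that variant is left out here to keep the sketch short.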
Run
# One scenario
node eval/run.js levelbased-steady
# All scenarios at once
node eval/run.js --all
Per-tick records are written to eval/logs/<scenario>.jsonl for post-hoc analysis (e.g. streaming into InfluxDB for Grafana, or pandas / jq for one-off exploration).
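For one-off exploration without extra tooling, the JSONL can also be parsed in Node directly. This is a minimal sketch, assuming each record carries a numeric `level` field; `summariseLevels` is a hypothetical helper, not part of the harness.

```javascript
// Summarise `level` from JSONL text (one JSON object per line).
// Assumes every record has a numeric `level` field.
function summariseLevels(jsonlText) {
  const records = jsonlText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
  const levels = records.map((r) => r.level);
  return {
    count: records.length,
    min: Math.min(...levels),
    max: Math.max(...levels),
    end: levels[levels.length - 1],
  };
}

// Usage with an inline sample; for a real run, read the file with
// fs.readFileSync('eval/logs/<scenario>.jsonl', 'utf8').
const sample = [
  '{"t":0,"level":2.0}',
  '{"t":1,"level":2.4}',
  '{"t":2,"level":2.1}',
].join('\n');
console.log(summariseLevels(sample)); // { count: 3, min: 2, max: 2.4, end: 2.1 }
```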
Scenario file shape
// eval/scenarios/<name>.js
module.exports = {
  name: 'scenario-identifier',
  description: 'one sentence: what the scenario is testing',
  durationSec: 1200,
  config: { /* PumpingStation config, same shape as nodeClass builds */ },
  setup: async (ps) => {
    // Optional. Wire fake MGCs, calibrate initial level, etc.
  },
  inputs: (t, ps) => {
    // Called every tick (t in seconds). Drive inflow, mode changes,
    // operator actions, etc.
    ps.setManualInflow(0.005, Date.now(), 'm3/s');
  },
  expectations: [
    { name: 'no safety trips', type: 'safety_trips_eq', value: 0 },
    { name: 'level stays below overflow', type: 'max_level_bounded', value: 4.5 },
  ],
};
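Because inputs receives the simulated time t, an input profile can be written as a plain function of t. The sketch below shows a rectangular inflow surge; the `stormInflow` helper, the flow values, and the surge window are illustrative assumptions, and the stub station only records calls so the callback can be exercised on its own.

```javascript
// Hypothetical inflow profile: base flow with a rectangular surge.
// All numbers are illustrative (m3/s and seconds).
const BASE_INFLOW = 0.005;
const PEAK_INFLOW = 0.025;

function stormInflow(t) {
  return t >= 300 && t < 600 ? PEAK_INFLOW : BASE_INFLOW;
}

// Same shape as the `inputs` callback of a scenario file.
const inputs = (t, ps) => {
  ps.setManualInflow(stormInflow(t), Date.now(), 'm3/s');
};

// Exercise the callback against a stub that records what it was given.
const calls = [];
const stubStation = {
  setManualInflow: (value, timestamp, unit) => calls.push({ value, unit }),
};
for (const t of [0, 300, 600]) inputs(t, stubStation);
console.log(calls.map((c) => c.value)); // [ 0.005, 0.025, 0.005 ]
```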
Supported expectation types
| Type | Semantics |
|---|---|
| max_level_bounded | max level across the run must be ≤ value |
| min_level_bounded | min level across the run must be ≥ value |
| max_demand_bounded | max percControl must be ≤ value |
| safety_trips_eq | total ticks with safetyActive must equal value |
| safety_trips_gt | total ticks with safetyActive must be > value |
| end_state_eq | final record's field must equal value |
| threshold_issues_eq | startup guardrail issue count must equal value |
Add new expectation types in run.js (evalExpectation).
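To make the semantics in the table concrete, here is a rough sketch of how such a dispatcher could look. It is not the actual evalExpectation from run.js: the signature, the `exp.field` property used for end_state_eq, and the `thresholdIssues` argument are assumptions made for illustration.

```javascript
// Hypothetical evaluator; the real evalExpectation in run.js may differ.
// `records` are per-tick objects with level, percControl and safetyActive.
function evalExpectation(exp, records, thresholdIssues = 0) {
  const levels = records.map((r) => r.level);
  const trips = records.filter((r) => r.safetyActive).length;
  switch (exp.type) {
    case 'max_level_bounded': return Math.max(...levels) <= exp.value;
    case 'min_level_bounded': return Math.min(...levels) >= exp.value;
    case 'max_demand_bounded':
      return Math.max(...records.map((r) => r.percControl)) <= exp.value;
    case 'safety_trips_eq': return trips === exp.value;
    case 'safety_trips_gt': return trips > exp.value;
    case 'end_state_eq': // `field` is an assumed property name
      return records[records.length - 1][exp.field] === exp.value;
    case 'threshold_issues_eq': return thresholdIssues === exp.value;
    default: throw new Error(`unknown expectation type: ${exp.type}`);
  }
}

// Usage with two hand-written records.
const records = [
  { level: 2.0, percControl: 0, safetyActive: false },
  { level: 2.7, percControl: 73, safetyActive: false },
];
console.log(evalExpectation({ type: 'max_level_bounded', value: 4.5 }, records)); // true
console.log(evalExpectation({ type: 'safety_trips_gt', value: 0 }, records)); // false
```

Throwing on an unknown type makes a typo in a scenario file fail loudly rather than silently pass.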
Output
Example run:
═══ Scenario: levelbased-steady ═══
Constant sewer inflow below pump capacity; level converges inside the RAMP zone with demand matching inflow.
Duration: 1200s, 1s ticks
─── Samples (every 10%) ───
t(s) level(m) vol(m3) dir netFlow(m3/s) src demand safe
────────────────────────────────────────────────────────────────────────────────────────
0 2.00 20.00 steady 0 — 0% ·
120 2.64 26.40 draining -0.0026 predicted 62% ·
240 2.30 23.00 draining -0.0004 predicted 68% ·
...
─── Events (3) ───
t= 15s direction steady → filling
t= 134s direction filling → draining
─── Metrics ───
level min=2.00 max=2.73 end=2.33 m
percControl min=0% max=73% end=66%
safety trips=0 ticks
threshold issues=0 at startup
─── Expectations ───
✓ no safety trips: 0 ticks with safetyActive (expected 0)
✓ level stays below overflow: max level = 2.73 m (bound: ≤ 4.5)
✓ level stays above outflow: min level = 2.00 m (bound: ≥ 0.2)
✓ no threshold issues on init: 0 threshold issues at startup (expected 0)
Log: eval/logs/levelbased-steady.jsonl (1200 records)
✅ PASS
Why separate from test/?
| | test/ | eval/ |
|---|---|---|
| runner | node --test | node eval/run.js |
| scope | one function / small behaviour | end-to-end scenario over time |
| duration | milliseconds | seconds to minutes (simulated) |
| assertion style | tight, exact (assert.equal) | tolerance / bounds / event counts |
| output | TAP | summary table + JSONL for analysis |
| purpose | catch regressions | analyse how the system responds to input |
Unit tests live under test/basic/, test/integration/, test/edge/. Scenarios live here under eval/scenarios/.
Sending logs to Grafana (optional)
The JSONL output has one record per tick. To stream into InfluxDB for Grafana viewing, adapt a small consumer:
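# Pass the scenario name with --arg instead of splicing it into the jq program.
# Note: influx write reads line protocol or CSV, so the consumer still has to
# convert these JSON objects before writing.
jq -c --arg scenario "$SCENARIO" '{
  measurement: "pumping_station_eval",
  tags: { scenario: $scenario },
  fields: { level: .level, volume: .volume, demand: .percControl, safety: (.safetyActive|if . then 1 else 0 end) },
  timestamp: (.t | tonumber | . * 1000000000)
}' "eval/logs/$SCENARIO.jsonl" \
  | influx write --bucket=telemetry ...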
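The t field is seconds from the scenario start (not wall-clock), so a timestamp computed as t × 10^9 ns falls just after the Unix epoch (1970-01-01). To view a run in a Grafana time range such as now() - $duration, add the run's wall-clock start time to t before writing.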
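A consumer that emits InfluxDB line protocol and anchors the relative t to a wall-clock start could look like the sketch below. `toLineProtocol` and `startMs` are hypothetical names. The record fields follow the jq example above, and the scenario name is assumed to contain no spaces or commas (those would need escaping in a tag value).

```javascript
// Convert one per-tick record to InfluxDB line protocol.
// `startMs` is an assumed wall-clock anchor (epoch milliseconds) added to the
// relative `t`, so points do not land at 1970-01-01.
function toLineProtocol(record, scenario, startMs) {
  // BigInt keeps nanosecond timestamps exact (they exceed 2^53).
  const ns = BigInt(startMs) * 1000000n + BigInt(Math.round(record.t * 1e9));
  const fields = [
    `level=${record.level}`,
    `volume=${record.volume}`,
    `demand=${record.percControl}`,
    `safety=${record.safetyActive ? 1 : 0}i`, // trailing i marks an integer field
  ].join(',');
  return `pumping_station_eval,scenario=${scenario} ${fields} ${ns}`;
}

const line = toLineProtocol(
  { t: 120, level: 2.64, volume: 26.4, percControl: 62, safetyActive: false },
  'levelbased-steady',
  1700000000000,
);
console.log(line);
// pumping_station_eval,scenario=levelbased-steady level=2.64,volume=26.4,demand=62,safety=0i 1700000120000000000
```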