RnD/EVOLV

Files

znetsixe 6a6c04d34b Migrate to new Gitea instance (gitea.wbd-rd.nl)

- Update all submodule URLs from gitea.centraal.wbd-rd.nl to gitea.wbd-rd.nl
- Add settler as proper submodule in .gitmodules
- Add agent skills, function anchors, decisions, and improvements
- Add Docker configuration and scripts
- Add manuals and third_party docs
- Update .gitignore with secrets and build artifacts
- Remove stale .tgz build artifact

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-04 21:07:04 +01:00

4.7 KiB

Raw Blame History

InfluxDB Time-Series Best Practices

Used by: telemetry-database agent, dashboardAPI node Validation: Verified against InfluxDB official documentation (v1, v2, v3)

Tag vs. Field Decision Framework

Criterion	Use Tag	Use Field
Queried in WHERE clause frequently	Yes	No
Used in GROUP BY	Yes	No
Low cardinality (< 100 distinct values)	Yes	Acceptable
High cardinality (IDs, timestamps, free text)	Never	Yes
Numeric measurement values	No	Yes
Needs aggregation (mean, sum, etc.)	No	Yes
Node/station/machine identifier	Yes	No
Actual sensor reading	No	Yes
Setpoint value	No	Yes
Quality flag	Depends*	Yes

*Quality flags: If you have ≤5 quality levels (good/uncertain/bad), a tag is acceptable. If quality is a numeric score, use a field.

EVOLV Tag/Field Convention

Standard Tags (low cardinality, indexed)

locationId    — Site identifier (e.g., "wwtp-brabant-01")
nodeType      — Node type (e.g., "rotatingMachine", "reactor")
nodeName      — Instance name (e.g., "pump-01", "reactor-A")
machineType   — Equipment type (e.g., "pump", "blower", "valve")
stationId     — Parent station identifier
measurementType — Sensor type (e.g., "flow", "pressure", "temperature")

Standard Fields (not indexed, high cardinality)

value         — Primary measurement value
setpoint      — Control setpoint
quality       — Data quality score (0.0-1.0)
state         — Machine state (numeric code)
power         — Power consumption (W)
efficiency    — Current efficiency (0.0-1.0)
speed         — Rotational speed (RPM or fraction)
position      — Valve position (0.0-1.0)

Cardinality Management

What Is Cardinality?

Series cardinality = unique combinations of (measurement_name × tag_key_1 × tag_key_2 × ... × tag_key_n)

Cardinality Limits

InfluxDB v1/v2 (TSM engine): High cardinality degrades query performance and increases memory usage. Keep below ~1M series per database.
InfluxDB v3: Supports infinite series cardinality (new storage engine), but keeping cardinality low still improves query speed.

Anti-Patterns (NEVER do these)

Encoding timestamps in tag values
Using UUIDs or session IDs as tags
Free-text strings as tags
Unbounded enum values as tags
One measurement per sensor (use tags to differentiate instead)

Good Patterns

Use a single measurement name per data category
Differentiate by tags, not by measurement name
Keep tag value sets bounded and predictable
Document all tag values in a schema registry

Retention Policies

Three-Tier Strategy

Tier	Retention	Resolution	Purpose
Hot	7-30 days	Full resolution (1s-10s)	Real-time dashboards, control loops
Warm	90-365 days	Downsampled (1min-5min)	Trending, troubleshooting
Cold	2-10 years	Heavily aggregated (1h-24h)	Compliance reporting, long-term trends

EVOLV Recommended Defaults

Port 1 data at full resolution: 30 days
1-minute aggregates: 1 year
1-hour aggregates: 5 years (matches regulatory retention requirements)

Continuous Queries / Tasks (Downsampling)

InfluxDB v1: Continuous Queries

CREATE CONTINUOUS QUERY "downsample_1m" ON "evolv"
BEGIN
  SELECT mean("value") AS "value", max("value") AS "max", min("value") AS "min"
  INTO "rp_warm"."downsampled_1m"
  FROM "telemetry"
  GROUP BY time(1m), *
END

InfluxDB v2: Tasks

option task = {name: "downsample_1m", every: 1m}

from(bucket: "telemetry")
  |> range(start: -task.every)
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
  |> to(bucket: "telemetry-warm")

Query Performance Tips

Always filter by time range first — time is the primary index
Use tag filters in WHERE — tags are indexed, fields are not
Avoid regex on tag values — use exact matches when possible
Limit series scanned — filter by specific nodeType/nodeName
Use aggregation — let the database aggregate rather than fetching raw points
Batch writes — write in batches of 5,000-10,000 points for optimal throughput

Authoritative References

InfluxDB Documentation — "Schema Design and Data Layout" (https://docs.influxdata.com/influxdb/v1/concepts/schema_and_data_layout/)
InfluxDB Documentation — "Schema Design Recommendations and Best Practices" (v2/v3)
InfluxData Blog — "Time Series Data, Cardinality, and InfluxDB"
InfluxDB Documentation — "Resolve High Series Cardinality" (https://docs.influxdata.com/influxdb/v2/write-data/best-practices/resolve-high-cardinality/)
InfluxData (2023). "InfluxDB Best Practices" — Official technical guides

4.7 KiB Raw Blame History Unescape Escape