InfluxDB Time-Series Best Practices
Note
Reference page. Maintained for context; not regenerated by code. See Home for current top-level navigation.
Used by: telemetry-database agent, dashboardAPI node
Validation: Verified against InfluxDB official documentation (v1, v2, v3)
Tag vs. Field Decision Framework
| Criterion | Use Tag | Use Field |
|---|---|---|
| Queried in WHERE clause frequently | Yes | No |
| Used in GROUP BY | Yes | No |
| Low cardinality (< 100 distinct values) | Yes | Acceptable |
| High cardinality (unbounded IDs such as UUIDs, timestamps, free text) | Never | Yes |
| Numeric measurement values | No | Yes |
| Needs aggregation (mean, sum, etc.) | No | Yes |
| Node/station/machine identifier | Yes | No |
| Actual sensor reading | No | Yes |
| Setpoint value | No | Yes |
| Quality flag | Depends* | Yes |
*Quality flags: If you have ≤5 quality levels (good/uncertain/bad), a tag is acceptable. If quality is a numeric score, use a field.
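The framework above can be read as a small decision function, which is convenient for linting a schema before it goes live. This is a minimal sketch of one possible reading of the table; the function name `choose_storage` and its parameters are hypothetical, and the 100-value threshold is taken from the "low cardinality" row.

```python
from typing import Optional

LOW_CARDINALITY_LIMIT = 100  # from the table: "< 100 distinct values"


def choose_storage(is_numeric_reading: bool,
                   distinct_values: Optional[int],
                   filtered_or_grouped: bool) -> str:
    """Return "tag" or "field" for one attribute.

    distinct_values=None means the value set is unbounded (IDs, free text).
    """
    # Sensor readings, setpoints and numeric scores are always fields.
    if is_numeric_reading:
        return "field"
    # Unbounded or high-cardinality values must never become tags.
    if distinct_values is None or distinct_values >= LOW_CARDINALITY_LIMIT:
        return "field"
    # Bounded values that appear in WHERE / GROUP BY benefit from the tag index.
    if filtered_or_grouped:
        return "tag"
    # Bounded but never queried: a field is acceptable and keeps cardinality lower.
    return "field"


# A three-level quality flag (good/uncertain/bad) used in filters becomes a tag;
# a numeric quality score stays a field.
quality_flag = choose_storage(False, 3, True)
quality_score = choose_storage(True, None, False)
```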
EVOLV Tag/Field Convention
Standard Tags (low cardinality, indexed)
locationId — Site identifier (e.g., "wwtp-brabant-01")
nodeType — Node type (e.g., "rotatingMachine", "reactor")
nodeName — Instance name (e.g., "pump-01", "reactor-A")
machineType — Equipment type (e.g., "pump", "blower", "valve")
stationId — Parent station identifier
measurementType — Sensor type (e.g., "flow", "pressure", "temperature")
Standard Fields (not indexed; values may be unbounded)
value — Primary measurement value
setpoint — Control setpoint
quality — Data quality score (0.0-1.0)
state — Machine state (numeric code)
power — Power consumption (W)
efficiency — Current efficiency (0.0-1.0)
speed — Rotational speed (RPM or fraction)
position — Valve position (0.0-1.0)
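To show how the convention maps onto a written point, here is a minimal sketch that renders one point as InfluxDB line protocol. The helper names (`_escape`, `to_line`) are hypothetical, the measurement name `telemetry` is borrowed from the downsampling examples on this page, and all field values are assumed to be numeric and written as floats.

```python
def _escape(text: str, special: str) -> str:
    # Line protocol escapes special characters with a backslash.
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Render one point; tags and fields are sorted by key for stable output."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    tag_part = "".join(
        "," + _escape(k, ",= ") + "=" + _escape(v, ",= ")
        for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        _escape(k, ",= ") + "=" + repr(float(v))
        for k, v in sorted(fields.items())
    )
    return _escape(measurement, ", ") + tag_part + " " + field_part + " " + str(timestamp_ns)


line = to_line(
    "telemetry",
    {
        "locationId": "wwtp-brabant-01",
        "nodeType": "rotatingMachine",
        "nodeName": "pump-01",
        "measurementType": "flow",
    },
    {"value": 12.5, "quality": 1.0},
    1700000000000000000,
)
```

Identifiers go into the tag set and readings into the field set, so a query can filter on `nodeName` through the index while aggregating `value`.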
Cardinality Management
What Is Cardinality?
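Series cardinality = number of unique combinations of measurement name and tag values that have been written. An upper bound is measurement_count × |values(tag_key_1)| × |values(tag_key_2)| × ... × |values(tag_key_n)|. In InfluxDB v1/v2 the field key is also part of the series key, so each additional field key multiplies the count again.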
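A quick way to estimate cardinality before deploying a schema is to multiply the number of distinct values per tag key, then compare that bound with the tag sets actually observed. The sketch below assumes tag values are available as plain Python collections; the function names and the sample values are illustrative only.

```python
from math import prod


def cardinality_upper_bound(tag_values: dict) -> int:
    """Product of distinct-value counts per tag key (worst case)."""
    return prod(len(set(values)) for values in tag_values.values())


def observed_cardinality(points) -> int:
    """Count unique (measurement, tag set) pairs in an iterable of (measurement, tags)."""
    return len({(m, tuple(sorted(tags.items()))) for m, tags in points})


# Hypothetical site: 2 locations, 3 node types, 10 node names, 3 sensor types.
bound = cardinality_upper_bound({
    "locationId": ["site-a", "site-b"],
    "nodeType": ["rotatingMachine", "reactor", "valve"],
    "nodeName": [f"node-{i:02d}" for i in range(10)],
    "measurementType": ["flow", "pressure", "temperature"],
})

seen = observed_cardinality([
    ("telemetry", {"nodeName": "pump-01", "measurementType": "flow"}),
    ("telemetry", {"nodeName": "pump-01", "measurementType": "flow"}),
    ("telemetry", {"nodeName": "pump-01", "measurementType": "pressure"}),
])
```

The bound is usually far above the observed count, because most tag combinations never occur in practice (a given `nodeName` exists at only one `locationId`).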
Cardinality Limits
- InfluxDB v1/v2 (TSM engine): High cardinality degrades query performance and increases memory usage. Keep below ~1M series per database.
- InfluxDB v3: The new storage engine removes the practical limit on series cardinality, but keeping cardinality low still improves query speed.
Anti-Patterns (NEVER do these)
- Encoding timestamps in tag values
- Using UUIDs or session IDs as tags
- Free-text strings as tags
- Unbounded enum values as tags
- One measurement per sensor (use tags to differentiate instead)
Good Patterns
- Use a single measurement name per data category
- Differentiate by tags, not by measurement name
- Keep tag value sets bounded and predictable
- Document all tag values in a schema registry
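The patterns above can be enforced at write time with a small check against a schema registry. This is a sketch under assumptions: the registry contents are hypothetical (seeded from the example values on this page), and the UUID check is a simple regular expression, not a complete anti-pattern detector.

```python
import re

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

# Hypothetical registry: allowed values per tag key.
REGISTRY = {
    "nodeType": {"rotatingMachine", "reactor"},
    "machineType": {"pump", "blower", "valve"},
    "measurementType": {"flow", "pressure", "temperature"},
}


def validate_tags(tags: dict, registry: dict = REGISTRY) -> list:
    """Return a list of human-readable problems; an empty list means the tags pass."""
    problems = []
    for key, value in tags.items():
        if UUID_RE.match(value):
            problems.append(f"{key}: UUID-like value '{value}' would explode cardinality")
        allowed = registry.get(key)
        if allowed is not None and value not in allowed:
            problems.append(f"{key}: '{value}' is not in the schema registry")
    return problems


ok = validate_tags({"nodeType": "reactor", "measurementType": "flow"})
bad = validate_tags({
    "nodeType": "mixer",
    "sessionId": "123e4567-e89b-12d3-a456-426614174000",
})
```

Tag keys missing from the registry are only checked for UUID-like values here; a stricter variant could reject unknown keys outright.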
Retention Policies
Three-Tier Strategy
| Tier | Retention | Resolution | Purpose |
|---|---|---|---|
| Hot | 7-30 days | Full resolution (1s-10s) | Real-time dashboards, control loops |
| Warm | 90-365 days | Downsampled (1min-5min) | Trending, troubleshooting |
| Cold | 2-10 years | Heavily aggregated (1h-24h) | Compliance reporting, long-term trends |
EVOLV Recommended Defaults
- Port 1 data at full resolution: 30 days
- 1-minute aggregates: 1 year
- 1-hour aggregates: 5 years (matches regulatory retention requirements)
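To see why the tiers matter, it helps to count the points each one keeps per series. The arithmetic below assumes a 1-second raw interval (the table allows 1s-10s) and 365-day years; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def points_per_series(retention_days: int, interval_seconds: int) -> int:
    """Number of points one series holds when retained for the given period."""
    return retention_days * SECONDS_PER_DAY // interval_seconds


hot = points_per_series(30, 1)            # full resolution, 30 days
warm = points_per_series(365, 60)         # 1-minute aggregates, 1 year
cold = points_per_series(5 * 365, 3600)   # 1-hour aggregates, 5 years

# Keeping raw 1-second data for the full 5 years instead:
raw_five_years = points_per_series(5 * 365, 1)
```

Under these assumptions the three tiers together hold about 3.2 million points per series, against roughly 158 million if raw data were kept for five years.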
Continuous Queries / Tasks (Downsampling)
InfluxDB v1: Continuous Queries
-- Writes 1-minute mean/max/min of "value" into the "rp_warm" retention policy, preserving all tags.
CREATE CONTINUOUS QUERY "downsample_1m" ON "evolv"
BEGIN
  SELECT mean("value") AS "value", max("value") AS "max", min("value") AS "min"
  INTO "rp_warm"."downsampled_1m"
  FROM "telemetry"
  GROUP BY time(1m), *
END
InfluxDB v2: Tasks
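// Runs every minute and downsamples the most recent minute of raw data.
option task = {name: "downsample_1m", every: 1m}

from(bucket: "telemetry")
    |> range(start: -task.every)
    // mean requires numeric fields; filter out any non-numeric fields before this step.
    |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
    |> to(bucket: "telemetry-warm")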
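Both the continuous query and the task reduce to the same operation: group points into fixed time windows and aggregate each window. The sketch below reproduces that logic in plain Python for a single series, useful for checking expected values in tests. The function name and input shape are assumptions, and results are keyed by window start as InfluxQL does (Flux's `aggregateWindow` stamps results with the window stop time by default).

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_s: int = 60) -> dict:
    """points: iterable of (unix_seconds, value) for one series.

    Returns {window_start: {"value": mean, "max": max, "min": min}}.
    Windows without data are omitted, as with createEmpty: false.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return {
        start: {"value": mean(vals), "max": max(vals), "min": min(vals)}
        for start, vals in sorted(buckets.items())
    }


result = downsample([(0, 1.0), (30, 3.0), (60, 10.0), (185, 4.0)])
```

In this example the window starting at 120 has no data and is absent from the result.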
Query Performance Tips
- Always filter by time range first — time is the primary index
- Use tag filters in WHERE — tags are indexed, fields are not
- Avoid regex on tag values — use exact matches when possible
- Limit series scanned — filter by specific nodeType/nodeName
- Use aggregation — let the database aggregate rather than fetching raw points
- Batch writes — write in batches of 5,000-10,000 points for optimal throughput
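The batching tip can be sketched as a generator that groups line-protocol strings before they are sent. The function name `batch_lines` is hypothetical, and the default of 5,000 is the lower end of the range above; the actual write call depends on the client in use and is not shown.

```python
from itertools import islice


def batch_lines(lines, size: int = 5000):
    """Yield lists of at most `size` items from any iterable, without loading it all."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# 12,000 points become two full batches and one partial batch.
sizes = [len(b) for b in batch_lines((f"telemetry value={i}.0" for i in range(12000)))]
```

Each batch would then be joined with newlines and sent as one write request.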
Authoritative References
- InfluxDB Documentation — "Schema Design and Data Layout" (https://docs.influxdata.com/influxdb/v1/concepts/schema_and_data_layout/)
- InfluxDB Documentation — "Schema Design Recommendations and Best Practices" (v2/v3)
- InfluxData Blog — "Time Series Data, Cardinality, and InfluxDB"
- InfluxDB Documentation — "Resolve High Series Cardinality" (https://docs.influxdata.com/influxdb/v2/write-data/best-practices/resolve-high-cardinality/)
- InfluxData (2023). "InfluxDB Best Practices" — Official technical guides