Files
EVOLV/wiki/concepts/influxdb-schema-design.md
znetsixe 48f790d123
Some checks failed
CI / lint-and-test (push) Has been cancelled
chore: clean up superproject structure
Move content to correct locations:
- AGENTS.md → .agents/AGENTS.md (with orchestrator reference update)
- third_party/docs/ (8 reference docs) → wiki/concepts/
- manuals/ (12 Node-RED docs) → wiki/manuals/

Delete 23 unreferenced one-off scripts from scripts/ (keeping 5 active).
Delete stale Dockerfile.e2e, docker-compose.e2e.yml, test/e2e/.
Remove empty third_party/ directory.

Root is now: README, CLAUDE.md, LICENSE, package.json, Makefile,
Dockerfile, docker-compose.yml, docker/, scripts/ (5), nodes/, wiki/,
plus dotfiles (.agents, .claude, .gitea).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:01:04 +02:00

123 lines
4.7 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# InfluxDB Time-Series Best Practices
> **Used by**: `telemetry-database` agent, `dashboardAPI` node
> **Validation**: Verified against InfluxDB official documentation (v1, v2, v3)
## Tag vs. Field Decision Framework
| Criterion | Use Tag | Use Field |
|-----------|---------|-----------|
| Queried in WHERE clause frequently | Yes | No |
| Used in GROUP BY | Yes | No |
| Low cardinality (< 100 distinct values) | Yes | Acceptable |
| High cardinality (IDs, timestamps, free text) | **Never** | Yes |
| Numeric measurement values | No | Yes |
| Needs aggregation (mean, sum, etc.) | No | Yes |
| Node/station/machine identifier | Yes | No |
| Actual sensor reading | No | Yes |
| Setpoint value | No | Yes |
| Quality flag | Depends* | Yes |
*Quality flags: If you have ≤5 quality levels (good/uncertain/bad), a tag is acceptable. If quality is a numeric score, use a field.
## EVOLV Tag/Field Convention
### Standard Tags (low cardinality, indexed)
```
locationId — Site identifier (e.g., "wwtp-brabant-01")
nodeType — Node type (e.g., "rotatingMachine", "reactor")
nodeName — Instance name (e.g., "pump-01", "reactor-A")
machineType — Equipment type (e.g., "pump", "blower", "valve")
stationId — Parent station identifier
measurementType — Sensor type (e.g., "flow", "pressure", "temperature")
```
### Standard Fields (not indexed, high cardinality)
```
value — Primary measurement value
setpoint — Control setpoint
quality — Data quality score (0.0-1.0)
state — Machine state (numeric code)
power — Power consumption (W)
efficiency — Current efficiency (0.0-1.0)
speed — Rotational speed (RPM or fraction)
position — Valve position (0.0-1.0)
```
## Cardinality Management
### What Is Cardinality?
Series cardinality = unique combinations of (measurement_name × tag_key_1 × tag_key_2 × ... × tag_key_n)
### Cardinality Limits
- **InfluxDB v1/v2 (TSM engine)**: High cardinality degrades query performance and increases memory usage. Keep below ~1M series per database.
- **InfluxDB v3**: Supports infinite series cardinality (new storage engine), but keeping cardinality low still improves query speed.
### Anti-Patterns (NEVER do these)
- Encoding timestamps in tag values
- Using UUIDs or session IDs as tags
- Free-text strings as tags
- Unbounded enum values as tags
- One measurement per sensor (use tags to differentiate instead)
### Good Patterns
- Use a single measurement name per data category
- Differentiate by tags, not by measurement name
- Keep tag value sets bounded and predictable
- Document all tag values in a schema registry
## Retention Policies
### Three-Tier Strategy
| Tier | Retention | Resolution | Purpose |
|------|-----------|------------|---------|
| Hot | 7-30 days | Full resolution (1s-10s) | Real-time dashboards, control loops |
| Warm | 90-365 days | Downsampled (1min-5min) | Trending, troubleshooting |
| Cold | 2-10 years | Heavily aggregated (1h-24h) | Compliance reporting, long-term trends |
### EVOLV Recommended Defaults
- Port 1 data at full resolution: 30 days
- 1-minute aggregates: 1 year
- 1-hour aggregates: 5 years (matches regulatory retention requirements)
## Continuous Queries / Tasks (Downsampling)
### InfluxDB v1: Continuous Queries
```sql
CREATE CONTINUOUS QUERY "downsample_1m" ON "evolv"
BEGIN
SELECT mean("value") AS "value", max("value") AS "max", min("value") AS "min"
INTO "rp_warm"."downsampled_1m"
FROM "telemetry"
GROUP BY time(1m), *
END
```
### InfluxDB v2: Tasks
```flux
option task = {name: "downsample_1m", every: 1m}
from(bucket: "telemetry")
|> range(start: -task.every)
|> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
|> to(bucket: "telemetry-warm")
```
## Query Performance Tips
1. **Always filter by time range first** — time is the primary index
2. **Use tag filters in WHERE** — tags are indexed, fields are not
3. **Avoid regex on tag values** — use exact matches when possible
4. **Limit series scanned** — filter by specific nodeType/nodeName
5. **Use aggregation** — let the database aggregate rather than fetching raw points
6. **Batch writes** — write in batches of 5,000-10,000 points for optimal throughput
## Authoritative References
1. InfluxDB Documentation — "Schema Design and Data Layout" (https://docs.influxdata.com/influxdb/v1/concepts/schema_and_data_layout/)
2. InfluxDB Documentation — "Schema Design Recommendations and Best Practices" (v2/v3)
3. InfluxData Blog — "Time Series Data, Cardinality, and InfluxDB"
4. InfluxDB Documentation — "Resolve High Series Cardinality" (https://docs.influxdata.com/influxdb/v2/write-data/best-practices/resolve-high-cardinality/)
5. InfluxData (2023). "InfluxDB Best Practices" — Official technical guides