Migrate to new Gitea instance (gitea.wbd-rd.nl)
- Update all submodule URLs from gitea.centraal.wbd-rd.nl to gitea.wbd-rd.nl - Add settler as proper submodule in .gitmodules - Add agent skills, function anchors, decisions, and improvements - Add Docker configuration and scripts - Add manuals and third_party docs - Update .gitignore with secrets and build artifacts - Remove stale .tgz build artifact Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
122
third_party/docs/influxdb-schema-design.md
vendored
Normal file
122
third_party/docs/influxdb-schema-design.md
vendored
Normal file
@@ -0,0 +1,122 @@
|
||||
# InfluxDB Time-Series Best Practices
|
||||
|
||||
> **Used by**: `telemetry-database` agent, `dashboardAPI` node
|
||||
> **Validation**: Verified against InfluxDB official documentation (v1, v2, v3)
|
||||
|
||||
## Tag vs. Field Decision Framework
|
||||
|
||||
| Criterion | Use Tag | Use Field |
|
||||
|-----------|---------|-----------|
|
||||
| Queried in WHERE clause frequently | Yes | No |
|
||||
| Used in GROUP BY | Yes | No |
|
||||
| Low cardinality (< 100 distinct values) | Yes | Acceptable |
|
||||
| High cardinality (IDs, timestamps, free text) | **Never** | Yes |
|
||||
| Numeric measurement values | No | Yes |
|
||||
| Needs aggregation (mean, sum, etc.) | No | Yes |
|
||||
| Node/station/machine identifier | Yes | No |
|
||||
| Actual sensor reading | No | Yes |
|
||||
| Setpoint value | No | Yes |
|
||||
| Quality flag | Depends* | Yes |
|
||||
|
||||
*Quality flags: If you have ≤5 quality levels (good/uncertain/bad), a tag is acceptable. If quality is a numeric score, use a field.
|
||||
|
||||
## EVOLV Tag/Field Convention
|
||||
|
||||
### Standard Tags (low cardinality, indexed)
|
||||
```
|
||||
locationId — Site identifier (e.g., "wwtp-brabant-01")
|
||||
nodeType — Node type (e.g., "rotatingMachine", "reactor")
|
||||
nodeName — Instance name (e.g., "pump-01", "reactor-A")
|
||||
machineType — Equipment type (e.g., "pump", "blower", "valve")
|
||||
stationId — Parent station identifier
|
||||
measurementType — Sensor type (e.g., "flow", "pressure", "temperature")
|
||||
```
|
||||
|
||||
### Standard Fields (not indexed, high cardinality)
|
||||
```
|
||||
value — Primary measurement value
|
||||
setpoint — Control setpoint
|
||||
quality — Data quality score (0.0-1.0)
|
||||
state — Machine state (numeric code)
|
||||
power — Power consumption (W)
|
||||
efficiency — Current efficiency (0.0-1.0)
|
||||
speed — Rotational speed (RPM or fraction)
|
||||
position — Valve position (0.0-1.0)
|
||||
```
|
||||
|
||||
## Cardinality Management
|
||||
|
||||
### What Is Cardinality?
|
||||
Series cardinality = unique combinations of (measurement_name × tag_key_1 × tag_key_2 × ... × tag_key_n)
|
||||
|
||||
### Cardinality Limits
|
||||
- **InfluxDB v1/v2 (TSM engine)**: High cardinality degrades query performance and increases memory usage. Keep below ~1M series per database.
|
||||
- **InfluxDB v3**: Supports infinite series cardinality (new storage engine), but keeping cardinality low still improves query speed.
|
||||
|
||||
### Anti-Patterns (NEVER do these)
|
||||
- Encoding timestamps in tag values
|
||||
- Using UUIDs or session IDs as tags
|
||||
- Free-text strings as tags
|
||||
- Unbounded enum values as tags
|
||||
- One measurement per sensor (use tags to differentiate instead)
|
||||
|
||||
### Good Patterns
|
||||
- Use a single measurement name per data category
|
||||
- Differentiate by tags, not by measurement name
|
||||
- Keep tag value sets bounded and predictable
|
||||
- Document all tag values in a schema registry
|
||||
|
||||
## Retention Policies
|
||||
|
||||
### Three-Tier Strategy
|
||||
|
||||
| Tier | Retention | Resolution | Purpose |
|
||||
|------|-----------|------------|---------|
|
||||
| Hot | 7-30 days | Full resolution (1s-10s) | Real-time dashboards, control loops |
|
||||
| Warm | 90-365 days | Downsampled (1min-5min) | Trending, troubleshooting |
|
||||
| Cold | 2-10 years | Heavily aggregated (1h-24h) | Compliance reporting, long-term trends |
|
||||
|
||||
### EVOLV Recommended Defaults
|
||||
- Port 1 data at full resolution: 30 days
|
||||
- 1-minute aggregates: 1 year
|
||||
- 1-hour aggregates: 5 years (matches regulatory retention requirements)
|
||||
|
||||
## Continuous Queries / Tasks (Downsampling)
|
||||
|
||||
### InfluxDB v1: Continuous Queries
|
||||
```sql
|
||||
CREATE CONTINUOUS QUERY "downsample_1m" ON "evolv"
|
||||
BEGIN
|
||||
SELECT mean("value") AS "value", max("value") AS "max", min("value") AS "min"
|
||||
INTO "rp_warm"."downsampled_1m"
|
||||
FROM "telemetry"
|
||||
GROUP BY time(1m), *
|
||||
END
|
||||
```
|
||||
|
||||
### InfluxDB v2: Tasks
|
||||
```flux
|
||||
option task = {name: "downsample_1m", every: 1m}
|
||||
|
||||
from(bucket: "telemetry")
|
||||
|> range(start: -task.every)
|
||||
|> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
|
||||
|> to(bucket: "telemetry-warm")
|
||||
```
|
||||
|
||||
## Query Performance Tips
|
||||
|
||||
1. **Always filter by time range first** — time is the primary index
|
||||
2. **Use tag filters in WHERE** — tags are indexed, fields are not
|
||||
3. **Avoid regex on tag values** — use exact matches when possible
|
||||
4. **Limit series scanned** — filter by specific nodeType/nodeName
|
||||
5. **Use aggregation** — let the database aggregate rather than fetching raw points
|
||||
6. **Batch writes** — write in batches of 5,000-10,000 points for optimal throughput
|
||||
|
||||
## Authoritative References
|
||||
|
||||
1. InfluxDB Documentation — "Schema Design and Data Layout" (https://docs.influxdata.com/influxdb/v1/concepts/schema_and_data_layout/)
|
||||
2. InfluxDB Documentation — "Schema Design Recommendations and Best Practices" (v2/v3)
|
||||
3. InfluxData Blog — "Time Series Data, Cardinality, and InfluxDB"
|
||||
4. InfluxDB Documentation — "Resolve High Series Cardinality" (https://docs.influxdata.com/influxdb/v2/write-data/best-practices/resolve-high-cardinality/)
|
||||
5. InfluxData (2023). "InfluxDB Best Practices" — Official technical guides
|
||||
Reference in New Issue
Block a user