# Seeder Data

Drop example documents here. The seeder reads these files and loads them into PostgreSQL, which then triggers embedding generation via the AI service.

## Option 1: Drop raw Office/PDF files

Put `.docx`, `.pptx`, `.xlsx`, `.pdf` files in the `raw/` folder, then run the converter:

```bash
cd ai-service
pip install python-docx python-pptx openpyxl pymupdf chardet
python convert.py                          # convert all files in raw/
python convert.py --out kennis_artikelen   # output to a specific subfolder
python convert.py path/to/file.docx        # convert a single file
```

This extracts text, preserves structure (headings, tables, slides), and writes markdown files with frontmatter into the `documents/` folder (or whichever `--out` you specify).

## Option 2: Write markdown directly

Each file is **Markdown with YAML frontmatter**. The seeder parses the frontmatter for metadata and the body as content.

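To make the frontmatter/body split concrete, here is a minimal Python sketch. It is an illustration only: the seeder's real parser is not shown in this README, and `split_frontmatter` is a hypothetical helper that handles just the flat `key: value` and `key: [a, b]` forms used in the templates, not full YAML.

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a markdown document into (metadata, body).

    Assumption: frontmatter is a flat block between two `---` lines.
    This is not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()  # no frontmatter at all

    # Find the closing `---` delimiter.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return {}, text.strip()  # unterminated frontmatter: treat as body

    meta: dict = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as `tags: [IoT, sensoren]`
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value  # everything else stays a string

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


example = """---
titel: IoT-sensoren in het waterbeheer
auteur: Lisa de Groot
tags: [IoT, sensoren, waterbeheer, telemetrie]
---

Artikelinhoud hier.
"""
meta, body = split_frontmatter(example)
```

In this sketch all scalar values, including dates such as `2025-09-01`, are kept as strings; a real YAML parser would typically convert them.
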
## Folder Structure

```
data/
  raw/                ← DROP OFFICE/PDF FILES HERE (auto-converted)
  themas/             ← Strategic themes
  projects/           ← Project descriptions (linked to a thema)
  documents/          ← Meeting notes, specs, analyses (linked to a project)
  kennis_artikelen/   ← Knowledge articles (standalone)
  besluiten/          ← Decisions with rationale (linked to a project)
  lessons_learned/    ← Lessons learned (linked to a project + fase)
```

## File Templates

### themas/waterkwaliteit.md
```markdown
---
naam: Waterkwaliteit & Monitoring
beschrijving: Innovaties rondom waterkwaliteitsbewaking en meetnetwerken
prioriteit: hoog
---
```

### projects/sensor-netwerk.md
```markdown
---
naam: Sensor Netwerk Pilot
thema: Waterkwaliteit & Monitoring
eigenaar: Jan de Vries
status: experiment
prioriteit: hoog
startdatum: 2025-09-01
streef_einddatum: 2026-06-30
---

Beschrijving van het project hier. Kan meerdere alinea's zijn.
Dit wordt het `beschrijving` veld.
```

### documents/notulen-stuurgroep-2026-03.md
```markdown
---
titel: Notulen Stuurgroep 15 maart 2026
project: Sensor Netwerk Pilot
type: vergaderverslag
auteur: Pieter Jansen
versie: 1
---

De volledige documentinhoud hier. Kan lang zijn — wordt automatisch
gechunkt als het meer dan 1500 tekens is.
```

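The template above notes that document bodies longer than 1500 characters are chunked automatically. The actual chunking strategy is not described in this README, so the following Python sketch is only one plausible approach: `chunk_text` is a hypothetical function that greedily packs paragraphs into chunks of at most `max_chars` characters and hard-splits any single paragraph that is too long on its own.

```python
def chunk_text(text: str, max_chars: int = 1500) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Assumption: paragraphs are separated by blank lines. The 1500 default
    mirrors the threshold mentioned in the template; the real splitter may
    work differently (e.g. with overlap or sentence boundaries).
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # A single oversized paragraph is cut into fixed-size pieces.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk, start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks


long_doc = "\n\n".join(["a" * 1000, "b" * 1000, "c" * 400])
chunks = chunk_text(long_doc)
```

Splitting on paragraph boundaries keeps related sentences together, which tends to help embedding quality, but this is a design choice of the sketch rather than a documented behaviour of the AI service.
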
### kennis_artikelen/iot-waterbeheer.md
```markdown
---
titel: IoT-sensoren in het waterbeheer
auteur: Lisa de Groot
tags: [IoT, sensoren, waterbeheer, telemetrie]
---

Artikelinhoud hier. Alles na de frontmatter wordt `inhoud`.
```

### besluiten/go-pilot-sensor.md
```markdown
---
titel: Go voor pilotfase Sensor Netwerk
project: Sensor Netwerk Pilot
type: go_no_go
status: goedgekeurd
datum: 2026-02-20
---

Beschrijving van het besluit.

## Onderbouwing

De onderbouwing / rationale hier.
```

### lessons_learned/sensor-kalibratie.md
```markdown
---
titel: Kalibratiefrequentie pH-sensoren onderschat
project: Sensor Netwerk Pilot
fase: experiment
tags: [sensoren, kalibratie, pH]
---

Wat we geleerd hebben. Body wordt `inhoud`.
```

## Naming Convention

Use lowercase, hyphen-separated slugs: `korte-beschrijving.md`. The filename is not stored; all metadata comes from the frontmatter.

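If you want to derive a filename from a title, a small helper along these lines would produce a conforming slug. `slugify` is a hypothetical convenience function, not part of the seeder or `convert.py`; since the filename is not stored, any lowercase slug you pick by hand works just as well.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a human-readable title into a lowercase, hyphen-separated filename."""
    # Decompose accented characters (e.g. "ë" -> "e" + diaeresis) and drop the marks.
    ascii_text = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return f"{slug}.md"


filename = slugify("Waterkwaliteit & Monitoring")
```
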
## Tips

- Write in Dutch (the embedding model and full-text search are configured for Dutch)
- Real content is better than lorem ipsum, because search quality depends on it
- Longer documents are fine; they get chunked automatically
- Link projects to themas and documents to projects via the `thema:` and `project:` fields (matched by `naam`)
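
Because links are resolved by name, a typo in `thema:` or `project:` leaves a record unlinked. The sketch below shows one way to check this before seeding. It assumes an exact, case-sensitive match on `naam`, which this README does not confirm, and `find_unresolved` is a hypothetical helper operating on already-parsed frontmatter dictionaries.

```python
def find_unresolved(children: list[dict], parents: list[dict], field: str) -> list[str]:
    """Return the values of `field` in children that match no parent `naam`.

    Assumption: matching is exact and case-sensitive.
    """
    known = {p["naam"] for p in parents}
    return sorted({c[field] for c in children if field in c and c[field] not in known})


themas = [{"naam": "Waterkwaliteit & Monitoring"}]
projects = [
    {"naam": "Sensor Netwerk Pilot", "thema": "Waterkwaliteit & Monitoring"},
    {"naam": "Voorbeeldproject", "thema": "Waterkwaliteit"},  # does not match any thema
]
missing = find_unresolved(projects, themas, "thema")
```

The same call with `field="project"` would check documents, besluiten, and lessons learned against the list of projects.
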