Add document converter, seeder data structure, and project wiki
- ai-service/convert.py: converts Office/PDF files to markdown with frontmatter
- database/seeders/data/: folder structure for themas, projects, documents, etc.
- database/seeders/data/raw/: drop zone for Office/PDF files to convert
- wiki/: project architecture, concepts, and knowledge graph documentation
- Remove unused Laravel example tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
126
database/seeders/data/README.md
Normal file
@@ -0,0 +1,126 @@
# Seeder Data

Drop example documents here. The seeder reads these files and loads them into PostgreSQL, which then triggers embedding generation via the AI service.

## Option 1: Drop raw Office/PDF files

Put `.docx`, `.pptx`, `.xlsx`, or `.pdf` files in the `raw/` folder, then run the converter:

```bash
cd ai-service
pip install python-docx python-pptx openpyxl pymupdf chardet
python convert.py                          # convert all files in raw/
python convert.py --out kennis_artikelen   # output to a specific subfolder
python convert.py path/to/file.docx        # convert a single file
```

The converter extracts text, preserves structure (headings, tables, slides), and writes markdown files with frontmatter into the `documents/` folder (or whichever `--out` folder you specify).
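As a rough illustration of the first step, a converter of this kind has to find the convertible files in `raw/`. The sketch below is not the actual `convert.py`: the function name `find_convertible` and the constant `SUPPORTED` are hypothetical, and it assumes only the four extensions listed above are handled.

```python
from pathlib import Path

# Extensions named in this README; the real convert.py may accept others.
SUPPORTED = {".docx", ".pptx", ".xlsx", ".pdf"}


def find_convertible(raw_dir):
    """Return the supported files directly inside raw_dir, sorted by path."""
    return sorted(
        p for p in Path(raw_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```

Calling `find_convertible("database/seeders/data/raw")` would list the files a batch run could pick up; anything with another extension is skipped.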

## Option 2: Write markdown directly

Each file is **Markdown with YAML frontmatter**. The seeder parses the frontmatter for metadata and the body as content.
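To make the frontmatter/body split concrete, here is a minimal sketch in Python. It is not the seeder's own parser: `split_frontmatter` is a hypothetical helper that handles only flat `key: value` pairs and inline lists such as `[a, b]`, which is all the templates in this README use. A real implementation would presumably rely on a full YAML parser.

```python
def split_frontmatter(text):
    """Split a document into (metadata dict, body string).

    Assumption: frontmatter is a block delimited by two `---` lines at the
    top of the file. All scalar values are kept as plain strings.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter; without one, treat everything as body.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list, e.g. tags: [IoT, sensoren]
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

Note that this sketch does no type conversion: `versie: 1` yields the string `"1"` and dates stay as text, whereas a YAML parser would typically return an integer and a date.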

## Folder Structure

```
data/
  raw/                ← Drop Office/PDF files here (converted by convert.py)
  themas/             ← Strategic themes
  projects/           ← Project descriptions (linked to a thema)
  documents/          ← Meeting notes, specs, analyses (linked to a project)
  kennis_artikelen/   ← Knowledge articles (standalone)
  besluiten/          ← Decisions with rationale (linked to a project)
  lessons_learned/    ← Lessons learned (linked to a project + fase)
```

## File Templates

### themas/waterkwaliteit.md

```markdown
---
naam: Waterkwaliteit & Monitoring
beschrijving: Innovaties rondom waterkwaliteitsbewaking en meetnetwerken
prioriteit: hoog
---
```

### projects/sensor-netwerk.md

```markdown
---
naam: Sensor Netwerk Pilot
thema: Waterkwaliteit & Monitoring
eigenaar: Jan de Vries
status: experiment
prioriteit: hoog
startdatum: 2025-09-01
streef_einddatum: 2026-06-30
---

Beschrijving van het project hier. Kan meerdere alinea's zijn.
Dit wordt het `beschrijving` veld.
```

### documents/notulen-stuurgroep-2026-03.md

```markdown
---
titel: Notulen Stuurgroep 15 maart 2026
project: Sensor Netwerk Pilot
type: vergaderverslag
auteur: Pieter Jansen
versie: 1
---

De volledige documentinhoud hier. Kan lang zijn: wordt automatisch
gechunkt als het meer dan 1500 tekens is.
```
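The template above notes that content longer than 1500 characters is chunked automatically. The README does not say how the split is made, so the following is only a sketch under assumptions: the hypothetical `chunk_text` prefers paragraph boundaries, uses no overlap between chunks, and falls back to hard cuts for a single oversized paragraph. The actual chunker may work differently.

```python
def chunk_text(text, max_chars=1500):
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Try to append this paragraph to the chunk being built.
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph longer than the limit is cut at hard boundaries.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraphs keeps related sentences together in one embedding, at the cost of uneven chunk sizes.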

### kennis_artikelen/iot-waterbeheer.md

```markdown
---
titel: IoT-sensoren in het waterbeheer
auteur: Lisa de Groot
tags: [IoT, sensoren, waterbeheer, telemetrie]
---

Artikelinhoud hier. Alles na de frontmatter wordt `inhoud`.
```

### besluiten/go-pilot-sensor.md

```markdown
---
titel: Go voor pilotfase Sensor Netwerk
project: Sensor Netwerk Pilot
type: go_no_go
status: goedgekeurd
datum: 2026-02-20
---

Beschrijving van het besluit.

## Onderbouwing

De onderbouwing / rationale hier.
```

### lessons_learned/sensor-kalibratie.md

```markdown
---
titel: Kalibratiefrequentie pH-sensoren onderschat
project: Sensor Netwerk Pilot
fase: experiment
tags: [sensoren, kalibratie, pH]
---

Wat we geleerd hebben. Body wordt `inhoud`.
```

## Naming Convention

Use lowercase, hyphen-separated slugs such as `korte-beschrijving.md`. The filename is not stored; all metadata comes from the frontmatter.
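If you want to derive a filename from a title, a small helper along these lines would do. `slugify` is a hypothetical function for illustration and is not part of the seeder or the converter; stripping accents to plain ASCII is an assumption about what makes a good slug here.

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated filename stem."""
    # Decompose accented characters and drop the accents ("ö" becomes "o").
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```

For example, `slugify("Kalibratiefrequentie pH-sensoren onderschat") + ".md"` gives `kalibratiefrequentie-ph-sensoren-onderschat.md`.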

## Tips

- Write in Dutch (the embedding model and full-text search are configured for Dutch).
- Real content is better than lorem ipsum: search quality depends on it.
- Longer documents are fine; they are chunked automatically.
- Link projects to themas and documents to projects via the `thema:` and `project:` fields (matched by `naam`).
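The name-based linking in the last tip can be pictured as a dictionary lookup. The sketch below is an illustration only: `resolve_links` is a hypothetical helper, and it assumes an exact match on `naam` after trimming whitespace. How the seeder actually treats case or unmatched names is not documented here.

```python
def resolve_links(parents, children, field):
    """Pair each child with the parent whose `naam` equals child[field].

    Returns (linked, unmatched): linked is a list of (child, parent) pairs,
    unmatched lists the children whose reference matched no parent.
    """
    by_naam = {p["naam"].strip(): p for p in parents}
    linked, unmatched = [], []
    for child in children:
        key = str(child.get(field, "")).strip()
        parent = by_naam.get(key)
        if parent is None:
            unmatched.append(child)
        else:
            linked.append((child, parent))
    return linked, unmatched
```

With the templates above, a project whose frontmatter has `thema: Waterkwaliteit & Monitoring` would be paired with the thema of that `naam`. A typo in the name would land the project in `unmatched`, so it pays to copy names verbatim.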