Add document converter, seeder data structure, and project wiki

- ai-service/convert.py: converts Office/PDF files to markdown with frontmatter
- database/seeders/data/: folder structure for themas, projects, documents, etc.
- database/seeders/data/raw/: drop zone for Office/PDF files to convert
- wiki/: project architecture, concepts, and knowledge graph documentation
- Remove unused Laravel example tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:33:30 +02:00
parent 302c790c13
commit 926872a082
23 changed files with 1785 additions and 76 deletions
--- a/wiki/concepts/ai-integration.md
+++ b/wiki/concepts/ai-integration.md
@@ -0,0 +1,80 @@
+---
+title: AI Integration
+created: 2026-04-08
+updated: 2026-04-08
+status: speculative
+tags: [concept, ai, rag, langgraph, embeddings]
+sources: [ai-service/app/main.py, ai-service/requirements.txt, docker-compose.yml]
+---
+
+# AI Integration
+
+## Current State
+
+The AI service is a **Python FastAPI stub** with placeholder endpoints. No actual AI processing is wired up yet.
+
+### Implemented (stubs, except the health check)
+
+| Endpoint | Purpose | Status |
+|---|---|---|
+| `GET /health` | Health check | Working |
+| `POST /api/chat` | Chat with context | Stub — returns placeholder text |
+| `POST /api/summarize` | Generate summaries | Stub — returns placeholder text |
+| `POST /api/search` | Semantic search | Stub — returns empty results |
+
+### Request/Response Models (Pydantic)
+
+```
+ChatRequest:       message, project_id?, conversation_history[]
+ChatResponse:      reply, project_id?
+SummarizeRequest:  content, project_id?, summary_type?
+SummarizeResponse: summary, project_id?
+SearchRequest:     query, project_id?, limit?
+SearchResponse:    results[{id, content, score, metadata}], query
+```
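The field lists above can be read as typed models. The following is a minimal sketch, not the service's actual code: stdlib dataclasses stand in for Pydantic so the block runs without dependencies, and every field type, the `limit` default, the `SearchResult` class name, and the `stub_chat` / `stub_search` helpers are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatRequest:
    message: str
    project_id: Optional[int] = None  # assumed type; the source only marks it optional
    conversation_history: list[dict] = field(default_factory=list)


@dataclass
class ChatResponse:
    reply: str
    project_id: Optional[int] = None


@dataclass
class SearchResult:  # hypothetical name for one entry of `results`
    id: int
    content: str
    score: float
    metadata: dict = field(default_factory=dict)


@dataclass
class SearchRequest:
    query: str
    project_id: Optional[int] = None
    limit: Optional[int] = 10  # the default of 10 is an assumption


@dataclass
class SearchResponse:
    results: list[SearchResult]
    query: str


def stub_chat(request: ChatRequest) -> ChatResponse:
    """Hypothetical stand-in for the stubbed chat handler: echoes the project id."""
    return ChatResponse(reply="(placeholder reply)", project_id=request.project_id)


def stub_search(request: SearchRequest) -> SearchResponse:
    """Hypothetical stand-in for the stubbed search handler: no results yet."""
    return SearchResponse(results=[], query=request.query)
```

The summarize models would follow the same pattern. Keeping `project_id` optional on both request and response lets a caller correlate replies with a project without making project scope mandatory.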
+
+## Planned Architecture
+
+```
+Laravel App ↔ HTTP ↔ Python AI-Service (FastAPI)
+                        ├── LangGraph Orchestrator
+                        │     ├── Router / Classifier
+                        │     └── Agent graph (state machine)
+                        ├── Anthropic Claude (LLM)
+                        ├── pgvector (embeddings / similarity search)
+                        └── Tools:
+                              ├── DB query (project data, commitments, phases)
+                              ├── Document retrieval (semantic search)
+                              └── Embedding generation
+```
+
+## RAG Pipeline (planned)
+
+### Sources
+- Project descriptions and phase notes
+- Documents (uploaded files, meeting notes)
+- Lessons learned
+- Decisions and their rationale
+- Knowledge articles
+
+### Embedding Strategy
+- **Storage**: pgvector extension on PostgreSQL 16
+- **Models**: the `Document` and `KennisArtikel` models already have `embedding` vector columns
+- **Update triggers**: on document create/update and on project phase change
+- **Chunking**: varies by document type and size
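To make the chunking and similarity-search steps concrete, here is a small in-memory sketch. It is an illustration under stated assumptions: in the planned design pgvector would perform the similarity search inside PostgreSQL, whereas this block uses plain Python lists; the function names, the character-window chunking rule, and the default sizes are all hypothetical.

```python
import math


def chunk_text(text: str, max_chars: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, len(text), step)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          rows: list[tuple[str, list[float]]],
          limit: int = 5) -> list[tuple[str, float]]:
    """Rank (id, embedding) rows by similarity to the query, best first."""
    scored = [(row_id, cosine_similarity(query_vec, vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

A possible usage, with toy two-dimensional vectors standing in for real embeddings:

```python
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
best = top_k([1.0, 0.0], rows, limit=2)  # "a" ranks first, then "c"
```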
+
+### Agent Skills (from CLAUDE.md)
+| Agent | Autonomy | Purpose |
+|---|---|---|
+| Project Assistant | Low | Answer questions about specific projects |
+| Knowledge Assistant | Low | Search and surface knowledge articles |
+| Document Assistant | Medium | Summarize, compare, extract from documents |
+| System Tasks | High | Background indexing, embedding updates |
+
+## Content Governance Rules
+
+1. AI-generated content is always labeled ("AI-suggestie", "Concept")
+2. Human confirmation is required before AI content gains system status
+3. All AI interactions are logged (request, response, tools used, sources cited)
+4. Source attribution is mandatory in AI responses
+5. Confidence indicators are shown when certainty is low
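The five rules above could be enforced in a small wrapper around AI output. This is a sketch under assumptions, not the project's implementation: the `AISuggestion` class, the `log_interaction` helper, and the 0.6 confidence threshold are hypothetical, and only the label text comes from the rules themselves.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

LOW_CONFIDENCE_THRESHOLD = 0.6  # assumption: the source gives no numeric cut-off


@dataclass
class AISuggestion:
    text: str
    sources: list[str]
    confidence: float
    label: str = "AI-suggestie"           # rule 1: always labeled
    confirmed_by: Optional[str] = None    # rule 2: set only after human review

    def __post_init__(self) -> None:
        # Rule 4: refuse to create a suggestion without source attribution.
        if not self.sources:
            raise ValueError("AI responses must cite at least one source")

    @property
    def low_confidence(self) -> bool:
        # Rule 5: flag suggestions that fall below the assumed threshold.
        return self.confidence < LOW_CONFIDENCE_THRESHOLD

    @property
    def has_system_status(self) -> bool:
        return self.confirmed_by is not None

    def confirm(self, user: str) -> None:
        self.confirmed_by = user


def log_interaction(log: list[dict], request: str,
                    suggestion: AISuggestion, tools_used: list[str]) -> dict:
    """Rule 3: append one audit record per AI interaction and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request": request,
        "response": suggestion.text,
        "tools_used": list(tools_used),
        "sources": list(suggestion.sources),
    }
    log.append(entry)
    return entry
```

Raising at construction time when `sources` is empty means an unattributed response can never reach the logging or confirmation steps.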