High-Level Overview
Key Components
- Model manager: downloads and manages GGUF models.
- Inference engine: llama.cpp bindings for text generation.
- Vector store: HNSW + SQLite for semantic search.
- Embedding model: ONNX Runtime with all-MiniLM-L6-v2.
- WASM runtime: sandboxed execution with resource limits.
- Permission system: manifest-based access control.
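The manifest-based permission system can be illustrated with a minimal deny-by-default check. This is a sketch under assumptions, not the project's actual implementation. The `Manifest` type, the `scope:action` permission strings, the `scope:*` wildcard, and the `is_allowed`/`require` helpers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    """Hypothetical plugin manifest: a name plus the permissions it declares."""
    name: str
    permissions: frozenset[str] = field(default_factory=frozenset)


class PermissionDenied(Exception):
    """Raised when a plugin requests a permission its manifest does not declare."""


def is_allowed(manifest: Manifest, requested: str) -> bool:
    """Deny by default; allow only permissions the manifest declares.

    Assumed convention: a declared "scope:*" grants every action in that
    scope, so "fs:*" also covers "fs:read".
    """
    if requested in manifest.permissions:
        return True
    scope, _, _ = requested.partition(":")
    return f"{scope}:*" in manifest.permissions


def require(manifest: Manifest, requested: str) -> None:
    """Gate a host call: raise unless the manifest grants the permission."""
    if not is_allowed(manifest, requested):
        raise PermissionDenied(f"{manifest.name} lacks permission {requested!r}")
```

For example, a plugin declaring `{"vector_store:query", "fs:*"}` passes `is_allowed(plugin, "fs:read")` but not `is_allowed(plugin, "net:fetch")`. Denying by default keeps the manifest auditable: it is the complete list of what a plugin may do.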
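The vector store pairs SQLite (storage) with an HNSW index (search). The sketch below shows only what that index approximates: it keeps text and embeddings in SQLite and ranks rows by an exact, brute-force cosine-similarity scan. The scan is a deliberate stand-in for HNSW, not an HNSW implementation. The `chunks` schema, the JSON encoding of embeddings, and the `BruteForceStore` class are hypothetical. In practice the vectors would be the 384-dimensional outputs of all-MiniLM-L6-v2.

```python
import heapq
import json
import math
import sqlite3
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BruteForceStore:
    """Hypothetical store: SQLite holds text plus embeddings; search is an exact scan.

    An HNSW index would replace the full scan in search() with an
    approximate nearest-neighbour lookup over the same vectors.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding TEXT NOT NULL)"
        )

    def add(self, text: str, embedding: Sequence[float]) -> int:
        # Embeddings are stored as JSON text for simplicity (an assumption).
        cur = self.db.execute(
            "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
            (text, json.dumps(list(embedding))),
        )
        self.db.commit()
        return cur.lastrowid

    def search(self, query: Sequence[float], k: int = 5) -> list[tuple[str, float]]:
        # Score every stored row, then keep the k highest similarities.
        rows = self.db.execute("SELECT text, embedding FROM chunks")
        scored = ((cosine(query, json.loads(emb)), text) for text, emb in rows)
        return [(text, score) for score, text in heapq.nlargest(k, scored)]
```

The exact scan costs O(n) per query, which is why an approximate index such as HNSW is worth its extra memory once the collection grows.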