Reproducible Local LLM Stack on a Laptop — Docker, k3d, Ollama, Open WebUI
local-llm DevOps docker k3d ollama open-webui Ansible
Architecture overview
| Component | Purpose | Runtime |
|---|---|---|
| Ollama | Local model serving / inference API | Docker container; exposes API on :11434 |
| Open WebUI | Chat frontend | Docker container or k3d deployment |
| k3d | Lightweight Kubernetes (k3s in Docker) | k3d cluster on host Docker |
| NVIDIA Container Toolkit | GPU passthrough (optional) | Docker runtime integration |
Prerequisites and verification
Install and verify the core tools on the host before you begin; clone the repo used for automation:
git clone https://github.com/binarobb/local-llm-stack.git
cd local-llm-stack
Verify core tools:
docker --version
kubectl version --client
k3d --version
ansible --version
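If you script the setup, a small guard like the following (a hypothetical helper, not part of the repo) fails fast when any tool is missing:

```shell
# Abort early if any required CLI is absent from PATH
for tool in docker kubectl k3d ansible; do
  command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; exit 1; }
done
echo "all prerequisites found"
```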
Step A — GPU enablement (optional)
Install the correct NVIDIA drivers for your OS and confirm the GPU is visible with nvidia-smi. Install the NVIDIA Container Toolkit and configure Docker to use the NVIDIA runtime (example shown for Debian/Ubuntu, abridged). Test GPU passthrough:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If the test prints your GPU details, Docker can pass the GPU into containers.
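For reference, the abridged Debian/Ubuntu toolkit setup amounts to roughly the following; package names and repository setup change over time, so defer to NVIDIA's official install guide:

```shell
# Install the toolkit (assumes NVIDIA's apt repository is already configured)
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```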
Step B — Run Ollama (Docker quickstart)
Pull and run the Ollama image, mounting a named volume for models and configuration. Omit --gpus all on CPU-only hosts:
docker pull ollama/ollama:latest
docker run -d --name ollama \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  --gpus all \
  ollama/ollama:latest
Quick health check (the root endpoint replies "Ollama is running"):
curl -sS http://localhost:11434/
Persisting Ollama data in a named volume keeps models and configuration safe across container restarts.
Step C — Create a k3d cluster for the UI
Create a small k3d cluster and switch kubeconfig context:
k3d cluster create dev --agents 1 --network k3d-dev --kubeconfig-switch-context
kubectl cluster-info
Using a dedicated k3d network (--network k3d-dev) simplifies routing between host containers and cluster workloads.
Step D — Deploy Open WebUI
Option 1 — Docker (fastest)
Run the Open WebUI container and point it at Ollama on the host. Note that the :ollama image tag bundles its own Ollama instance; to reuse the one from Step B, use the :main tag and set OLLAMA_BASE_URL:
docker run -d --name open-webui -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
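A quick smoke test, assuming the container came up cleanly, is to request the UI from the host:

```shell
# Expect HTTP 200 once the backend has finished starting
curl -sS -o /dev/null -w '%{http_code}\n' http://localhost:3000/
```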
Option 2 — k3d (Kubernetes)
Apply the repo manifests into the k3d cluster. Inside k3d pods host.docker.internal may not resolve; use host.k3d.internal where available or create a Kubernetes Endpoint/Service that points to the host IP so the UI can reach Ollama.
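One way to wire that up is a Service without a selector plus manually managed Endpoints. This is a sketch, and the IP is a placeholder you must replace with your host's address as reachable from the k3d network:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  ports:
    - port: 11434
      targetPort: 11434
---
apiVersion: v1
kind: Endpoints
metadata:
  name: ollama            # must match the Service name
subsets:
  - addresses:
      - ip: 192.168.1.10  # placeholder: your host's IP reachable from pods
    ports:
      - port: 11434
```

Pods can then reach Ollama at http://ollama.default.svc.cluster.local:11434 (assuming the default namespace).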
Step E — Import and run models in Ollama
Use Ollama’s CLI or API to pull or import models (GGUF or supported formats). Example run command inside the Ollama container:
docker exec -it ollama ollama run gemma3
For large models, prefer quantized variants that fit your GPU's VRAM (e.g. a 12 GB card), or run smaller CPU/quantized versions to avoid out-of-memory errors.
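Pulling and querying a model can also be done over the REST API. The model tag below is just an example; check the Ollama library for available quantized variants:

```shell
# Pull a model (streams progress as JSON lines)
curl -sS http://localhost:11434/api/pull -d '{"model": "gemma3"}'
# Run a one-shot prompt without streaming
curl -sS http://localhost:11434/api/generate \
  -d '{"model": "gemma3", "prompt": "Say hello", "stream": false}'
```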
Automation and idempotency with Ansible
The repository contains Ansible playbooks and roles to make the setup repeatable. Example idempotent task to ensure Ollama runs:
- name: Ensure Ollama container is running
  community.docker.docker_container:
    name: ollama
    image: ollama/ollama:latest
    state: started
    restart_policy: always
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    env:
      OLLAMA_ENV: production
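A natural follow-up task (a sketch, not necessarily in the repo's playbooks) pre-pulls a model inside the running container; ollama pull is effectively idempotent because layers that are already present are skipped:

```yaml
- name: Pre-pull the gemma3 model
  community.docker.docker_container_exec:
    container: ollama
    command: ollama pull gemma3
```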
Run the full playbook:
ansible-playbook -i inventory/local.ini site.yml
Preview changes before applying:
ansible-playbook -i inventory/local.ini site.yml --check --diff
...more to come