Reproducible Local LLM Stack on a Laptop — Docker, k3d, Ollama, Open WebUI

3/1/2026 by AdminOwl | 3 min read
local-llm DevOps docker k3d ollama open-webui Ansible

Architecture overview

Component                 Purpose                                  Runtime
Ollama                    Local model serving / inference API      Docker container; exposes API on :11434
Open WebUI                Chat frontend                            Docker container or k3d deployment
k3d                       Lightweight Kubernetes (k3s in Docker)   k3d cluster on host Docker
NVIDIA Container Toolkit  GPU passthrough (optional)               Docker runtime integration

Prerequisites and verification

Before you begin, clone the repository that holds the automation:

git clone https://github.com/binarobb/local-llm-stack.git
cd local-llm-stack

Verify core tools:

docker --version
kubectl version --client
k3d --version
ansible --version

Step A — GPU enablement (optional)

Install the correct NVIDIA driver for your OS and confirm the GPU is visible with nvidia-smi. Then install the NVIDIA Container Toolkit and configure Docker to use the NVIDIA runtime. Finally, test GPU passthrough:
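For Debian/Ubuntu, the toolkit install and Docker runtime configuration look roughly like this (an abridged sketch; repository URLs follow NVIDIA's install guide, so check it for your distribution):

```shell
# Add NVIDIA's apt repository key and package list (Debian/Ubuntu sketch).
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit and let it rewrite Docker's daemon config.
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```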

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If the test prints your GPU details, Docker can pass the GPU into containers.

Step B — Run Ollama (Docker quickstart)

Pull and run the Ollama image, mounting a named volume for models and configuration (omit --gpus all on CPU-only machines):

docker pull ollama/ollama:latest
docker run -d --name ollama \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  --gpus all \
  ollama/ollama:latest

Quick health check:

curl -sS http://localhost:11434/api/version

Persisting Ollama data in a named volume keeps models and configuration safe across container restarts.
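The container can take a few seconds before the API answers, which trips up scripts that call it immediately. A small retry helper bridges that gap (a sketch; the wait_for name is ours, not part of Ollama):

```shell
# Retry a command until it succeeds or the attempt budget is spent.
wait_for() {
  attempts=$1; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example: block for up to 30 s until the Ollama API answers.
# wait_for 30 curl -sSf http://localhost:11434/api/version
```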

Step C — Create a k3d cluster for the UI

Create a small k3d cluster and switch kubeconfig context:

k3d cluster create dev --agents 1 --network k3d-dev --kubeconfig-switch-context
kubectl cluster-info

Using a dedicated k3d network (--network k3d-dev) simplifies routing between host containers and cluster workloads.
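With the cluster on its own Docker network, the host-side Ollama container can be attached to that same network so the k3d node containers can reach it by container name (a sketch; assumes the ollama container from Step B is running):

```shell
# Attach the running Ollama container to the k3d cluster network.
docker network connect k3d-dev ollama

# Confirm it now holds an address on that network.
docker inspect -f '{{json .NetworkSettings.Networks}}' ollama
```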

Step D — Deploy Open WebUI

Option 1 — Docker (fastest)
Run the Open WebUI container and point it at the Ollama instance already running on the host. Note that the :ollama image tag bundles its own Ollama; to reuse the host instance, use the :main tag and set OLLAMA_BASE_URL:

docker run -d --name open-webui -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

Option 2 — k3d (Kubernetes)
Apply the repo manifests into the k3d cluster. Inside k3d pods, host.docker.internal may not resolve; use host.k3d.internal where available, or create a Kubernetes Service/Endpoints pair that points at the host IP so the UI can reach Ollama.
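One way to wire this up is an ExternalName Service, so in-cluster clients get a stable name (a sketch; assumes host.k3d.internal resolves inside your k3d version):

```shell
# Create a Service that CNAMEs to the host, so pods in the same
# namespace can reach the host-side Ollama at http://ollama:11434.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  type: ExternalName
  externalName: host.k3d.internal
EOF
```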

Step E — Import and run models in Ollama

Use Ollama’s CLI or API to pull or import models (GGUF or supported formats). Example run command inside the Ollama container:

docker exec -it ollama ollama run gemma3

For large models, prefer quantized variants that fit your GPU's VRAM (e.g. 12 GB), or run CPU-only quantized versions to avoid out-of-memory errors.
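The same pull-and-run flow works over Ollama's HTTP API, which is handy in scripts (a sketch reusing the gemma3 name from the CLI example; responses stream as JSON lines unless "stream": false):

```shell
# Pull a model through the API (streams progress as JSON lines).
curl -sS http://localhost:11434/api/pull -d '{"model": "gemma3"}'

# One-shot generation with streaming disabled.
curl -sS http://localhost:11434/api/generate \
  -d '{"model": "gemma3", "prompt": "Say hello in one word.", "stream": false}'
```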

Automation and idempotency with Ansible

The repository contains Ansible playbooks and roles to make the setup repeatable. Example idempotent task to ensure Ollama runs:

- name: Ensure Ollama container is running
  community.docker.docker_container:
    name: ollama
    image: ollama/ollama:latest
    state: started
    restart_policy: always
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    env:
      OLLAMA_ENV: production

Run the full playbook:

ansible-playbook -i inventory/local.ini site.yml

Preview changes before applying:

ansible-playbook -i inventory/local.ini site.yml --check --diff

...more to come