How to Set Up Ollama with a Web UI on Your VPS Using Docker Compose

Running your own large language model on a VPS is one of those things that sounds complicated but is actually achievable in an afternoon — once you know which pieces fit together. In this tutorial I'll show you exactly how I deploy Ollama alongside Open WebUI using Docker Compose, giving you a private, ChatGPT-style interface that talks only to your own server. No API bills, no data leaving your machine, no rate limits.

What You Actually Need Before You Start

Let me be straight about hardware: Ollama on a CPU-only VPS works, but it's slow. For comfortable inference with a 7B-parameter model like llama3 or mistral, you want at least 8 GB of RAM and 4 vCPUs. For a 13B model, double that. I run this setup on a DigitalOcean Droplet with 16 GB RAM and 8 vCPUs, which handles Mistral 7B at a very usable speed — roughly 10–15 tokens per second. If you need a VPS to spin this up on, DigitalOcean makes it easy to build and deploy apps from code to production in just a few clicks — their General Purpose Droplets are a solid fit for this workload.

Software requirements on your VPS are minimal:

- A 64-bit Linux distribution (Ubuntu 22.04 or Debian 12 both work well)
- Docker Engine and the Docker Compose plugin
- A non-root user with sudo privileges

If you haven't installed Docker yet, the official convenience script is the fastest path:

curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
newgrp docker
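Before moving on, it's worth confirming that the daemon and the Compose plugin actually respond. A minimal guarded check (it only prints versions, nothing else):

```shell
# Sanity check: confirm Docker and the Compose plugin are on PATH and respond.
# If docker is missing, the install script (or the group refresh) didn't take.
if command -v docker >/dev/null 2>&1; then
  docker --version
  docker compose version || echo "Compose plugin missing, install docker-compose-plugin" >&2
else
  echo "docker not found, rerun the install script above" >&2
fi
```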

Understanding the Architecture

There are two containers doing the work here. Ollama is the backend — it's an inference server that exposes a REST API on port 11434. It manages model downloads, quantization, and GPU/CPU scheduling. Open WebUI (formerly Ollama WebUI) is the frontend — a polished web application that connects to Ollama's API and gives you a chat interface that's honestly better than many commercial products I've used.

The two containers talk to each other over a private Docker network. Open WebUI is the only thing that needs to be reachable from the outside world, and even then I strongly recommend putting it behind a reverse proxy with authentication rather than exposing it raw. I'll cover that at the end.
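To make that wiring concrete, here's how you could poke Ollama's API by hand once the stack from the next section is up and a model like mistral is pulled. Because 11434 is never published to the host, the request has to originate inside the Docker network; a throwaway curl container works. (The network name ollama-stack_ollama_net is my assumption based on Compose's default project-directory prefix, so check yours with docker network ls.)

```shell
# List installed models via Ollama's /api/tags endpoint from inside the
# private network (the port is deliberately not reachable from the host).
# NOTE: the network name below assumes a project directory called
# "ollama-stack"; verify with `docker network ls`.
docker run --rm --network ollama-stack_ollama_net curlimages/curl:latest \
  -s http://ollama:11434/api/tags

# A raw generation request, the same kind of call Open WebUI makes:
body='{"model": "mistral", "prompt": "Why is the sky blue?", "stream": false}'
docker run --rm --network ollama-stack_ollama_net curlimages/curl:latest \
  -s http://ollama:11434/api/generate -d "$body"
```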

The Docker Compose File

Create a working directory and drop in this compose.yml:

mkdir -p ~/ollama-stack && cd ~/ollama-stack
nano compose.yml

Paste the following:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - ollama_net
    # Expose only to internal network — do NOT publish 11434 externally
    # If you need GPU passthrough, add the deploy block below and
    # uncomment the runtime line:
    # runtime: nvidia
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=change-this-to-a-long-random-string
    volumes:
      - open_webui_data:/app/backend/data
    networks:
      - ollama_net
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:

networks:
  ollama_net:
    driver: bridge

Watch out: Change WEBUI_SECRET_KEY to a genuine random string before you run this. Open WebUI uses it to sign session tokens. You can generate one with openssl rand -hex 32. Also, do not publish Ollama's port 11434 to the internet — there is no built-in authentication on the Ollama API.
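If you'd rather keep the secret out of the compose file entirely, one option is an .env file, which Compose reads automatically from the project directory. This sketch assumes you change the environment line to WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}:

```shell
# Generate a 64-character hex secret and store it outside compose.yml.
# Reference it in compose.yml as: - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
secret=$(openssl rand -hex 32)
printf 'WEBUI_SECRET_KEY=%s\n' "$secret" > .env
chmod 600 .env                      # readable only by you
echo "generated a ${#secret}-character secret"
```

Compose substitutes the variable at docker compose up time, so the actual value never has to live in the YAML.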

With the file saved, bring everything up:

docker compose up -d
docker compose logs -f

The first boot takes 30–60 seconds while Open WebUI initialises its SQLite database. Once you see Application startup complete in the logs, you can hit http://your-server-ip:3000 in a browser. You'll be prompted to create an admin account — the first account registered becomes the admin automatically.
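If you're scripting the deployment, you can probe readiness instead of watching the logs. A minimal sketch: curl reports status 000 while nothing is listening yet, and 200 once the UI is serving.

```shell
# One-shot readiness probe against the published port.
code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3000 || true)
if [ "$code" = "200" ]; then
  echo "Open WebUI is up"
else
  echo "not ready yet (HTTP $code), keep watching the logs"
fi
```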

Pulling Your First Model

Open WebUI has a model download UI, but I prefer doing it from the CLI because it's faster to script and easier to automate later. Run one-off commands in the Ollama container with docker exec:

# Pull Mistral 7B (good balance of speed and quality on CPU)
docker exec -it ollama ollama pull mistral

# Or pull Llama 3.1 8B if you want Meta's latest
docker exec -it ollama ollama pull llama3.1:8b

# List what you have installed
docker exec -it ollama ollama list

The models are stored in the ollama_data Docker volume, so they survive container restarts and image upgrades without re-downloading. Mistral 7B is about 4.1 GB compressed; Llama 3.1 8B is roughly 4.7 GB. Make sure your VPS has enough disk space before you start pulling — I recommend at least 20 GB free per model you intend to keep.
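A quick pre-flight check saves you from a half-downloaded model on a full disk. This sketch reads free space on the root filesystem with df and warns below the 20 GB threshold mentioned above:

```shell
# Warn if the root filesystem has less than 20 GB free before pulling models.
avail_kb=$(df -Pk / | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))
echo "free space on /: ${avail_gb} GB"
if [ "$avail_gb" -lt 20 ]; then
  echo "warning: under 20 GB free, model pulls may fill the disk" >&2
fi
```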

Tip: Use quantized models to save RAM. The default pulls use Q4_K_M quantization, which is a good balance of size and quality. If you're tight on memory, try ollama pull mistral:7b-instruct-q3_K_S for a smaller footprint — it still produces perfectly usable output for most tasks.

Putting Caddy in Front for HTTPS

Exposing port 3000 over plain HTTP is fine for testing but not for real use. I prefer Caddy for this because it handles Let's Encrypt certificates automatically. Add it to the same compose file or create a separate one in the same directory. Here's the Caddyfile approach — create Caddyfile in your project folder:

ai.yourdomain.com {
    reverse_proxy open-webui:8080
    encode gzip
}

Then update your compose.yml to add Caddy and remove the direct port mapping on Open WebUI:

  caddy:
    image: caddy:2-alpine
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    networks:
      - ollama_net
    depends_on:
      - open-webui

Add caddy_data and caddy_config to your volumes block, remove the ports section from open-webui, and run docker compose up -d again. Caddy will obtain a certificate from Let's Encrypt on first request. Your Open WebUI is now at https://ai.yourdomain.com with a valid cert — no certbot, no cron jobs, no manual renewal.

Keeping Everything Updated

Both Ollama and Open WebUI ship updates frequently — sometimes daily. The simplest way to stay current is to pull new images and recreate the containers:

cd ~/ollama-stack
docker compose pull
docker compose up -d --remove-orphans
# Prune old images to reclaim disk space
docker image prune -f

I run this as a weekly cron job. If you want fully automated updates, Watchtower is another option — but be aware that auto-updating stateful services like Open WebUI carries some risk if a release has a breaking migration. For a personal setup I lean toward manual but scheduled updates.
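Concretely, the weekly job can be a small script plus one crontab line. A sketch (the path and schedule are arbitrary choices, adjust to taste):

```shell
# Wrap the update routine in a script that cron can call.
mkdir -p "$HOME/ollama-stack"
cat > "$HOME/ollama-stack/update.sh" <<'EOF'
#!/bin/sh
set -e
cd "$HOME/ollama-stack"
docker compose pull
docker compose up -d --remove-orphans
docker image prune -f
EOF
chmod +x "$HOME/ollama-stack/update.sh"

# Then register it with `crontab -e`, e.g. Sundays at 04:00:
#   0 4 * * 0 $HOME/ollama-stack/update.sh >> $HOME/ollama-stack/update.log 2>&1
```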

Firewall Rules to Lock It Down

Before you call this production-ready, lock the host firewall down. One Docker-specific caveat: Docker writes its own iptables rules when it publishes a port, so UFW will not reliably block a port you've mapped in compose.yml. That's exactly why you remove the 3000:8080 mapping once Caddy takes over, rather than relying on the firewall to hide it. With Caddy in place, you only need 80 and 443 open to the world, plus 22 for SSH:

sudo ufw default deny incoming
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 443/udp   # HTTP/3 QUIC — Caddy uses this
sudo ufw enable
sudo ufw status verbose

Port 11434 (Ollama's API) should never be in your firewall rules for external access. It has no authentication layer whatsoever, and exposing it publicly means anyone can download models, run inference, and hammer your RAM at will.
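You can verify that on your own box. This sketch counts host-level listeners on 11434 with ss; zero is the correct answer, because the port should only exist inside the private Docker network:

```shell
# Count host sockets listening on Ollama's port; expect 0.
listeners=$(ss -ltn 2>/dev/null | grep -c ':11434' || true)
echo "host listeners on 11434: $listeners"
if [ "$listeners" -gt 0 ]; then
  echo "warning: 11434 is bound on the host, check your compose file" >&2
fi
```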

What to Try First

Once you're in Open WebUI, a few things worth exploring right away: the system prompt editor lets you give your model a persona or set of instructions that persist across conversations. The model switcher at the top of the chat lets you compare responses between Mistral and Llama side-by-side. And the RAG (Retrieval-Augmented Generation) feature lets you upload PDFs and documents and ask questions about them — entirely locally, which is the part that makes this genuinely useful for real work rather than just a demo.

For next steps, I'd recommend looking at putting Authelia in front of Open WebUI if you share your server with others — it adds proper SSO and MFA without changing anything inside the app. You might also look into model-specific Modelfiles in Ollama if you want to bake in custom system prompts or adjust parameters like temperature at the model level rather than the chat level. Start spending more time experimenting with your AI stack and less time worrying about infrastructure — create your DigitalOcean account today if you need a capable VPS to run all of this on.
