Setting Up Ollama with Open WebUI on a VPS Using Docker Compose

Running a private, ChatGPT-style interface backed by open-source LLMs is one of the most satisfying self-hosting projects you can tackle right now. Ollama handles model management and inference, Open WebUI gives you a polished browser-based chat front-end, and Docker Compose ties them together in a reproducible, easy-to-update stack. In this tutorial I'll walk through exactly how I deploy this combination on a VPS — from picking the right Droplet size to locking down the ports so your model endpoint isn't exposed to the open internet.

Choosing the Right VPS for Ollama

Ollama is CPU-bound when there's no GPU available, so RAM and fast cores matter more than disk IOPS. My rule of thumb: you need roughly 2 GB of RAM per billion parameters, plus headroom for the OS and Open WebUI. For a 7B model like mistral, that means at least 16 GB of RAM, preferably more. For a lighter 3B model like llama3.2 you can just about squeeze by on 8 GB, but inference will be noticeably slower.
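As a back-of-the-envelope check, that rule of thumb can be expressed as a tiny shell helper. The 4 GB headroom figure here is my own working assumption for the OS plus Open WebUI, not an official number:

```shell
# Rough sizing helper: ~2 GB of RAM per billion parameters, plus
# ~4 GB headroom for the OS and Open WebUI (headroom is an assumption).
estimate_ram_gb() {
  local billions=$1
  echo $(( billions * 2 + 4 ))
}

estimate_ram_gb 3   # prints 10: tight but workable on an 8 GB VPS
estimate_ram_gb 7   # prints 18: why 16 GB is the practical floor for 7B models
```

Quantized models (which is what Ollama serves by default) often need less than this, so treat the numbers as a ceiling rather than an exact requirement.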

I use DigitalOcean for this kind of workload. Their General Purpose Droplets with 16 GB RAM run well under $100/month, the UI is clean, and snapshotting the Droplet before big changes has saved me more than once. Their CPU-Optimized range is worth considering if you want faster inference without GPU costs.

DigitalOcean

Whatever VPS you choose, make sure it's running Ubuntu 22.04 or 24.04 LTS. The steps below assume that base image.

Prerequisites

Before writing a single line of Compose YAML, knock out these three things on your fresh VPS:

# Install Docker Engine via the official convenience script
curl -fsSL https://get.docker.com | sh

# Add your user to the docker group so you don't need sudo every time
sudo usermod -aG docker $USER
newgrp docker

# Verify both tools are available
docker --version          # Should show 25.x or higher
docker compose version    # Should show v2.x

# Lock down Ollama's API port before anything starts
sudo ufw allow 22/tcp     # Keep SSH open!
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw deny 11434/tcp
sudo ufw enable
Watch out: Ollama's API port 11434 has no authentication by default. If you expose it publicly, anyone can pull models and run inference at your expense. Always firewall it and access it only from within the Docker network or over a VPN/tunnel.
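To confirm the firewall rule actually took effect, you can probe the port from your local machine. `/api/tags` is Ollama's model-listing endpoint; `your-vps-ip` is a placeholder for your server's address:

```shell
# Run this from your LOCAL machine, not the VPS. A timeout or refused
# connection means UFW is doing its job; a JSON response means the
# Ollama API is exposed to the internet.
if curl --silent --max-time 5 http://your-vps-ip:11434/api/tags; then
  echo "WARNING: port 11434 is reachable from outside"
else
  echo "Port 11434 is unreachable from outside (good)"
fi
```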

Writing the Docker Compose File

Create a working directory and drop the Compose file in there. I keep mine at /opt/llm-stack/docker-compose.yml.

sudo mkdir -p /opt/llm-stack
sudo chown $USER /opt/llm-stack
cd /opt/llm-stack
nano docker-compose.yml

Paste in the following. I'll explain the key decisions after the block:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    # Only expose the API on the internal Docker network.
    # Do NOT add "ports:" here unless you know what you're doing.
    networks:
      - llm_net

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      # Point Open WebUI at the Ollama container by service name
      - OLLAMA_BASE_URL=http://ollama:11434
      # Set a strong secret key — change this value
      - WEBUI_SECRET_KEY=change_this_to_a_long_random_string
    volumes:
      - open_webui_data:/app/backend/data
    ports:
      # Expose the web interface only on localhost; put a reverse proxy in front
      - "127.0.0.1:3000:8080"
    networks:
      - llm_net

volumes:
  ollama_data:
  open_webui_data:

networks:
  llm_net:
    driver: bridge

A few deliberate choices here: Ollama has no ports: entry at all, so its API is only reachable from containers on llm_net. Open WebUI binds to 127.0.0.1:3000 — localhost only — so you still need a reverse proxy (Caddy is my preference) or an SSH tunnel to reach it from your browser. The named volumes ollama_data and open_webui_data persist your downloaded models and user accounts across container restarts and image updates.
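One more thing before starting the stack: the WEBUI_SECRET_KEY placeholder deserves a real value, since Open WebUI uses it to sign session tokens. Generate something long and random rather than typing a password:

```shell
# Generate a 64-character hex string suitable for WEBUI_SECRET_KEY
openssl rand -hex 32
```

Paste the output into the Compose file in place of the placeholder. Note that changing this key later invalidates existing login sessions.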

Starting the Stack and Pulling a Model

Bring everything up with:

cd /opt/llm-stack
docker compose up -d

# Watch the logs to confirm both containers started cleanly
docker compose logs -f --tail=50

Once Ollama is running you need to pull at least one model before Open WebUI has anything to talk to. I almost always start with llama3.2 (3B, fast, good for most tasks) or mistral (7B, better reasoning, slower on CPU-only VPS):

# Pull a 3B model — good starting point on an 8 GB VPS
docker exec -it ollama ollama pull llama3.2

# Or pull a 7B model for better output quality (needs ~14 GB RAM free)
docker exec -it ollama ollama pull mistral

# Verify the model is available
docker exec ollama ollama list

Model files land inside the ollama_data Docker volume, so they survive a docker compose down and can be backed up with standard volume backup techniques.
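A minimal backup sketch, assuming Compose prefixed the volume with the project directory name (so likely llm-stack_ollama_data; confirm with docker volume ls):

```shell
# Mount the volume read-only into a throwaway Alpine container and
# stream its contents into a dated tarball in the current directory.
docker run --rm \
  -v llm-stack_ollama_data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf "/backup/ollama_data-$(date +%F).tar.gz" -C /data .
```

Restoring is the same trick in reverse: mount an empty volume read-write and extract the tarball into /data.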

Tip: On a CPU-only VPS, first inference after pulling can take 60–90 seconds as Ollama loads the model into memory. Subsequent requests in the same session are much faster because the model stays resident. Don't assume something's broken just because the first response is slow.
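Relatedly, Ollama unloads an idle model after five minutes by default. If you would rather keep it resident between sessions, the OLLAMA_KEEP_ALIVE environment variable controls this; a sketch of the addition to the ollama service in the Compose file:

```yaml
  ollama:
    # ...existing settings from above...
    environment:
      # Keep the loaded model in memory for 30 minutes of idle time
      # (the default is 5m; "-1" keeps it loaded indefinitely)
      - OLLAMA_KEEP_ALIVE=30m
```

The trade-off is RAM: a resident 7B model occupies its full footprint even while nobody is chatting.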

Accessing Open WebUI via an SSH Tunnel (Quick Method)

If you just want to test things without setting up a full reverse proxy yet, an SSH tunnel is the fastest path. Run this on your local machine:

# Replace your-vps-ip with your actual server address
ssh -L 3000:127.0.0.1:3000 user@your-vps-ip -N

Then open http://localhost:3000 in your browser. You'll be prompted to create an admin account on first visit. After that, select a model from the dropdown at the top of the chat window and start chatting.

Putting Caddy in Front (Recommended for Permanent Deployments)

I prefer Caddy over Nginx Proxy Manager for this kind of single-domain setup because the Caddyfile is genuinely readable and automatic HTTPS just works. Add a Caddy service to your Compose file, or run it as a separate stack. Here's a minimal Caddyfile that terminates TLS and proxies to Open WebUI:

llm.yourdomain.com {
    reverse_proxy open-webui:8080
    encode gzip
}

Point Caddy at the same llm_net Docker network so it can reach the open-webui container by name, open ports 80 and 443 in UFW, and you'll have a fully TLS-wrapped LLM chat interface at your custom domain.
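If you run Caddy inside the same Compose file, a minimal service definition might look like the sketch below. The bind-mount path and the caddy_data volume name are my assumptions; it presumes the Caddyfile sits next to docker-compose.yml:

```yaml
  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data        # persists TLS certificates across restarts
    networks:
      - llm_net
```

Remember to declare caddy_data under the top-level volumes: key alongside the other two, or the certificate store will vanish on recreate.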

Keeping the Stack Updated

Both Ollama and Open WebUI ship updates frequently — sometimes weekly. Because we're using :latest and :main tags, updating is straightforward:

cd /opt/llm-stack

# Pull new images
docker compose pull

# Recreate containers with the new images (zero data loss — volumes persist)
docker compose up -d --remove-orphans

# Clean up old image layers to reclaim disk space
docker image prune -f

If you want this to happen automatically, Watchtower can monitor the containers and pull updates on a schedule. I'd recommend setting Watchtower to notify-only mode for production and letting it auto-update on a dev/test VPS — breaking changes do happen in Open WebUI's :main tag occasionally.
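A notify-only Watchtower service might be sketched like this; the schedule value is a six-field cron expression, and both environment variables come from Watchtower's documentation:

```yaml
  watchtower:
    image: containrrr/watchtower:latest
    container_name: watchtower
    restart: unless-stopped
    volumes:
      # Watchtower talks to the Docker API via the host socket
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WATCHTOWER_MONITOR_ONLY=true     # report updates, don't apply them
      - WATCHTOWER_SCHEDULE=0 0 4 * * *  # check daily at 04:00
```

Flip WATCHTOWER_MONITOR_ONLY to false on a dev box if you want hands-off updates there.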

Conclusion

With about 20 minutes of work you now have a private, self-hosted AI chat interface running on your own infrastructure — no API keys, no usage limits, no data leaving your server. The Docker Compose approach makes the whole stack easy to back up, move to a bigger Droplet when you need more RAM, and update without downtime.

From here I'd suggest two next steps: first, lock down Open WebUI with a proper reverse proxy and domain name so you can access it from anywhere without an SSH tunnel; second, explore Open WebUI's model management UI to try code-focused models like qwen2.5-coder or the deepseek-r1 series — both pull and run exactly the same way via ollama pull.
