Setting Up Ollama with Open WebUI on a VPS for a Private ChatGPT Alternative

CompactHost · June 11, 2026


Running your own private ChatGPT alternative is one of the most satisfying self-hosting projects you can take on right now. When I switched from paying for cloud AI APIs to running Llama 3 on my own VPS, the privacy gain was immediate: every token is generated on my server and never touches a third-party logging pipeline. In this tutorial I'll show you how to get Ollama and Open WebUI running together on a VPS, fronted by Caddy for automatic HTTPS, with a Docker Compose setup you can have live in under an hour.

What You'll Need

Before we start, let me be honest about hardware requirements. Ollama on a VPS is perfectly viable for CPU-only inference, but you need enough RAM to load a model. For a 7B parameter model like mistral:7b (or the 8B llama3.1), you want at least 8 GB of RAM, and 16 GB is more comfortable. I run this setup on a Hetzner CX32 (4 vCPU, 8 GB RAM) and it handles llama3.2:3b well for light daily use. For 7B models I'd step up to the CX42 (16 GB RAM) without hesitation. Contabo's VPS S (8 GB RAM) is a cheaper option if you're on a tight budget, though I find their network speeds less consistent.
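To make the RAM guidance concrete, here is a minimal pre-flight sketch that compares a model's download size against the server's total memory. The 2 GB headroom figure and the function names are my own assumptions for illustration, not numbers published by Ollama, so treat the result as a rough hint.

```shell
#!/usr/bin/env bash
# Rough pre-flight check: does a model of a given on-disk size fit in RAM?
# HEADROOM_MB is an assumed rule of thumb (OS, Docker, context buffers),
# not an Ollama-documented figure.
HEADROOM_MB="${HEADROOM_MB:-2048}"

# total_ram_mb: total system RAM in MiB, read from /proc/meminfo (Linux only).
total_ram_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# model_fits MODEL_MB RAM_MB: succeeds if the model plus headroom fits in RAM.
model_fits() {
    local model_mb="$1" ram_mb="$2"
    [ $(( model_mb + HEADROOM_MB )) -le "$ram_mb" ]
}

# Example: a roughly 2 GB model such as llama3.2:3b on this machine.
if [ -r /proc/meminfo ]; then
    if model_fits 2048 "$(total_ram_mb)"; then
        echo "llama3.2:3b should fit"
    else
        echo "llama3.2:3b is likely too large for this server"
    fi
fi
```

Swap 2048 for the size of whichever model you plan to pull. Raising HEADROOM_MB makes the check more conservative if other services share the server.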

You'll also need:

A VPS running Ubuntu 24.04 LTS
Docker and Docker Compose v2 installed
A domain name pointed at your server's IP (I'll use ai.example.com throughout)
Ports 80 and 443 open in your firewall

Tip: If you don't have a domain yet, Cloudflare offers free DNS management. Point an A record at your VPS IP and let Caddy handle the rest — it will automatically provision a Let's Encrypt certificate with zero extra configuration on your part.
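Caddy can only obtain a certificate once the A record really points at the server, so it is worth checking DNS before starting anything. The sketch below assumes a glibc-based system where getent is available; the function names and the 203.0.113.10 address are placeholders of mine.

```shell
#!/usr/bin/env bash
# resolve_a DOMAIN: print the first IPv4 address the system resolver returns.
resolve_a() {
    getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# check_dns DOMAIN EXPECTED_IP: succeed only if DOMAIN resolves to EXPECTED_IP.
check_dns() {
    local domain="$1" expected="$2" actual
    actual="$(resolve_a "$domain")"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $domain -> $actual"
    else
        echo "MISMATCH: $domain -> ${actual:-<no A record>} (expected $expected)" >&2
        return 1
    fi
}

# Usage (203.0.113.10 is a placeholder; substitute your VPS address):
#   check_dns ai.example.com 203.0.113.10
```

If the record is proxied through Cloudflare, the lookup returns Cloudflare's addresses rather than your VPS IP, so a mismatch is expected in that case.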

Project Directory Structure

I keep everything under /opt/ollama-stack. Create the directory and drop in a single docker-compose.yml and a Caddyfile. Those two are all the configuration files you need. Run these commands as root or with sudo:

mkdir -p /opt/ollama-stack
cd /opt/ollama-stack

# Create directories for persistent model storage and Caddy data
mkdir -p ollama-data caddy-data caddy-config

The Docker Compose File

Here's the complete docker-compose.yml I use in production. Open WebUI connects to Ollama via the internal Docker network — Ollama itself never needs to be exposed to the internet, which is exactly how I prefer it. Caddy is the only thing that faces the public.

cat > /opt/ollama-stack/docker-compose.yml << 'EOF'
services:

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ./ollama-data:/root/.ollama
    networks:
      - ai-net
    # Uncomment the lines below if your VPS has an NVIDIA GPU
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=change-this-to-a-long-random-string
    volumes:
      - open-webui-data:/app/backend/data
    networks:
      - ai-net

  caddy:
    image: caddy:2-alpine
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - ./caddy-data:/data
      - ./caddy-config:/config
    networks:
      - ai-net

volumes:
  open-webui-data:

networks:
  ai-net:
    driver: bridge
EOF

Watch out: The WEBUI_SECRET_KEY environment variable is used to sign session tokens. Do not leave it as the placeholder above — generate a proper random string with openssl rand -hex 32 and paste it in before you start the stack. If you ever change it later, all existing sessions will be invalidated.
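One way to avoid pasting the key into docker-compose.yml by hand is to keep it in a .env file next to the compose file. This is a sketch and not part of the setup above: write_secret_env is a hypothetical helper, and the approach assumes you change the compose line to reference the variable.

```shell
#!/usr/bin/env bash
# write_secret_env DIR: create DIR/.env with a random WEBUI_SECRET_KEY,
# unless one is already there (regenerating it would invalidate sessions).
write_secret_env() {
    local dir="$1" env_file key
    env_file="$dir/.env"
    if [ -f "$env_file" ] && grep -q '^WEBUI_SECRET_KEY=' "$env_file"; then
        echo "WEBUI_SECRET_KEY already set in $env_file; leaving it alone"
        return 0
    fi
    if command -v openssl >/dev/null 2>&1; then
        key="$(openssl rand -hex 32)"
    else
        # Fallback if openssl is missing: 32 random bytes printed as hex.
        key="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
    fi
    # umask 077 makes a newly created .env readable by its owner only.
    ( umask 077 && printf 'WEBUI_SECRET_KEY=%s\n' "$key" >> "$env_file" )
}

# Usage:
#   write_secret_env /opt/ollama-stack
```

For this to take effect, the environment entry in docker-compose.yml would become WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}. Compose reads a .env file in the project directory when substituting variables like this.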

Configuring Caddy as a Reverse Proxy

I prefer Caddy over Nginx Proxy Manager for setups like this because the configuration is genuinely minimal and automatic HTTPS just works without any clicking around in a GUI. Here's the Caddyfile — replace ai.example.com with your actual domain:

cat > /opt/ollama-stack/Caddyfile << 'EOF'
ai.example.com {
    reverse_proxy open-webui:8080

    # Optional: restrict access to your own IP ranges during initial setup
    # @blocked not remote_ip 1.2.3.4/32
    # respond @blocked "Forbidden" 403

    encode gzip
    log {
        output file /data/logs/access.log
        format json
    }
}
EOF

Caddy will automatically obtain and renew a TLS certificate from Let's Encrypt as soon as it starts, as long as your DNS is correctly pointed and ports 80 and 443 are reachable. No certbot cron jobs, no manual renewals; it just handles it.
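Before starting anything, both config files can be checked for syntax errors. The sketch below uses docker compose config and caddy validate, run in a throwaway container; the run and validate_stack helpers and the DRY_RUN switch are my own additions for illustration.

```shell
#!/usr/bin/env bash
STACK_DIR="${STACK_DIR:-/opt/ollama-stack}"

# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# validate_stack: check both config files without starting any service.
validate_stack() {
    # Parses docker-compose.yml and exits non-zero if it is invalid.
    run docker compose -f "$STACK_DIR/docker-compose.yml" config --quiet || return 1
    # Runs Caddy's own validator against the Caddyfile in a temporary container.
    run docker run --rm \
        -v "$STACK_DIR/Caddyfile:/etc/caddy/Caddyfile:ro" \
        caddy:2-alpine \
        caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile
}

# Usage:
#   validate_stack              # run both checks
#   DRY_RUN=1 validate_stack    # only print the commands that would run
```

Neither check contacts Let's Encrypt, so a clean result only confirms that the files parse; certificate issuance is still decided when Caddy starts.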

Starting the Stack and Pulling a Model

Bring everything up with Docker Compose, then pull your first model:

cd /opt/ollama-stack

# Start all services
docker compose up -d

# Watch the logs to confirm everything started cleanly
docker compose logs -f --tail=50

# Pull a model — llama3.2:3b is fast and fits comfortably in 8 GB RAM
docker exec ollama ollama pull llama3.2:3b

# Or for something more capable on a 16 GB server:
# docker exec ollama ollama pull mistral:7b

# Check what models are available
docker exec ollama ollama list

The model pull will take a few minutes depending on your VPS bandwidth — llama3.2:3b is about 2 GB, while mistral:7b is closer to 4.1 GB. Once it's done, navigate to https://ai.example.com in your browser. On first visit Open WebUI will ask you to create an admin account — the first registered user automatically becomes the administrator.
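If you script the setup, it helps to confirm that the pull finished before pointing anyone at the UI. This small filter is a sketch that assumes the usual ollama list layout, with a header row and the model name in the first column; has_model is a name I made up.

```shell
#!/usr/bin/env bash
# has_model NAME: read `ollama list` output on stdin and succeed if NAME
# appears in the first column (the header row is skipped).
has_model() {
    awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Usage on the server:
#   docker exec ollama ollama list | has_model llama3.2:3b && echo "model is ready"
# A one-off prompt from the CLI confirms the model actually loads:
#   docker exec ollama ollama run llama3.2:3b "Reply with one word: ready"
```

The match is exact, so include the tag (llama3.2:3b rather than llama3.2) when checking.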

Firewall Hardening

You do not want Ollama's port 11434 exposed to the internet. Since we're running everything inside the Docker network, it isn't exposed by default — but let's make sure UFW is configured correctly regardless:

# Allow SSH, HTTP, and HTTPS only
ufw allow 22/tcp
ufw allow 80/tcp
ufw allow 443/tcp

# Make sure 11434 is NOT open (it shouldn't be, but verify)
ufw deny 11434/tcp

ufw enable
ufw status verbose

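Be aware that UFW is not the real safeguard here. Docker writes its own iptables rules, and any port published with a ports: directive bypasses UFW entirely, so the ufw deny rule above would not protect a published 11434. Container-to-container traffic on ai-net never passes through UFW either, which is fine, because that is the internal network we want. The key point is that nothing should map port 11434 to the host. Double-check that your docker-compose.yml has no ports: directive under the ollama service; mine doesn't, and neither should yours.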
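One way to confirm that nothing maps 11434 to the host is to look at the listening sockets directly. The sketch assumes ss from iproute2 is installed and that its fourth column holds the local address and port; listening_on is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# listening_on PORT: read `ss -tln` output on stdin and succeed if any
# socket is listening on PORT (column 4 is "Local Address:Port").
listening_on() {
    awk -v port="$1" '
        NR > 1 { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
        END { exit !found }
    '
}

# Usage on the server:
#   ss -tln | listening_on 11434 && echo "WARNING: 11434 is bound on the host"
#   docker port ollama    # prints nothing when no ports are published
```

With the compose file as written, the first command should stay silent and the second should print nothing.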

Updating the Stack

Open WebUI ships updates frequently — new features drop almost weekly. I use Watchtower for automatic updates on non-critical services, but for an AI stack that I depend on daily I prefer manual control:

cd /opt/ollama-stack

# Pull the latest images
docker compose pull

# Recreate containers with new images (zero-config, data is in volumes)
docker compose up -d --remove-orphans

# Clean up old images to reclaim disk space
docker image prune -f
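Since chats and accounts live in the open-webui-data volume, you may want an archive of it before pulling new images. This is a sketch, not part of my update routine above. It assumes Compose's default naming, where the volume gets the project directory as a prefix (verify with docker volume ls), and the backup path and helper names are illustrative.

```shell
#!/usr/bin/env bash
# Assumption: Compose prefixes named volumes with the project name, which
# defaults to the directory name, giving ollama-stack_open-webui-data.
VOLUME="${VOLUME:-ollama-stack_open-webui-data}"
BACKUP_DIR="${BACKUP_DIR:-/opt/ollama-stack/backups}"

# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# backup_webui: archive the Open WebUI volume into BACKUP_DIR.
backup_webui() {
    local archive
    archive="open-webui-$(date +%Y%m%d-%H%M%S).tar.gz"
    run mkdir -p "$BACKUP_DIR" || return 1
    # Mount the volume read-only and tar its contents from a temporary container.
    run docker run --rm \
        -v "$VOLUME:/data:ro" \
        -v "$BACKUP_DIR:/backup" \
        alpine tar czf "/backup/$archive" -C /data .
}

# Usage:
#   docker compose stop open-webui && backup_webui && docker compose start open-webui
```

Stopping the container first avoids archiving the database while it is being written to.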

Model Recommendations by VPS Size

After testing on several configurations, here's what I'd recommend:

8 GB RAM (Hetzner CX32, RackNerd 8GB KVM): llama3.2:3b, phi4-mini, qwen2.5:3b. These load quickly and give reasonable quality for summarisation, coding assistance, and chat.
16 GB RAM (Hetzner CX42, Contabo VPS M): mistral:7b, llama3.1:8b, qwen2.5:7b. Much better reasoning quality, still CPU-only on most VPS plans.
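64 GB+ RAM: llama3.1:70b quantized (Q4_K_M). The quantized weights alone are roughly 40 GB, so this will not load on a 32 GB server. Results are genuinely impressive, though CPU-only generation at this size is slow, so expect to wait noticeably longer per message than with the smaller models.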

Tip: In Open WebUI's settings you can configure multiple models as favourites and switch between them mid-conversation. I keep llama3.2:3b as my quick-response model and mistral:7b for anything that needs deeper reasoning — the UI makes switching effortless.

Conclusion

At this point you have a fully private, HTTPS-secured ChatGPT alternative running entirely on your own infrastructure. No usage limits, no per-token billing, no conversation history sent to a third party. For my personal use — drafting, coding, research summaries — this setup has replaced cloud AI APIs almost entirely.

Two natural next steps from here: first, look into setting up Authelia in front of Open WebUI if you want proper multi-user authentication with MFA rather than relying solely on Open WebUI's built-in login. Second, explore Open WebUI's RAG (retrieval-augmented generation) feature — you can upload documents and have the models answer questions about your own files, which makes this genuinely useful for local knowledge bases and private notes.
