Monitoring Your VPS and Homelab with Grafana, Prometheus, and Docker

We earn commissions when you shop through the links on this page, at no additional cost to you.

Flying blind with your homelab is a disaster waiting to happen. I've had a VPS run out of disk space at 2am, taking down four services with it — and I only found out when a friend texted me asking why my Nextcloud was down. That pain is entirely avoidable with a proper monitoring stack. In this tutorial, I'll walk you through deploying Grafana, Prometheus, Node Exporter, and cAdvisor in a single Docker Compose file so you get real-time system metrics, container stats, and beautiful dashboards in under 30 minutes.

This setup works equally well on a budget VPS (I run it on a Hetzner CAX11 ARM instance), a home server, or a Raspberry Pi 4. If you're looking to spin up a fresh VPS to host this stack, DigitalOcean Droplets offer dependable uptime with a 99.99% SLA and predictable monthly pricing — a solid base for any monitoring project.

What We're Building

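The monitoring stack has four components, each with a specific job:

- Prometheus scrapes metrics from each target every 15 seconds and stores them as time series.
- Node Exporter exposes host metrics: CPU, memory, disk, filesystem, and network.
- cAdvisor exposes per-container resource usage (CPU, memory, network, I/O) for every Docker container on the host.
- Grafana queries Prometheus and renders the dashboards.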

I prefer this stack over alternatives like Netdata because Prometheus's pull model is extremely flexible — you can add exporters for Postgres, Redis, Nginx, Blackbox HTTP checks, and dozens of other things without changing your Grafana setup.

Prerequisites

You need Docker and Docker Compose (v2, the plugin version) installed. On Ubuntu 22.04 or 24.04:

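# Ubuntu packages Compose v2 as docker-compose-v2; Docker's own apt repository calls it docker-compose-plugin
sudo apt update && sudo apt install -y docker.io docker-compose-v2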
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
# Log out and back in for the group change to take effect

Also make sure ports 3000 (Grafana) and 9090 (Prometheus) are firewalled from the public internet. I'll cover that at the end, but keep UFW in mind throughout.

Project Structure

Create a dedicated directory and set up the config files before you write the Compose file:

mkdir -p ~/monitoring/prometheus
cd ~/monitoring

# Create the Prometheus scrape config
cat > prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node_exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
EOF

Now create the main docker-compose.yml:

cat > docker-compose.yml << 'EOF'
networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data:
  grafana_data:

services:

  prometheus:
    image: prom/prometheus:v2.52.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    ports:
      - "127.0.0.1:9090:9090"
    networks:
      - monitoring

  node_exporter:
    image: prom/node-exporter:v1.8.0
    container_name: node_exporter
    restart: unless-stopped
    pid: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    devices:
      - /dev/kmsg:/dev/kmsg
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - monitoring

  grafana:
    image: grafana/grafana-oss:11.0.0
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme_now
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3000
    ports:
      - "127.0.0.1:3000:3000"
    networks:
      - monitoring
    depends_on:
      - prometheus
EOF
Watch out: Both Prometheus and Grafana are bound to 127.0.0.1 in this Compose file, which means they won't be directly accessible from the internet — only via a reverse proxy or SSH tunnel. Do not change these to 0.0.0.0 without first putting Grafana behind Caddy or Nginx with authentication. Exposing Prometheus publicly leaks your entire infrastructure topology to anyone who looks.
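If you later add services or tweak ports, it is easy to drop the 127.0.0.1 prefix by accident. A quick static check can catch that. The sketch below is a hypothetical helper (find_unbound_ports is not a Docker command); it only understands Compose's short port syntax, so treat an empty result as reassurance rather than proof.

```shell
# Hypothetical helper: list short-syntax port mappings in a Compose file that
# listen on every interface, i.e. "HOST:CONTAINER" with no IP or with 0.0.0.0.
# Long-syntax entries (target:/published:) are not checked by this sketch.
find_unbound_ports() {
  grep -nE '^[[:space:]]*-[[:space:]]*"?(0\.0\.0\.0:)?[0-9]+:[0-9]+(/(tcp|udp))?"?[[:space:]]*$' "$1"
}
```

Usage: find_unbound_ports ~/monitoring/docker-compose.yml prints each offending line with its line number and, like grep, exits successfully only when it finds one. With the Compose file above it should print nothing.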

Starting the Stack

cd ~/monitoring
docker compose up -d

# Verify all four containers are running
docker compose ps

# Tail the logs to catch any startup errors
docker compose logs -f --tail=50

Within a minute you should see all four containers in the Up state. Prometheus will immediately start scraping Node Exporter and cAdvisor.
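To confirm that the scrapes are succeeding, you can ask Prometheus itself: its HTTP API endpoint /api/v1/targets lists every active target along with a health field of up, down, or unknown. A minimal sketch, assuming you run it on the server where port 9090 is bound to localhost. check_targets is a hypothetical helper, and it pulls the health values out with grep and sed instead of jq, a shortcut that would break if the response format changed.

```shell
# Hypothetical helper: summarise target health from Prometheus's
# /api/v1/targets JSON read on stdin. grep/sed stand in for jq here.
check_targets() {
  local healths total up
  # Extract each "health":"<state>" pair, keeping only the state word.
  healths=$(grep -oE '"health": *"[a-z]+"' | sed -E 's/.*"([a-z]+)"$/\1/')
  # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
  total=$(printf '%s\n' "$healths" | grep -c . || true)
  up=$(printf '%s\n' "$healths" | grep -cx 'up' || true)
  echo "$up/$total targets up"
  # Succeed only when at least one target exists and every target is up.
  [ "$total" -gt 0 ] && [ "$up" -eq "$total" ]
}
```

On the server: curl -s http://localhost:9090/api/v1/targets | check_targets. With the prometheus, node_exporter, and cadvisor jobs all healthy it should report 3/3 targets up.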

Accessing Grafana via SSH Tunnel

Since Grafana is bound to localhost on the server, the easiest way to access it during setup is an SSH tunnel from your local machine:

# Run this on your LOCAL machine, not the server
ssh -L 3000:127.0.0.1:3000 -L 9090:127.0.0.1:9090 user@your-server-hostname -N

Now open http://localhost:3000 in your browser. Log in with username admin and the password you set in the Compose file (you did change changeme_now, right?). Change the password immediately on first login.

Connecting Prometheus as a Datasource

In Grafana, go to Connections → Data sources → Add data source. Select Prometheus. Set the URL to http://prometheus:9090 — because both containers are on the same monitoring bridge network, Grafana can reach Prometheus by container name. Click Save & test and you should see a green "Data source is working" banner.
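As an alternative to clicking through the UI, Grafana can create the datasource at startup from a provisioning file in its provisioning/datasources directory. A sketch, assuming you add a bind mount such as ./grafana/provisioning:/etc/grafana/provisioning:ro to the grafana service (it is not in the Compose file above); write_datasource_provisioning is a hypothetical helper name.

```shell
# Hypothetical helper: write a Grafana datasource provisioning file under
# <dir>/datasources that points Grafana at the prometheus container.
write_datasource_provisioning() {
  local dir="$1/datasources"
  mkdir -p "$dir"
  # Quoted 'EOF' keeps the YAML literal (no shell expansion).
  cat > "$dir/prometheus.yml" << 'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
EOF
}
```

Usage: write_datasource_provisioning ~/monitoring/grafana/provisioning, then add the mount and run docker compose up -d so the grafana container is recreated with it.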

Importing Pre-Built Dashboards

I don't build dashboards from scratch when excellent community ones exist. Go to Dashboards → Import and use these Grafana dashboard IDs:

- 1860: Node Exporter Full
- 3662: Prometheus 2.0 Overview

For each import, paste the ID into the "Import via grafana.com" field, click Load, select your Prometheus datasource from the dropdown, and hit Import. You'll have production-quality dashboards in about two minutes.

Tip: On the Node Exporter Full dashboard (1860), set the instance variable in the top-left dropdown if you have multiple hosts. You can run Node Exporter on every machine in your homelab and add each one to prometheus.yml under the node_exporter job as a separate target — they'll all appear in the same dashboard.

Adding More Hosts to Prometheus

One of the best things about this setup is how easy it is to scale. On any other Linux host in your homelab (or Tailscale network), spin up Node Exporter:

# On the remote host — runs Node Exporter and exposes it only to the Tailscale interface
docker run -d \
  --name node_exporter \
  --restart unless-stopped \
  --pid host \
  --network host \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /:/rootfs:ro \
  prom/node-exporter:v1.8.0 \
  --path.procfs=/host/proc \
  --path.sysfs=/host/sys \
  --path.rootfs=/rootfs \
  --web.listen-address="100.x.x.x:9100"
  # Replace 100.x.x.x with that host's Tailscale IP

Then add it to your prometheus/prometheus.yml under node_exporter targets and reload Prometheus with:

curl -X POST http://localhost:9090/-/reload

The new host appears in Grafana within 15 seconds — no restart required thanks to the --web.enable-lifecycle flag we passed to Prometheus.
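The edit itself is small: the node_exporter job's targets list gains one entry per host. As an illustration, the hypothetical render_node_job helper below prints that job for any list of hosts; 100.64.0.2 is a made-up Tailscale address standing in for your real one.

```shell
# Hypothetical helper: print the node_exporter scrape job for the given
# hosts, each scraped on Node Exporter's default port 9100.
render_node_job() {
  local targets="" host
  for host in "$@"; do
    # Append 'host:9100', adding ", " before every entry after the first.
    targets="${targets}${targets:+, }'${host}:9100'"
  done
  printf "  - job_name: 'node_exporter'\n    static_configs:\n      - targets: [%s]\n" "$targets"
}
```

Usage: render_node_job node_exporter 100.64.0.2 prints the job with both the local container and the remote host as targets; paste it over the existing node_exporter job in prometheus/prometheus.yml and reload as above.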

Firewall Rules

Lock things down with UFW. The only ports that should be public are SSH (22) and whatever ports your reverse proxy listens on (80/443). Everything else stays internal:

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# If you're on Tailscale, allow traffic from the tailscale0 interface
sudo ufw allow in on tailscale0
sudo ufw enable
sudo ufw status verbose

Because Prometheus and Grafana are bound to 127.0.0.1 in the Compose file, UFW doesn't even need specific rules for ports 9090 or 3000 — they simply aren't reachable from outside. Docker's iptables rules can bypass UFW for ports bound to 0.0.0.0, which is another reason I explicitly bind to localhost in the Compose file.
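You can verify this from the server with ss, which lists listening sockets. The sketch below is a hypothetical filter over ss -ltnH output that flags ports 3000 and 9090 when they listen on 0.0.0.0 or [::]. It assumes Docker's default userland proxy, which makes published ports show up as ordinary listeners; with the proxy disabled a published port may not appear in ss at all, so an empty result is not a guarantee.

```shell
# Hypothetical helper: read `ss -ltnH` output on stdin and print any listener
# on port 3000 or 9090 bound to all IPv4/IPv6 interfaces (column 4 is the
# local address:port). Exits 0 only if such a listener is found, like grep.
find_public_listeners() {
  awk '$4 ~ /^(0\.0\.0\.0|\*|\[::\]):(3000|9090)$/ { print $4; found = 1 }
       END { exit !found }'
}
```

Usage: ss -ltnH | find_public_listeners && echo "Grafana or Prometheus is listening publicly".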

Putting Grafana Behind a Reverse Proxy

For permanent access without an SSH tunnel, I put Grafana behind Caddy. A minimal Caddyfile snippet:

grafana.yourdomain.com {
    reverse_proxy 127.0.0.1:3000
    basicauth /* {
        admin $2a$14$hashed_password_here
    }
}

The basicauth block adds an extra authentication layer on top of Grafana's own login — good practice for anything exposed to the internet. Generate the bcrypt hash with caddy hash-password.

Data Retention and Disk Usage

With a 15-second scrape interval and three targets, Prometheus needs roughly 1–2 GB of disk in total at 30-day retention. On a tight VPS, reduce retention to 15 days by changing the flag to --storage.tsdb.retention.time=15d in the Compose file and running docker compose up -d again. Check current Prometheus disk usage at any time via the "Prometheus 2.0 Overview" dashboard you imported earlier; it shows WAL size and chunk count in real time.
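That figure follows from the rule of thumb in Prometheus's storage documentation: needed disk ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1–2 bytes per sample. A sketch in shell arithmetic, where estimate_tsdb_bytes is a hypothetical helper and the 5000-series count used below is an assumption; read your real number from the prometheus_tsdb_head_series metric in Prometheus.

```shell
# Hypothetical helper: rough TSDB size in bytes.
#   samples/s = active_series / scrape_interval_seconds
#   bytes     = retention_seconds * samples/s * bytes_per_sample
estimate_tsdb_bytes() {
  local series=$1 interval_s=$2 retention_days=$3 bytes_per_sample=$4
  # Multiply before dividing so integer arithmetic doesn't truncate early.
  echo $(( retention_days * 86400 * series * bytes_per_sample / interval_s ))
}
```

For example, estimate_tsdb_bytes 5000 15 30 2 prints 1728000000 (about 1.7 GB), in line with the estimate above; the same call with 15 days of retention prints 864000000.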

What to Monitor Next

Once this base stack is running, the natural next step is adding alerting. The Prometheus project also maintains Alertmanager: Prometheus evaluates your alerting rules, and Alertmanager sends notifications to Slack, PagerDuty, email, or a webhook when thresholds are crossed, for example "disk on /dev/sda1 is above 85% for 5 minutes." That's a separate container plus a few alerting rules in a rules.yml file; once you're comfortable with the base stack, it takes about an hour to add.
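As a starting point, the disk example above translates into a single rule. A sketch, assuming Node Exporter's node_filesystem_avail_bytes and node_filesystem_size_bytes metrics and a hypothetical write_disk_alert_rule helper. To load the file you would also need a rule_files: entry pointing at /etc/prometheus/rules.yml in prometheus.yml and a matching bind mount in the Compose file, neither of which is in the files above; sending notifications additionally requires Alertmanager and an alerting: section, which this sketch leaves out.

```shell
# Hypothetical helper: write a Prometheus rule file with one alert that fires
# when a non-tmpfs, non-overlay filesystem stays above 85% full for 5 minutes.
write_disk_alert_rule() {
  # Quoted 'EOF' keeps $labels literal for Prometheus's templating.
  cat > "$1" << 'EOF'
groups:
  - name: host-disk
    rules:
      - alert: DiskAlmostFull
        expr: 100 * (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.mountpoint }} on {{ $labels.instance }} is above 85% full"
EOF
}
```

Until Alertmanager is wired up, firing alerts still appear on the Alerts page of the Prometheus UI. You can validate the file with promtool check rules, which ships inside the prom/prometheus image.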

You might also consider creating a DigitalOcean account if you want a clean, cheap VPS dedicated exclusively to running your monitoring stack — keeping observability infrastructure separate from the services it watches is a genuinely good architectural decision. A $4/month Droplet is more than enough for this entire stack.

The combination of Prometheus, Grafana, Node Exporter, and cAdvisor gives you a complete picture of both your host hardware and your container workloads with essentially zero ongoing maintenance. Once it's running, it just runs — and the next time something goes sideways at 2am, you'll know about it before your friends do.
