Monitoring Docker Containers and VPS Resources with Prometheus and Grafana

I learned the hard way that flying blind on a self-hosted setup leads to disaster. One evening, a runaway container process consumed all my RAM, and I only noticed when SSH became unresponsive. I needed visibility into what was actually running. Prometheus and Grafana together give you that insight—real-time metrics, historical data, and beautiful dashboards that make troubleshooting obvious.

Why Prometheus and Grafana Matter for Homelabs

If you're running anything serious on a VPS or homelab—whether it's Nextcloud, Jellyfin, or a cluster of Docker containers—you need to know what's happening. CPU spikes, memory leaks, disk space exhaustion—these problems are silent until they break your services.

Prometheus scrapes metrics from your containers and host system at regular intervals. Grafana then visualizes those metrics with dashboards you can customize. Unlike cloud monitoring solutions, everything runs on your own hardware. No vendor lock-in, no external dependencies, just local observability.

I've been running this stack on a modest RackNerd KVM VPS for months without issues. The resource overhead is negligible—maybe 100–150 MB of RAM total.

Setting Up Prometheus with Docker Compose

First, you need Prometheus to collect metrics. The simplest approach is Docker Compose. Create a directory for your monitoring stack:

mkdir -p ~/monitoring/prometheus
cd ~/monitoring

Next, create the Prometheus configuration file at prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']

This configuration tells Prometheus to scrape itself, a Node Exporter (for system metrics), and cAdvisor (for container metrics). Now create the Docker Compose file:

version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    restart: unless-stopped
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    restart: unless-stopped
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    restart: unless-stopped
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
    volumes:
      - grafana-data:/var/lib/grafana
    restart: unless-stopped
    networks:
      - monitoring

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring:
    driver: bridge

Bring it all up with:

docker-compose up -d

Within a minute, Prometheus will start scraping metrics. You can access Prometheus at http://your-host:9090 and verify that all three targets are up under Status → Targets.

Watch out: The default Grafana password in the compose file is weak. Change your-secure-password to something strong before running this in production. Also, don't expose Grafana directly to the internet without a reverse proxy (Caddy or Nginx with authentication).
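One quick way to generate a strong value for GF_SECURITY_ADMIN_PASSWORD (the openssl invocation here is a common approach, not something the compose file requires):

```shell
# Generate 24 random bytes, base64-encoded (32 printable characters)
openssl rand -base64 24
```

Drop the output into the environment variable, or better, keep it out of the compose file entirely via an .env file so it doesn't end up in version control.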

Configuring Grafana Dashboards

Now the fun part—creating dashboards. Access Grafana at http://your-host:3000 with username admin and your password.

First, add Prometheus as a data source:

  1. Go to Configuration → Data Sources
  2. Click Add data source
  3. Select Prometheus
  4. Set the URL to http://prometheus:9090
  5. Click Save & test
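If you'd rather not click through the UI every time you rebuild the container, Grafana can also provision the data source from a file. A sketch, assuming you mount it into the container under /etc/grafana/provisioning/datasources/ (the filename itself is arbitrary):

```yaml
# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```

Add the corresponding volume mount to the grafana service and the data source appears on first boot.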

Next, import pre-built dashboards. The Node Exporter Full dashboard (ID 1860) is a popular starting point for host metrics, and Grafana's public dashboard library has comparable community dashboards for cAdvisor container metrics.

To import a dashboard:

  1. Go to Dashboards → Import
  2. Enter the dashboard ID (e.g., 1860)
  3. Select Prometheus as the data source
  4. Click Import

Within seconds, you'll have a working dashboard showing CPU, memory, disk I/O, and network metrics. The Node Exporter dashboard is especially useful for spotting trends and anomalies.

Custom Alerting and Queries

The real power comes from writing your own queries. In Grafana, you can build panels using PromQL (Prometheus Query Language).

For example, to chart CPU usage per container (as a percentage of one core), grouped by cAdvisor's name label:

sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) * 100

The name!="" filter drops the aggregate cgroup series so you only see actual containers.

Or to flag containers using more than 80% of their memory limit:

container_memory_usage_bytes / container_spec_memory_limit_bytes * 100 > 80

Note that this only makes sense for containers that actually have a memory limit configured; without one, container_spec_memory_limit_bytes doesn't yield a meaningful percentage.

I prefer to create simple dashboard panels first, verify the metrics are correct, then set up alerts. Grafana has built-in alerting that can send notifications to Slack, email, or webhooks.
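Disk exhaustion is another classic silent failure worth a panel or alert. An expression using Node Exporter's standard filesystem metrics, firing when the root filesystem drops below 20% free (adjust the mountpoint to match your layout):

```
node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 20
```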

Tip: Start with the pre-built dashboards to understand what metrics are available. Once you're familiar, create custom panels tailored to your specific services. For instance, if you run Nextcloud, add panels for active sessions and file sync metrics.

Collecting Metrics from Specific Containers

If you have applications that expose Prometheus metrics (like Nextcloud with prometheus-plugin, or any app using a standard exporter), you can scrape them directly. Add a new scrape job to prometheus.yml:

  - job_name: 'nextcloud'
    static_configs:
      - targets: ['nextcloud:9090']

  - job_name: 'vaultwarden'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['vaultwarden:80']

Prometheus doesn't pick up configuration changes automatically. Either restart the container or send it a SIGHUP to trigger a reload:

docker kill --signal=HUP prometheus

After the reload, those metrics will start flowing in.

Storage and Retention

By default, Prometheus stores data for 15 days. On a small VPS, this typically consumes 1–3 GB. If you want longer retention, adjust the docker-compose command:

    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=90d'

This stores 90 days of data. Monitor the volume size with:

docker exec prometheus du -sh /prometheus
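You can also ballpark how much disk a retention window needs before committing to it. Prometheus compresses samples down to roughly 1–2 bytes each; the inputs below (1,000 active series, 15-second scrape interval, 2 bytes per sample) are illustrative assumptions, not measurements from this stack:

```shell
# Rough TSDB size estimate in MiB:
# samples/sec (series / scrape interval) * bytes per sample
# * retention in seconds, converted to MiB
SERIES=1000          # assumed number of active time series
INTERVAL=15          # scrape interval in seconds
BYTES_PER_SAMPLE=2   # conservative post-compression figure
RETENTION_DAYS=90
echo $(( SERIES / INTERVAL * BYTES_PER_SAMPLE * RETENTION_DAYS * 86400 / 1024 / 1024 )) MiB
```

For these inputs that works out to under 1 GB, which is why even 90-day retention is comfortable on a small VPS.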

Reverse Proxy Integration

I strongly recommend putting Grafana behind a reverse proxy like Caddy for security and convenience. Here's a simple Caddyfile entry:

grafana.yourdomain.com {
  reverse_proxy localhost:3000
  encode gzip
}

prometheus.yourdomain.com {
  reverse_proxy localhost:9090
}

This gives you HTTPS automatically, and you can add authentication middleware if needed. Never expose monitoring dashboards to the raw internet—always use a reverse proxy with strong auth.
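Grafana has its own login, but Prometheus ships with no authentication at all, so it's worth gating behind Caddy's basicauth directive. The username and hash below are placeholders; generate your own bcrypt hash with caddy hash-password:

```
prometheus.yourdomain.com {
  basicauth {
    # Replace with your own user and a hash produced by: caddy hash-password
    admin <bcrypt-hash-from-caddy-hash-password>
  }
  reverse_proxy localhost:9090
}
```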

What's Next?

Once your monitoring stack is running, explore alerting rules, set up notifications, and customize dashboards for your specific workloads. If you're running on a budget VPS like those from RackNerd, monitoring becomes essential to catch resource exhaustion before it impacts your services.

From here, consider adding Loki for log aggregation, or integrating with external tools like Uptime Kuma for status page monitoring. The foundation is solid—everything else builds on top.
