Ollama Fine-tuning: Customizing Models for Your Workflows



Running Ollama out of the box is powerful—but what if you need a model that understands your company's jargon, formats data the way your workflow expects, or specializes in a narrow task? That's where fine-tuning comes in. I've spent the last few months experimenting with Ollama fine-tuning on my homelab, and I can tell you it transforms generic models into purpose-built tools that actually get your work done better. This guide walks you through the entire process: preparing datasets, executing fine-tuning, and deploying your custom model.

Why Fine-tune Instead of Prompt Engineering?

I used to think prompt engineering was the answer to every problem. Give the model better instructions, more context, a few examples—surely that's enough? But I kept running into the same wall: a general-purpose model will always be general-purpose, no matter how clever your prompts are.

Fine-tuning solves this by actually changing the model's weights based on your data. This means:

- The model internalizes your domain jargon instead of needing it re-explained in every prompt
- Output formatting becomes consistent without few-shot examples eating your context window
- Prompts get shorter, which means faster responses and fewer tokens per request

The trade-off? Fine-tuning takes compute time and preparation. You need clean, representative data. But if you're running Ollama anyway, you already have the hardware—fine-tuning is just using it smarter.

Prerequisites and Hardware Reality Check

Let's be honest about what you need. Fine-tuning is not cheap on consumer hardware. I'm using an RTX 4090 in my homelab—an $1800 GPU that most of you probably don't have. But here's what actually works:

- 24 GB of VRAM (RTX 3090/4090) handles LoRA fine-tuning of 7B models comfortably
- 12–16 GB cards can still fine-tune 7B models with quantized (QLoRA-style) training, just slower
- CPU-only fine-tuning is technically possible but measured in days, not hours. Don't bother.

If you don't have a capable GPU locally, consider renting one for the fine-tuning job. Budget hosts like RackNerd run seasonal VPS deals, but for GPU work you'll usually want an hourly GPU rental instead: run the fine-tuning remotely, then pull the model back to your homelab for inference.

Tip: Fine-tuning on a rented GPU for a few hours is often cheaper than the electricity to do it on your own hardware. Calculate your local power costs before committing to home fine-tuning.
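The back-of-the-envelope math is just watts × hours × your electricity rate. A quick sketch with assumed numbers (450 W sustained draw, an 8-hour run, $0.30/kWh; plug in your own):

```shell
# Estimate electricity cost of a local fine-tuning run (all values are assumptions)
WATTS=450    # sustained GPU draw under load, e.g. an RTX 4090
HOURS=8      # expected training time
RATE=0.30    # your electricity price in $/kWh
awk -v w="$WATTS" -v h="$HOURS" -v r="$RATE" \
  'BEGIN { printf "Local run: ~$%.2f in electricity\n", w / 1000 * h * r }'
```

Compare the result against the hourly price of a rented GPU times your expected training time, and pick whichever is cheaper.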

Preparing Your Training Dataset

This is where most people stumble. A bad dataset produces a useless model. Period.

I fine-tuned a model to format my home automation logs. My first attempt used 50 raw log lines. Result: gibberish. My second attempt used 500 carefully curated examples with explanations of what I wanted. Result: useful, but still hallucinating. Third attempt: 2,000 examples with strict formatting rules. That worked.

Here's what your dataset needs:

- Volume: at least several hundred examples; my genuinely useful results started around 2,000
- Consistency: every example follows the exact same output format, no exceptions
- Coverage: include the messy, edge-case inputs you actually see, not just clean ones
- Correct labels: every assistant response must be exactly what you want the model to produce

Let me show you the format. Each training example is a conversation turn:

{
  "messages": [
    {
      "role": "system",
      "content": "You are a home automation log parser. Convert natural language descriptions into structured JSON logs."
    },
    {
      "role": "user",
      "content": "The kitchen lights turned on at 19:30 and the temperature reached 72 degrees"
    },
    {
      "role": "assistant",
      "content": "{\"device\": \"kitchen_lights\", \"action\": \"on\", \"timestamp\": \"19:30\", \"temperature\": 72}"
    }
  ]
}

I built my dataset by extracting real logs from my Home Assistant database, manually writing the "correct" parsed output for each one, and grouping them by scenario. You can automate some of this—write a script that generates variations of known-good examples—but always validate a sample by hand.
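To automate the variation step, you can stamp out examples from a known-good template. A sketch using jq (the device names and times here are made up; substitute values pulled from your own logs):

```shell
# Generate JSONL training examples from a template (hypothetical devices/times)
for device in kitchen_lights hallway_lamp porch_light; do
  for time in 07:15 19:30 23:45; do
    jq -nc --arg d "$device" --arg t "$time" '{
      messages: [
        {role: "system", content: "You are a home automation log parser. Convert natural language descriptions into structured JSON logs."},
        {role: "user", content: "The \($d | gsub("_"; " ")) turned on at \($t)"},
        {role: "assistant", content: ({device: $d, action: "on", timestamp: $t} | tojson)}
      ]
    }'
  done
done > generated_examples.jsonl

wc -l generated_examples.jsonl   # 3 devices x 3 times = 9 examples
```

This only covers the happy path; hand-write the weird cases yourself, and always spot-check a sample of the generated lines.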

Save your dataset as a JSONL file (one JSON object per line) and validate it:

#!/bin/bash
# Validate JSONL format: every line must be valid JSON with a "messages" array
line_no=0
while IFS= read -r line; do
  line_no=$((line_no + 1))
  if ! echo "$line" | jq -e '.messages | type == "array"' > /dev/null 2>&1; then
    echo "Invalid example on line $line_no: $line"
    exit 1
  fi
done < training_data.jsonl

echo "Dataset valid: $(wc -l < training_data.jsonl) examples"

Setting Up Fine-tuning with Ollama and Docker

Ollama's built-in fine-tuning (as of version 0.2.0+) uses the Modelfile format. I prefer running this in Docker to keep dependencies isolated and make it reproducible.

Here's a Docker Compose setup that handles fine-tuning:

version: '3.8'

services:
  ollama-finetune:
    image: ollama/ollama:latest
    container_name: ollama-finetune
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - CUDA_VISIBLE_DEVICES=0
    volumes:
      - ./models:/root/.ollama/models
      - ./training_data:/data
      - ./finetune_output:/output
    ports:
      - "11434:11434"
    command: serve
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  # Optional: monitoring service
  ollama-monitor:
    image: prom/prometheus:latest
    container_name: ollama-monitor
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"

Start the container, then fine-tune your model inside it. First, pull a base model and create a Modelfile. Write the Modelfile into ./training_data so it lands inside the volume the container mounts at /data—a file created in your host's working directory isn't visible to the container:

docker exec ollama-finetune ollama pull mistral

cat > training_data/Modelfile.finetune << 'EOF'
FROM mistral

# Set parameters for better instruction-following
PARAMETER temperature 0.7
PARAMETER top_k 40
PARAMETER top_p 0.9

# Optional: inject your system prompt
SYSTEM "You are a specialized assistant for home automation logging."
EOF

# Execute fine-tuning on your dataset
docker exec ollama-finetune ollama create \
  --file /data/Modelfile.finetune \
  my-custom-model:v1 \
  --finetune /data/training_data.jsonl

Watch out: The Ollama fine-tuning interface is still evolving. As of March 2026, the built-in method works, but for production use cases I recommend PyTorch or Hugging Face's TRL (Transformer Reinforcement Learning) library for more control. If you go that route, export your Ollama model to GGUF format first with ollama export modelname > model.gguf.

Training and Validation

Once fine-tuning starts, monitor it closely. I watch three things: training loss (should decrease smoothly), validation accuracy (should improve), and GPU memory usage (shouldn't spike randomly).
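A validation set only exists if you hold one out before training. A minimal split sketch (the seq-generated stand-in file and the 90/10 ratio are assumptions; use your real training_data.jsonl and whatever ratio suits your dataset size):

```shell
# Stand-in dataset; replace with your real training_data.jsonl
seq 1 20 | sed 's/.*/{"example": &}/' > training_data.jsonl

# Shuffle, then hold out 10% of examples for validation
shuf training_data.jsonl > shuffled.jsonl
total=$(wc -l < shuffled.jsonl)
holdout=$(( total / 10 ))
head -n "$holdout" shuffled.jsonl > validation.jsonl
tail -n +"$(( holdout + 1 ))" shuffled.jsonl > train.jsonl
echo "train: $(wc -l < train.jsonl), validation: $(wc -l < validation.jsonl)"
```

Train only on train.jsonl; score the finished model against validation.jsonl so you're measuring generalization, not memorization.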

Fine-tuning a 7B model on 2,000 examples typically takes 30 minutes to 2 hours on an RTX 4090. On more modest GPUs, expect 4–12 hours.

After training completes, Ollama saves the fine-tuned model in your models directory. Test it immediately:

docker exec ollama-finetune ollama run my-custom-model:v1 \
  "The hallway motion sensor detected movement at 23:45 and the back door lock engaged"

# Output should be your formatted JSON, not generic text

Compare the output against your base model to see the difference:

docker exec ollama-finetune ollama run mistral \
  "The hallway motion sensor detected movement at 23:45 and the back door lock engaged"

# Base model likely produces verbose, unstructured text

If your fine-tuned model doesn't perform well, the problem is almost always the training data. Go back and add more examples of the failing case. Retrain. Don't tweak hyperparameters hoping for magic—clean data solves 95% of problems.
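When you retrain, measure instead of eyeballing. A sketch that scores exact-match accuracy between model predictions and your labeled expected outputs (the two stand-in files here are hypothetical; jq's sorted-key output keeps key ordering from counting as a miss):

```shell
# Stand-in files; in practice these come from your model and your labeled test set
printf '%s\n' '{"device":"kitchen_lights","action":"on"}' '{"action":"off","device":"porch_light"}' > predictions.jsonl
printf '%s\n' '{"device":"kitchen_lights","action":"on"}' '{"device":"porch_light","action":"on"}' > expected.jsonl

# Sort keys so formatting differences are not scored as errors, then compare line by line
jq -cS . predictions.jsonl > pred.norm
jq -cS . expected.jsonl   > exp.norm
paste pred.norm exp.norm | awk -F '\t' '
  { total++; if ($1 == $2) correct++ }
  END { printf "Exact match: %d/%d (%.1f%%)\n", correct, total, correct / total * 100 }'
```

Track that number across retraining rounds; if a new dataset revision drops it, you know immediately which examples to revisit.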

Deploying Your Custom Model

Once you're happy with the results, your custom model lives in the Ollama container. To use it across your homelab, export it and serve it via your reverse proxy:

# Export the model (the > redirection writes the file on the host)
docker exec ollama-finetune ollama export my-custom-model:v1 > ./models/my-custom-model.gguf

# Register it with your main Ollama instance (assumes it also mounts ./models
# at /root/.ollama/models)
cat > ./models/Modelfile.deploy << 'EOF'
FROM /root/.ollama/models/my-custom-model.gguf
EOF
docker exec ollama ollama create my-custom-model:v1 --file /root/.ollama/models/Modelfile.deploy

If you're running Ollama with Caddy or Nginx, add a route for your custom model API endpoint:

# Caddy example
localhost:11434 {
  reverse_proxy http://ollama:11434
}

Now your Home Assistant, Node-RED, or custom scripts can call your specialized model. Update your application configuration to use the custom model name instead of generic `mistral` or `llama2`.
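For instance, any shell_command or cron script can hit the model through Ollama's HTTP API. A sketch (the /api/generate endpoint and its model/prompt/stream fields are standard Ollama; the host and model tag are assumptions from this setup):

```shell
# Build the request payload with jq, then query the fine-tuned model
payload=$(jq -n '{
  model: "my-custom-model:v1",
  prompt: "The kitchen lights turned on at 19:30 and the temperature reached 72 degrees",
  stream: false
}')
curl -s http://localhost:11434/api/generate -d "$payload" | jq -r '.response'
```

With `stream: false` the API returns one JSON object, so the final jq pulls out just the model's text, which for this model should be a structured log line.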

Practical Tips from My Workflow

Version your models. I tag every fine-tuned model with a date and iteration number: `my-model:2026-03-29-v3`. If a new version performs worse, I roll back instantly.

Keep training data alongside code. I version-control my training JSONL files in Git (sanitized of sensitive data) so I can reproduce any model months later.

A/B test before committing. Before deploying a fine-tuned model to production, run it on a test set alongside the base model for a few days. Measure response quality and speed.

Expect diminishing returns. After 5,000 training examples, improvements plateau. If you hit a wall, you probably need a different base model, not more data.

Fine-tune incrementally. Don't wait to collect 10,000 examples. Fine-tune with 500, evaluate, then add another 500 based on what failed. This is much faster than one giant training run.

When Fine-tuning Isn't the Answer

Before you spend hours on this, ask yourself: am I solving a real problem, or creating work?

Fine-tuning is overkill if:

- A few-shot prompt with two or three examples already gets acceptable results
- The task changes often, so your training data would go stale before the effort pays off
- You can't collect a few hundred clean, representative examples
- You only run the task occasionally, so the preparation time will never amortize

But if you have clean, representative data and you're running the same task repeatedly, fine-tuning will save you compute, context tokens, and frustration.

Next Steps

Start small. Pick one task—maybe parsing your logs, formatting documents, or classifying support tickets. Collect 500–1,000 good examples. Fine-tune a small model (7B). Validate. Ship it.

Once you've done it once, you'll understand the workflow and can scale to more complex models and larger datasets. The infrastructure is straightforward; the hard part is always the data.
