Ollama Fine-tuning: Customizing Models for Your Workflows
Running Ollama out of the box is powerful—but what if you need a model that understands your company's jargon, formats data the way your workflow expects, or specializes in a narrow task? That's where fine-tuning comes in. I've spent the last few months experimenting with Ollama fine-tuning on my homelab, and I can tell you it transforms generic models into purpose-built tools that actually get your work done better. This guide walks you through the entire process: preparing datasets, executing fine-tuning, and deploying your custom model.
Why Fine-tune Instead of Prompt Engineering?
I used to think prompt engineering was the answer to every problem. Give the model better instructions, more context, a few examples—surely that's enough? But I kept running into the same wall: a general-purpose model will always be general-purpose, no matter how clever your prompts are.
Fine-tuning solves this by actually changing the model's weights based on your data. This means:
- Consistency: The model learns to behave the same way every time, without needing in-context examples.
- Efficiency: Smaller context windows needed. You can ask simpler questions and get better answers.
- Cost: A fine-tuned 7B model can outperform a 70B general model on your specific task.
- Privacy: Your training data never leaves your homelab.
The trade-off? Fine-tuning takes compute time and preparation. You need clean, representative data. But if you're running Ollama anyway, you already have the hardware—fine-tuning is just using it smarter.
Prerequisites and Hardware Reality Check
Let's be honest about what you need. Fine-tuning is not cheap on consumer hardware. I'm using an RTX 4090 in my homelab—an $1800 GPU that most of you probably don't have. But here's what actually works:
- RTX 4060 (8GB VRAM) or better: Fine-tune 7B models with a parameter-efficient method (LoRA/QLoRA) and gradient checkpointing. Slow, but functional; a full fine-tune won't fit in 8GB.
- RTX 4080 or RTX 4090: Fine-tune 13B models comfortably; a 7B run finishes within an hour or two.
- Consumer CPU only: You can technically fine-tune on CPU, but expect 10–100x slower execution. Not practical for iterative work.
If you don't have a capable GPU locally, rent one just for the fine-tuning job: hourly GPU cloud instances cost a few dollars per training run. Run the fine-tuning there, then pull the model back to your homelab for inference. (Budget VPS deals like RackNerd's are great for always-on hosting, but they're CPU-only; don't plan to train on one.)
Preparing Your Training Dataset
This is where most people stumble. A bad dataset produces a useless model. Period.
I fine-tuned a model to format my home automation logs. My first attempt used 50 raw log lines. Result: gibberish. My second attempt used 500 carefully curated examples with explanations of what I wanted. Result: useful, but still hallucinating. Third attempt: 2,000 examples with strict formatting rules. That worked.
Here's what your dataset needs:
- Quantity: Minimum 500 examples, ideally 2,000–5,000 for meaningful improvement.
- Quality: Every example must be correct. Wrong labels are worse than no training at all.
- Diversity: Include edge cases and variations. If you only show sunny weather logs, the model learns only sunny weather.
- Format: Use the chat-style JSON format (system prompt, user message, assistant response) that fine-tuning tools expect.
Let me show you the format. Each training example is a conversation turn:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a home automation log parser. Convert natural language descriptions into structured JSON logs."
    },
    {
      "role": "user",
      "content": "The kitchen lights turned on at 19:30 and the temperature reached 72 degrees"
    },
    {
      "role": "assistant",
      "content": "{\"device\": \"kitchen_lights\", \"action\": \"on\", \"timestamp\": \"19:30\", \"temperature\": 72}"
    }
  ]
}
I built my dataset by extracting real logs from my Home Assistant database, manually writing the "correct" parsed output for each one, and grouping them by scenario. You can automate some of this—write a script that generates variations of known-good examples—but always validate a sample by hand.
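That variation script can be very small. Here's a minimal sketch, assuming a hypothetical device list and a template matching the example above (swap in your own devices and phrasings):

```python
import json
import random

# Hypothetical vocabulary -- replace with the devices and actions
# that actually appear in your logs.
DEVICES = ["kitchen_lights", "hallway_lamp", "garage_door"]
ACTIONS = ["on", "off"]

SYSTEM_PROMPT = ("You are a home automation log parser. Convert natural "
                 "language descriptions into structured JSON logs.")

def make_example(device, action, hour, minute):
    """Build one chat-format training example from a known-good template."""
    ts = f"{hour:02d}:{minute:02d}"
    user = f"The {device.replace('_', ' ')} turned {action} at {ts}"
    assistant = json.dumps({"device": device, "action": action, "timestamp": ts})
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }

def generate(n, seed=0):
    """Generate n randomized variations; seeded so runs are reproducible."""
    rng = random.Random(seed)
    return [
        make_example(rng.choice(DEVICES), rng.choice(ACTIONS),
                     rng.randrange(24), rng.randrange(60))
        for _ in range(n)
    ]

if __name__ == "__main__":
    with open("training_data.jsonl", "w") as f:
        for ex in generate(500):
            f.write(json.dumps(ex) + "\n")
```

Generated variations are a supplement, not a substitute: keep a core of real, hand-validated logs in the dataset so the model sees genuine phrasing, not just template permutations.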
Save your dataset as a JSONL file (one JSON object per line) and validate it:
#!/bin/bash
# Validate JSONL format
while IFS= read -r line; do
  if ! echo "$line" | jq . > /dev/null 2>&1; then
    echo "Invalid JSON: $line"
    exit 1
  fi
done < training_data.jsonl
echo "Dataset valid: $(wc -l < training_data.jsonl) examples"
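jq only catches malformed JSON; it won't notice a missing role or an assistant reply that isn't itself valid JSON. A stricter check in Python (the three-turn schema matches the format above; the assistant-must-be-JSON rule is specific to this parsing task):

```python
import json
import sys

EXPECTED_ROLES = ["system", "user", "assistant"]

def check_example(obj):
    """Return a list of problems with one training example (empty = OK)."""
    messages = obj.get("messages")
    if not isinstance(messages, list) or len(messages) != 3:
        return ["expected a 'messages' list with exactly 3 turns"]
    problems = []
    for msg, role in zip(messages, EXPECTED_ROLES):
        if msg.get("role") != role:
            problems.append(f"expected role '{role}', got '{msg.get('role')}'")
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            problems.append(f"empty or missing content for role '{role}'")
    # For this dataset, the assistant reply must itself parse as JSON
    try:
        json.loads(messages[2].get("content", ""))
    except (json.JSONDecodeError, TypeError):
        problems.append("assistant content is not valid JSON")
    return problems

if __name__ == "__main__" and len(sys.argv) > 1:
    bad = 0
    with open(sys.argv[1]) as f:
        for lineno, line in enumerate(f, 1):
            probs = check_example(json.loads(line))
            for p in probs:
                print(f"line {lineno}: {p}", file=sys.stderr)
            bad += bool(probs)
    print(f"{bad} bad example(s)")
    sys.exit(1 if bad else 0)
```

Run it right before every training job; a silently broken example is easy to introduce while editing by hand.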
Setting Up Fine-tuning with Ollama and Docker
A note on mechanics first: Ollama doesn't train models itself; as of this writing there is no built-in fine-tuning command. The weight updates happen in an external training tool, and Ollama's Modelfile format is how you package the result for serving. I prefer running the serving side in Docker to keep dependencies isolated and make the setup reproducible.
Here's a Docker Compose setup for the serving side:
version: '3.8'
services:
  ollama-finetune:
    image: ollama/ollama:latest
    container_name: ollama-finetune
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - CUDA_VISIBLE_DEVICES=0
    volumes:
      - ./models:/root/.ollama/models
      - ./training_data:/data
      - ./finetune_output:/output
    ports:
      - "11434:11434"
    command: serve
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  # Optional: monitoring service
  ollama-monitor:
    image: prom/prometheus:latest
    container_name: ollama-monitor
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
Start the container and pull a base model. The fine-tuning run itself happens in an external training tool (Unsloth and Axolotl are popular choices for LoRA fine-tuning); what Ollama needs from that run is the resulting adapter file, which a Modelfile layers onto the base model:
docker exec ollama-finetune ollama pull mistral
# Keep the Modelfile and adapter in ./training_data so they appear at /data
cat > ./training_data/Modelfile.finetune << 'EOF'
FROM mistral
# Set parameters for better instruction-following
PARAMETER temperature 0.7
PARAMETER top_k 40
PARAMETER top_p 0.9
# Layer the LoRA adapter produced by your external training run (GGUF format)
ADAPTER ./adapter.gguf
# Optional: inject your system prompt
SYSTEM "You are a specialized assistant for home automation logging."
EOF
# Register the fine-tuned model with Ollama
docker exec -w /data ollama-finetune ollama create my-custom-model:v1 -f Modelfile.finetune
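If your training run happens in a Hugging Face-style tool, you'll often need each chat example flattened into a single text string first. A sketch with a deliberately generic template (a real run should use the base model's own chat template, e.g. Mistral's, not this placeholder):

```python
import json

# Illustrative template only -- match whatever your training tool and
# base model actually expect.
TEMPLATE = "<<SYS>>{system}<</SYS>>\n[USER] {user}\n[ASSISTANT] {assistant}"

def flatten(example):
    """Collapse one chat-format example into a single training string."""
    by_role = {m["role"]: m["content"] for m in example["messages"]}
    return TEMPLATE.format(
        system=by_role["system"],
        user=by_role["user"],
        assistant=by_role["assistant"],
    )

def convert(in_path, out_path):
    """Rewrite chat-format JSONL as {'text': ...} lines for a text trainer."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fout.write(json.dumps({"text": flatten(json.loads(line))}) + "\n")
```

Getting the template wrong is a classic silent failure: training still runs, but the model learns token patterns the serving side never produces.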
Training and Validation
Once fine-tuning starts, monitor it closely. I watch three things: training loss (should decrease smoothly), validation accuracy (should improve), and GPU memory usage (shouldn't spike randomly).
Fine-tuning a 7B model on 2,000 examples typically takes 30 minutes to 2 hours on an RTX 4090. On consumer hardware, expect 4–12 hours.
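One way to make "should decrease smoothly" concrete: track a moving average of the loss and flag the run when it stops improving for several evaluations in a row. A sketch (the patience and window values are generic early-stopping defaults, not anything Ollama-specific):

```python
def should_stop(losses, patience=5, window=3):
    """Early-stop check: True once the moving average of the loss hasn't
    improved for `patience` consecutive evaluation points."""
    if len(losses) < window + patience:
        return False
    # Moving average over each trailing `window` of points
    avgs = [
        sum(losses[i - window:i]) / window
        for i in range(window, len(losses) + 1)
    ]
    best = min(avgs[:-patience])
    # Stop if none of the last `patience` averages beat the earlier best
    return all(a >= best for a in avgs[-patience:])
```

Wire this into whatever loop reads your training tool's loss output; stopping a diverged run early saves hours on consumer hardware.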
Once the training run completes and you've registered the result with `ollama create`, the model lives in your models directory. Test it immediately:
docker exec ollama-finetune ollama run my-custom-model:v1 \
"The hallway motion sensor detected movement at 23:45 and the back door lock engaged"
# Output should be your formatted JSON, not generic text
Compare the output against your base model to see the difference:
docker exec ollama-finetune ollama run mistral \
"The hallway motion sensor detected movement at 23:45 and the back door lock engaged"
# Base model likely produces verbose, unstructured text
If your fine-tuned model doesn't perform well, the problem is almost always the training data. Go back and add more examples of the failing case. Retrain. Don't tweak hyperparameters hoping for magic—clean data solves 95% of problems.
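"Add more examples of the failing case" is easier when you can see exactly which cases fail. A small scorer that compares the expected JSON against a model's raw output field by field (collecting the outputs themselves, e.g. by looping `ollama run` over a held-out set, is up to you):

```python
import json

def score(expected_json, actual_text):
    """Compare an expected parsed-log JSON string against a model's raw output.

    Returns (exact_match, list_of_mismatch_descriptions)."""
    try:
        actual = json.loads(actual_text)
    except json.JSONDecodeError:
        return False, ["output is not valid JSON"]
    expected = json.loads(expected_json)
    mismatches = [
        f"{key}: expected {expected[key]!r}, got {actual.get(key)!r}"
        for key in expected
        if actual.get(key) != expected[key]
    ]
    return not mismatches, mismatches

def summarize(pairs):
    """pairs: iterable of (expected_json, actual_text). Returns failing cases."""
    failures = []
    for i, (exp, act) in enumerate(pairs):
        ok, why = score(exp, act)
        if not ok:
            failures.append((i, why))
    return failures
```

Exact match is strict on purpose: for a log parser, a wrong timestamp is just as much a failure as invalid JSON, and the mismatch descriptions tell you which kind of example to add.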
Deploying Your Custom Model
Once you're happy with the results, your custom model lives in the fine-tuning container's bind-mounted models directory. To use it across your homelab, move it to your main Ollama instance and serve it through your reverse proxy:
# The model data is in the bind-mounted ./models directory.
# Option 1: sync it to your main Ollama host (hostname and path are yours)
rsync -a ./models/ main-host:/opt/ollama/models/
# Option 2: recreate the model on the main instance from the same
# Modelfile and adapter
docker exec ollama ollama create my-custom-model:v1 -f /data/Modelfile.finetune
If you're fronting Ollama with Caddy or Nginx, add a route so the rest of your homelab can reach the API (the model is chosen per request, so no model-specific endpoint is needed):
# Caddy example: serve the Ollama API at a LAN hostname
ollama.lan {
    reverse_proxy ollama:11434
}
Now your Home Assistant, Node-RED, or custom scripts can call your specialized model. Update your application configuration to use the custom model name instead of generic `mistral` or `llama2`.
Practical Tips from My Workflow
Version your models. I tag every fine-tuned model with a date and iteration number: `my-model:2026-03-29-v3`. If a new version performs worse, I roll back instantly.
Keep training data alongside code. I version-control my training JSONL files in Git (sanitized of sensitive data) so I can reproduce any model months later.
A/B test before committing. Before deploying a fine-tuned model to production, run it on a test set alongside the base model for a few days. Measure response quality and speed.
Expect diminishing returns. After 5,000 training examples, improvements plateau. If you hit a wall, you probably need a different base model, not more data.
Fine-tune incrementally. Don't wait to collect 10,000 examples. Fine-tune with 500, evaluate, then add another 500 based on what failed. This is much faster than one giant training run.
When Fine-tuning Isn't the Answer
Before you spend hours on this, ask yourself: am I solving a real problem, or creating work?
Fine-tuning is overkill if:
- You only need the model to know 5–10 specific facts. Use retrieval-augmented generation (RAG) instead—feed context directly to the prompt.
- Your task is completely novel and you have almost no training examples. You need at least a few hundred good examples.
- The base model already does what you need with a good prompt. Seriously, try a 3-shot prompt first. It's free.
But if you have clean, representative data and you're running the same task repeatedly, fine-tuning will save you compute, context tokens, and frustration.
Next Steps
Start small. Pick one task—maybe parsing your logs, formatting documents, or classifying support tickets. Collect 500–1,000 good examples. Fine-tune a small model (7B). Validate. Ship it.
Once you've done it once, you'll understand the workflow and can scale to more complex models and larger datasets. The infrastructure is straightforward; the hard part is always the data.