Six months ago I was spending $120/month between ChatGPT Plus, Claude Pro, and various API calls for my home automation experiments. Today I spend $0 — and get better results. Here's exactly how.
The Breaking Point
The trigger was a $47 invoice from OpenAI for API usage during a weekend project. I was running a simple n8n workflow that summarized RSS feeds — something that should cost pennies. That's when I decided to run everything locally.
The Stack
My setup runs on a single machine with an RTX 5070 and 47GB of RAM:
- Ollama — serves local LLMs via a clean REST API, identical interface to OpenAI
- LiteLLM — unified proxy that maps OpenAI-style calls to any local model
- Open WebUI — browser interface for model switching and chat history
- n8n — automation engine connecting everything together
The Models
For most tasks, llama3.1:8b handles 90% of what I used to pay GPT-4 for. For complex reasoning I run qwen3:8b or gemma3:12b. Zero API costs. Zero data leaving my network.
The Setup in 15 Minutes
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.1:8b
# Run Open WebUI via Docker
docker run -d -p 3000:8080 \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:latestWhat You Give Up
Honesty: local 8B models are not GPT-4. They are slower for complex multi-step reasoning. But for summarization, classification, and automation tasks? They are indistinguishable — and they are private, offline-capable, and free forever.
The $120/month I saved pays for the GPU in about 18 months. After that it is pure margin.