I Stopped Paying for AI APIs — Here's What I Run Locally Instead

> PUBLISHED: 2026-04-10 22:21 // AUTHOR: Zero Cloud Tax > TAGS: [homelab] [ollama] [local-ai] [Zero Cloud Tax Brief] [guides] > ~1 min read MIN READ_

Six months ago I was spending $120/month between ChatGPT Plus, Claude Pro, and various API calls for my home automation experiments. Today I spend $0 — and get better results. Here's exactly how.

The Breaking Point

The trigger was a $47 invoice from OpenAI for API usage during a weekend project. I was running a simple n8n workflow that summarized RSS feeds — something that should cost pennies. That's when I decided to run everything locally.

The Stack

My setup runs on a single machine with an RTX 5070 and 47GB of RAM:

Ollama — serves local LLMs via a clean REST API, identical interface to OpenAI
LiteLLM — unified proxy that maps OpenAI-style calls to any local model
Open WebUI — browser interface for model switching and chat history
n8n — automation engine connecting everything together

The Models

For most tasks, llama3.1:8b handles 90% of what I used to pay GPT-4 for. For complex reasoning I run qwen3:8b or gemma3:12b. Zero API costs. Zero data leaving my network.

The Setup in 15 Minutes

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.1:8b

# Run Open WebUI via Docker
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:latest

What You Give Up

Honesty: local 8B models are not GPT-4. They are slower for complex multi-step reasoning. But for summarization, classification, and automation tasks? They are indistinguishable — and they are private, offline-capable, and free forever.

The $120/month I saved pays for the GPU in about 18 months. After that it is pure margin.

The Breaking Point

The Stack

The Models

The Setup in 15 Minutes

What You Give Up

Get the Zero Cloud Tax Brief