RTX 5070: Running a Full Local AI Stack for $15/Month in Electricity

> PUBLISHED: 2026-04-10 22:21 // AUTHOR: Zero Cloud Tax > TAGS: [homelab] [rtx-5070] [hardware] [local-ai] [guides] > ~1 min read

The RTX 5070 is one of the best-value GPUs you can buy right now for a local AI homelab. Mine is the 8GB GDDR7 laptop variant, in a Lenovo Legion. Here is how I spec'd it and what it runs today.

Why the 5070 Over a 4090

The 4090 has 24GB of VRAM but costs 2-3x more and runs hot. The 5070 with 8GB handles every model I throw at it through Ollama. 8B models fit with room to spare, and 12B models still run, although at roughly 9GB they exceed the card's 8GB, so Ollama offloads part of the model to system RAM at some cost in speed. The Blackwell architecture also brings improved FP8 inference performance, which matters for local LLMs.

The Full Build

GPU: RTX 5070 8GB GDDR7
RAM: 47GB
OS: Windows 11 + WSL2 Ubuntu (Docker runs in WSL2, gets GPU access natively)
Storage: 1TB SSD

What Fits in 8GB VRAM

llama3.1:8b - ~6GB, fast inference, great for automation tasks
gemma3:12b - ~9GB, better reasoning, use for drafting (over the 8GB budget, so it runs with partial offload to system RAM)
qwen3:8b - ~6GB, strong multilingual + code
SDXL image generation - ~6GB, shares the card with 8B LLMs only through careful VRAM management, since the two together exceed 8GB
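
To make the budget explicit, here is a minimal sketch that compares the approximate sizes from the list above against the 8GB of VRAM. The `fits` helper and the `VRAM_GB` variable are hypothetical names for illustration, and the sizes are the rough figures quoted above rather than measured values.

```shell
# Total VRAM on the card, in GB (from the build list).
VRAM_GB=8

# fits NAME SIZE_GB -> prints whether SIZE_GB fits inside VRAM_GB
fits() {
  if [ "$2" -le "$VRAM_GB" ]; then
    echo "$1: fits (${2}GB needed, ${VRAM_GB}GB available)"
  else
    echo "$1: over budget by $(( $2 - VRAM_GB ))GB (${2}GB needed, ${VRAM_GB}GB available)"
  fi
}

fits llama3.1:8b 6
fits gemma3:12b 9
fits qwen3:8b 6
# SDXL and an 8B LLM resident at the same time
fits "sdxl+llama3.1:8b" $(( 6 + 6 ))
```

On the real machine, `ollama ps` reports how a loaded model is split between CPU and GPU, which is a quick way to see whether a model spilled out of VRAM.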

The WSL2 + Docker Setup

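# Install the NVIDIA driver on Windows only (not inside WSL2; Windows handles GPU passthrough)
# In WSL2 (optional: adds the CUDA compiler and libraries; nvidia-smi itself comes from the Windows driver):
sudo apt update && sudo apt install -y nvidia-cuda-toolkit

# Verify the GPU is visible in WSL2
nvidia-smi

# Docker Desktop's WSL2 backend exposes the GPU to containers out of the box;
# a Docker Engine installed directly inside WSL2 also needs the NVIDIA Container Toolkit.
# CUDA image tags need the full version and OS (there is no short 12.0-base tag).
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi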
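
Before running the checks above, it can help to confirm the tools are actually on the PATH inside WSL2. The following is a small preflight sketch. `need` and `preflight` are hypothetical helper names, not part of any NVIDIA or Docker tooling.

```shell
# need TOOL -> returns 0 if TOOL is on PATH, otherwise prints a note to stderr and returns 1
need() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing: $1" >&2
  return 1
}

# preflight TOOL... -> checks every tool and returns 1 if any are missing
preflight() {
  status=0
  for tool in "$@"; do
    need "$tool" || status=1
  done
  return "$status"
}

# Usage inside WSL2 (left commented so the sketch has no side effects):
# preflight nvidia-smi docker && echo "ready for the docker --gpus check"
```

Reporting every missing tool in one pass, rather than stopping at the first, saves a round trip when both the driver and Docker still need attention.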

The Power Bill

Idle: ~80W. Under inference load: 180-220W. Running 24/7 for a month costs roughly $12-15 in electricity at average US rates, which assumes the machine sits near idle most of the time. With $0 in API costs, that is a massive saving over cloud bills.
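
As a sanity check on the monthly figure, here is a minimal sketch of the arithmetic. It assumes a rate of $0.16/kWh and a 30-day month, both of which are my assumptions rather than measured numbers, and `monthly_cost` is a hypothetical helper name.

```shell
# monthly_cost WATTS RATE_PER_KWH -> dollars for running 24/7 over a 30-day month
monthly_cost() {
  awk -v w="$1" -v r="$2" 'BEGIN { printf "%.2f\n", w * 24 * 30 / 1000 * r }'
}

monthly_cost 80 0.16    # constant idle draw  -> 9.22
monthly_cost 200 0.16   # constant load draw  -> 23.04
```

Under these assumptions, a $12-15 bill falls between the two bounds and corresponds to an average draw of roughly 105-130W, which fits a box that idles most of the day and runs inference in bursts.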

This machine runs Ghost CMS, n8n, Ollama, LiteLLM, Open WebUI, Supabase, and Listmonk simultaneously without breaking a sweat. That is a full SaaS stack on one box.
