The RTX 5070 is one of the best-value GPUs you can buy right now for a local AI homelab. Mine is the 8GB GDDR7 laptop version, in a Lenovo Legion. Here is how I spec'd it and what it runs today.
Why the 5070 Over a 4090
The 4090 has 24GB VRAM but costs 2-3x more and runs hot. The 5070 with 8GB handles every model I throw at it through Ollama. 8B models fit with room to spare; 12B models also run, though at ~9GB they exceed VRAM, so Ollama keeps part of the model in system RAM and inference slows down. The new Blackwell architecture also improves FP8 inference performance, which matters for local LLMs.
The Full Build
- GPU: RTX 5070 8GB GDDR7
- RAM: 47GB
- OS: Windows 11 + WSL2 Ubuntu (Docker runs in WSL2, gets GPU access natively)
- Storage: 1TB SSD
What Fits in 8GB VRAM
- llama3.1:8b: ~6GB, fast inference, great for automation tasks
- gemma3:12b: ~9GB, better reasoning, use for drafting
- qwen3:8b: ~6GB, strong multilingual + code
- SDXL image generation: ~6GB, runs alongside 8B LLMs with careful VRAM management
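The budget behind this list can be sketched as a quick check of what fits at the same time. This is a minimal sketch, not a measurement: the footprints are the approximate figures from the list, `fits` and `overflow_gb` are hypothetical helper names, and the check ignores the extra memory the KV cache needs as context length grows.

```python
# Approximate VRAM footprints in GB, copied from the list above.
FOOTPRINT_GB = {
    "llama3.1:8b": 6,
    "gemma3:12b": 9,
    "qwen3:8b": 6,
    "sdxl": 6,
}
VRAM_GB = 8  # VRAM on the RTX 5070 in this build


def total_gb(*names):
    """Sum the approximate footprints of the named workloads."""
    return sum(FOOTPRINT_GB[name] for name in names)


def fits(*names, budget_gb=VRAM_GB):
    """True if the named workloads fit in VRAM simultaneously."""
    return total_gb(*names) <= budget_gb


def overflow_gb(*names, budget_gb=VRAM_GB):
    """How many GB the named workloads exceed the budget by (0 if they fit)."""
    return max(0, total_gb(*names) - budget_gb)


if __name__ == "__main__":
    for name in FOOTPRINT_GB:
        status = "fits" if fits(name) else f"over by {overflow_gb(name)}GB"
        print(f"{name}: {status}")
    pair = ("sdxl", "llama3.1:8b")
    print(f"{' + '.join(pair)}: over by {overflow_gb(*pair)}GB")
```

By these numbers gemma3:12b is about 1GB over the budget, and SDXL plus an 8B model together are about 4GB over. That suggests "careful VRAM management" means unloading one workload before starting the other, or accepting that part of a model sits in system RAM.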
The WSL2 + Docker Setup
# Install NVIDIA drivers on Windows (not WSL2 — Windows handles GPU passthrough)
# In WSL2:
sudo apt install nvidia-cuda-toolkit
# Verify GPU visible in WSL2
nvidia-smi
# Docker gets GPU access automatically via WSL2 integration
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
The Power Bill
Idle: ~80W. Under inference load: 180-220W. Running 24/7 for a month costs roughly $12-15 in electricity at average US rates, assuming the machine sits idle most of the time. With $0 in API costs, that is a massive saving over cloud bills.
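To see where a figure in that range can come from, here is a rough sketch of the arithmetic. The rate of $0.16/kWh, the 30-day month, the 200W midpoint for load, and the share of time spent under load are all assumptions of mine, not measurements from this build; `average_watts` and `monthly_cost_usd` are hypothetical helper names.

```python
def average_watts(idle_w, load_w, load_fraction):
    """Time-weighted average draw for a machine that is under load
    for `load_fraction` of the time and idle for the rest."""
    return idle_w * (1 - load_fraction) + load_w * load_fraction


def monthly_cost_usd(avg_watts, usd_per_kwh=0.16, hours=24 * 30):
    """Electricity cost of drawing `avg_watts` continuously for `hours`.

    usd_per_kwh=0.16 is an assumed rate; substitute your own tariff.
    """
    kwh = avg_watts * hours / 1000
    return kwh * usd_per_kwh


if __name__ == "__main__":
    IDLE_W, LOAD_W = 80, 200  # 200W is the midpoint of the 180-220W range
    for load_fraction in (0.0, 0.2, 0.4, 1.0):
        watts = average_watts(IDLE_W, LOAD_W, load_fraction)
        cost = monthly_cost_usd(watts)
        print(f"{load_fraction:.0%} under load: {watts:.0f}W avg, ${cost:.2f}/month")
```

Under these assumptions, a machine that is under load 20-40% of the time lands at about $12 to $14.75 per month. Always idle comes to about $9.22, and constant inference to about $23.04.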
This machine runs Ghost CMS, n8n, Ollama, LiteLLM, Open WebUI, Supabase, and Listmonk simultaneously without breaking a sweat. That is a full SaaS stack on one box.