[GEAR] The Stack

> PUBLISHED: 2026-04-03 15:49 // AUTHOR: Zero Cloud Tax

Affiliate Disclosure: Some links on this page are affiliate links. If you buy through them, I receive a small commission at no extra cost to you. This directly funds the inference nodes and cloud costs that keep Zero Cloud Tax running — zero subscription required.

You don't need a server rack. Here's the exact hardware running a fully automated private AI newsroom, zero-trust VPN mesh, and local LLM inference stack — with zero cloud bills.

Choose Your Level

Not everyone needs a Mac Studio. Start where your budget and use case intersect, then scale up. Every tier runs on Linux with open-source software — no cloud required at any level.

LEVEL 1 — THE "UNO" STARTER

~$200 · Used SFF PC or NUC + Linux

Entry point. No GPU required. Runs Pi-hole, home automation, Tailscale mesh, and lightweight CPU-inference models like Phi-3 Mini or Gemma 2B. Learn the stack before investing in VRAM.

Use case: Pi-hole, n8n, home automation agents
Models: Phi-3 Mini, Gemma 2B, TinyLlama (CPU)
Hardware: Any Intel NUC, Beelink Mini PC, or used office SFF

LEVEL 2 — THE "MOOLAH" POWERHOUSE

~$1,200 · Gaming Laptop or Desktop + RTX 40-series

The sweet spot. Runs everything covered on this blog. 7B–14B models at useful speed, ComfyUI image generation, full n8n automation pipelines — all offline, all yours.

Use case: Local LLM inference, ComfyUI, RAG pipelines
Models: Llama 3.1 8B, Gemma 3 12B, Mistral (Q4 GGUF)
Hardware: Lenovo Legion + RTX 5070 — see below for affiliate link

LEVEL 3 — THE "STUDIO" PROFESSIONAL

$2,500+ · Mac Studio M2/M3 Max (32GB+ unified memory)

No VRAM fragmentation. Unified memory handles models that crash Linux GPU boxes. Run 30B+ parameter models, large-context pipelines, and multi-agent workflows without OOM errors.

Use case: 30B+ models, multi-agent systems, large context (128K+)
Models: Llama 3.1 70B (Q3), Qwen2.5 32B, Mixtral 8x7B
Hardware: Mac Studio M1/M2 Max — see below for affiliate link

Primary Inference Node — "Moolah"

Lenovo Legion Laptop

RTX 5070 (8GB VRAM) · 47GB DDR5 RAM

This is the engine. It handles Stable Diffusion image generation via ComfyUI, Whisper audio transcription, and local LLM inference at native CUDA speeds — entirely offline, no cloud API required.

The 8GB VRAM question: Most people assume you need 24GB+ VRAM to run useful local models. You don't — you need to understand quantization. Using GGUF Q4_K_M format, this card runs Llama 3.1 8B at 35–45 tokens/sec and Gemma 3 12B at 18–22 tokens/sec. The trick is allocating memory correctly: 47GB of system RAM gives WSL2 a dedicated 24GB pool, which means the CPU can offload model layers that don't fit on GPU without a performance cliff.

GPU: NVIDIA RTX 5070 — 8GB GDDR7 VRAM | RAM: 47GB DDR5 (upgraded) | Running: Ollama · LiteLLM · ComfyUI · Open WebUI

Currently $1,479.99 on Amazon — I paid $1,299.99.

View on Amazon →

Apple Silicon Node

Mac Studio M1 Max

32GB Unified Memory · 512GB SSD

While PCs are bottlenecked by discrete VRAM limits, Apple Silicon unified memory pools everything into a single address space. The M1 Max's 32GB can be fully allocated to a single model — meaning you can load 30B+ parameter models that would crash an 8GB GPU. I run Mistral 30B and Llama 3 70B (quantized) here at acceptable speeds, completely silently, drawing under 40W.

This is the quietest, most power-efficient inference node in the stack. For always-on homelab use, that matters more than raw throughput.

Chip: Apple M1 Max | Memory: 32GB Unified (CPU + GPU shared) | Running: Ollama · LM Studio · MLX inference

Currently $2,199.99 on Amazon — I paid $899.99.

View Latest Pricing →

Network Bastions — Low Power, Always-On

The minimum viable cluster. Repurposed laptops running Ubuntu Server — no expensive hardware required.

"Neo" — Intel Core i7 · 8GB RAM

Ubuntu Server 22.04 · Docker · Tailscale

The coordinator. Hosts Ghost CMS, n8n automation workflows, Caddy reverse proxy, and Listmonk mailing. A repurposed laptop drawing ~12W at idle. If you have an old i5/i7 machine collecting dust, this is exactly what to do with it.

"Uno" — Intel Celeron · 4GB RAM

Ubuntu Server · Pi-hole · Uptime Kuma · Tailscale

The watchdog. Runs Pi-hole for network-wide ad blocking, Uptime Kuma for monitoring every service in the mesh, and acts as the Tailscale exit node. A Celeron is genuinely sufficient — these services are I/O bound, not compute bound.

Content Creation Rig

DJI Osmo Mobile Gimbal

3-Axis Stabilization · Bluetooth Control

This is how I record b-roll at the server rack without a second person. Paired with an iPhone or Samsung, the Osmo's ActiveTrack keeps the frame locked on me while I'm hands-free at the terminal — typing commands, pointing at hardware, demonstrating inference output. The stabilization is the difference between "filmed in a homelab" and "filmed in a studio."

Currently $89.00 on Amazon.

View on Amazon →

Bluetooth Microphone

Wireless · Clip-on · Compatible with iOS + Android

The DJI gimbal handles visual stability; this handles audio. A wireless clip-on mic eliminates the wind and ambient noise that ruins homelab footage. Cable-free setup means I can move around the rack without worrying about dragging a mic cable into frame.

Currently $45.00 on Amazon.

View on Amazon →

Cloud Backbone

Hetzner VPS — 2GB RAM · 40GB SSD

Frankfurt · Cloudflare Tunnels · Nginx Proxy Manager

The only off-premise infrastructure I run — and it's free. This VPS sits in a Frankfurt data center and handles exactly one job: taking public internet traffic and routing it securely into my private Tailscale mesh without ever exposing my home IP address.

Why it's effectively free: Hetzner's referral program gives new accounts €20 in free credits. Once you spend €10, Hetzner drops €10 back into your account. A 2GB VPS costs €4.51/month — meaning the first 4+ months are covered, and after that the ongoing €10-for-€10 trade keeps the bill at near-zero.

Running: Cloudflare Tunnels · Nginx Proxy Manager | Cost: €4.51/month (effectively covered by referral credits)

Coming Soon Referral link — you get €20 free, I get a €10 credit when you spend €10. Win/win.

Cloudflare

DNS · Edge Caching · Zero-Trust Tunnels · Free Tier

Handles DNS routing, DDoS protection, edge caching, and zero-trust tunnels for the entire zerocloudtax.com domain. The free tier covers everything this stack needs. If you're not routing your homelab through Cloudflare, you're leaving security and performance on the table.

FAQ — Common Questions

Can you actually run LLMs on 8GB VRAM?

Yes — with quantization. Using GGUF Q4_K_M format, an 8GB card runs Llama 3.1 8B at 35–45 tokens/sec and Gemma 3 12B at 18–22 tokens/sec. The key is pairing it with enough system RAM (40GB+) so the CPU can handle model layers that don't fit on GPU. 8GB VRAM is not a wall — it's an optimization problem.

Is 40GB RAM enough for a WSL2 AI stack?

Comfortably, if you allocate it correctly. I dedicate 24GB to WSL2 via .wslconfig, leaving the remaining RAM for Windows, n8n workflows, and background processes. If you're running models under 13B parameters, 32GB total is the practical minimum. For 30B+ models, 64GB lets you run everything without paging.

Why a Hetzner VPS instead of a home server as the gateway?

Exposing your home IP is a security and ISP terms-of-service risk. The Hetzner VPS acts as a clean public endpoint — it forwards traffic into your Tailscale mesh without revealing your home network. If the VPS gets DDoSed or compromised, your home network is untouched. At €4.51/month (or free with referral credits), it's the cheapest insurance in the stack.

Why a Mac Studio for a homelab that's already GPU-heavy?

Different workloads. The Legion handles GPU-intensive tasks (image gen, Whisper transcription). The Mac Studio handles large-model inference where unified memory wins — 30B+ parameter models that would run out of VRAM on the Legion load cleanly into the M1 Max's shared 32GB pool. Two specialized nodes outperform one generalist node for this class of workload.

What's the total monthly cost to run all of this?

Hardware is a one-time purchase. Ongoing costs: Hetzner VPS (~€4.51/month, offset by referral credits), Cloudflare (free tier), Ghost CMS (self-hosted, free), n8n (self-hosted, free). The only variable cost is electricity — the whole stack idles at under 60W combined. Monthly electricity cost: approximately $4–6 depending on your rate.

Want help building this?

If you're trying to replicate this stack — or build something similar for your own use case — I offer a free 30-minute discovery call. We go through your hardware, your goals, and I'll tell you exactly what I'd do differently. No pitch. Just architecture.

Book a Free Discovery Call → View Consulting Services