Do you want to experiment with a local LLM but don’t have a GPU?
Good news: you don’t need one.
I’ve been building MarvinOS Local AI Stack (CPU-Only): a fully self-hosted AI environment that runs entirely on regular CPU hardware, with no cloud dependency and no “sign up for an API key” step just to get started.
If you’ve wanted to explore local LLMs, build a private knowledge base, or run AI tools in an offline lab environment, this stack is made for exactly that.
What it is
MarvinOS Local AI Stack (CPU-Only) is a Docker-based, rebuild-safe AI stack that gives you:
- a local LLM
- a full web chat UI
- RAG (Retrieval-Augmented Generation) with a vector database
- private web search grounding
- local text-to-speech
- HTTPS ingress with Nginx
…and it all runs locally, on CPU.
Why CPU-only matters
Not everyone has a spare RTX card lying around.
And even if you do, sometimes you want something that can run on:
- a small server
- a dev workstation
- a home lab box
- a “boring” corporate machine
- an offline environment
This stack is designed to make local AI accessible, repeatable, and easy to rebuild without losing your data.
What’s included
This stack is made up of a few solid components that work well together:
- Ollama (LLM runtime + embeddings)
- Open WebUI (chat UI + RAG orchestration)
- Qdrant (vector database)
- SearXNG (private meta-search)
- Nginx (HTTPS ingress + reverse proxy)
- openai-edge-tts (local OpenAI-compatible TTS)
Everything is wired together using Docker Compose and designed to survive rebuilds.
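As a rough sketch, the Compose wiring looks something like this. Service names, ports, and volume paths here are illustrative assumptions, not the stack's actual file:

```yaml
# Illustrative sketch only -- not the real compose file.
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ./data/ollama:/root/.ollama      # models persist across rebuilds
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
  qdrant:
    image: qdrant/qdrant
  searxng:
    image: searxng/searxng
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"                        # HTTPS ingress for the whole stack
```

The key idea is that only Nginx is exposed; everything else talks over the internal Compose network.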
The models (fast + CPU-friendly)
This stack is tuned around small models that run well without GPU acceleration:
LLM
- llama3.2:1b

Embeddings (RAG)
- qwen3-embedding:0.6b
That combination keeps things lightweight while still giving you real RAG, real document ingestion, and a real usable local chat environment.
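Fetching those two models into Ollama is one command each. The container name `ollama` is an assumption here; adjust it to whatever your compose file uses:

```shell
# Container name "ollama" is assumed -- match it to your compose service.
docker exec ollama ollama pull llama3.2:1b
docker exec ollama ollama pull qwen3-embedding:0.6b
```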
RAG without external dependencies
One of the main goals here was to make RAG actually practical locally.
The flow looks like this:
1. Upload documents in Open WebUI
2. Embed locally using Ollama
3. Store vectors in Qdrant
4. Retrieve context per question
5. Inject it into the prompt
6. Return responses with citations
No cloud. No external services. No “we send your documents somewhere.”
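Conceptually, the retrieve-and-inject steps are just nearest-neighbour search plus prompt assembly. A toy Python sketch of that shape, where bag-of-words overlap stands in for real embeddings and a plain list stands in for Qdrant (everything here is illustrative):

```python
# Toy sketch of the RAG retrieval step. The real stack uses Ollama
# embeddings and Qdrant; sets and Jaccard overlap stand in for both.

def embed(text: str) -> set[str]:
    """Stand-in for a real embedding model (e.g. qwen3-embedding)."""
    return set(text.lower().replace("?", "").split())

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap standing in for cosine similarity on vectors."""
    return len(a & b) / len(a | b) if a | b else 0.0

# "Vector store": in the real stack this lives in Qdrant.
docs = [
    "Ollama serves local models over an HTTP API",
    "Qdrant stores and searches embedding vectors",
    "Nginx terminates HTTPS in front of the stack",
]
store = [(doc, embed(doc)) for doc in docs]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank stored documents against the question, return the top k."""
    qv = embed(question)
    ranked = sorted(store, key=lambda pair: similarity(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# The retrieved context gets injected into the prompt before generation.
context = retrieve("where are embedding vectors stored?")[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: where are embedding vectors stored?"
```

The real pipeline swaps `embed` for Ollama's embedding endpoint and `store`/`retrieve` for Qdrant queries, but the flow is exactly this.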
Built to survive rebuilds
The stack stores persistent data outside the repo, so you can rebuild containers, update configs, and iterate without wiping out:
- models
- documents
- embeddings
- WebUI state
- configs
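In Compose terms, that typically means bind mounts pointing outside the repo. A hedged sketch (the host paths and service names here are illustrative, though the in-container paths are the usual defaults for these images):

```yaml
# Illustrative bind mounts; host paths are placeholders.
services:
  ollama:
    volumes:
      - /srv/marvinos/ollama:/root/.ollama          # models
  open-webui:
    volumes:
      - /srv/marvinos/open-webui:/app/backend/data  # documents, WebUI state
  qdrant:
    volumes:
      - /srv/marvinos/qdrant:/qdrant/storage        # vectors/embeddings
```

Rebuilding or recreating the containers then leaves everything under the data directory untouched.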
Getting started (3 commands)
If you already have Docker + Compose installed, it’s a simple clone-and-run setup:
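Assuming a standard clone-plus-Compose workflow, the three commands look roughly like this (the repository URL below is a placeholder, not the real one):

```shell
# Placeholder URL -- substitute the actual repository.
git clone https://example.com/marvinos-local-ai-stack.git
cd marvinos-local-ai-stack
docker compose up -d
```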
Once it’s running, you’ll be able to access the stack locally and start experimenting immediately.
Who this is for
This stack is meant for people who want local AI without the usual friction:
- homelab builders
- IT pros testing LLM workflows
- developers building RAG pipelines
- offline / air-gapped environments
- MarvinOS development
If you’ve been curious about local LLMs but thought you needed a GPU to even begin, this is your on-ramp.
What’s next
This is the foundation.
From here, it’s easy to extend into:
- better ingestion pipelines
- larger models (if you have the CPU/RAM)
- internal tool calling
- multi-user setups
- full MarvinOS integrations
If you spin it up and build something cool with it, I’d love to see what you do.
