Do you want to experiment with a local LLM but don’t have a GPU?
Good news: you don’t need one.
I’ve been building MarvinOS Local AI Stack (CPU-Only): a fully self-hosted AI environment that runs entirely on regular CPU hardware, with no cloud dependency and no “sign up for an API key” step just to get started.
If you’ve wanted to explore local LLMs, build a private knowledge base, or run AI tools in an offline lab environment, this stack is made for exactly that.
What it is
MarvinOS Local AI Stack (CPU-Only) is a Docker-based, rebuild-safe AI stack that gives you:
- a local LLM
- a full web chat UI
- RAG (Retrieval-Augmented Generation) with a vector database
- private web search grounding
- local text-to-speech
- HTTPS ingress with Nginx
…and it all runs locally, on CPU.
Why CPU-only matters
Not everyone has a spare RTX card lying around.
And even if you do, sometimes you want something that can run on:
- a small server
- a dev workstation
- a home lab box
- a “boring” corporate machine
- an offline environment
This stack is designed to make local AI accessible, repeatable, and easy to rebuild without losing your data.
What’s included
This stack is made up of a few solid components that work well together:
- Ollama (LLM runtime + embeddings)
- Open WebUI (chat UI + RAG orchestration)
- Qdrant (vector database)
- SearXNG (private meta-search)
- Nginx (HTTPS ingress + reverse proxy)
- openai-edge-tts (local OpenAI-compatible TTS)
Everything is wired together using Docker Compose and designed to survive rebuilds.
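As a rough sketch, the Compose wiring looks something like this. Service names, ports, and volume paths here are illustrative assumptions, not the stack's actual file:

```yaml
# Illustrative sketch only -- not the real compose file.
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ./data/ollama:/root/.ollama      # models persist across rebuilds
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
  qdrant:
    image: qdrant/qdrant
  searxng:
    image: searxng/searxng
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"                        # HTTPS ingress for the whole stack
```

The key idea is that only Nginx is exposed; everything else talks over the internal Compose network.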
The models (fast + CPU-friendly)
This stack is tuned around small models that run well without GPU acceleration:
LLM
- llama3.2:1b

Embeddings (RAG)
- qwen3-embedding:0.6b
That combination keeps things lightweight while still giving you real RAG, real document ingestion, and a real usable local chat environment.
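Fetching those two models into Ollama is one command each. The container name `ollama` is an assumption here; adjust it to whatever your compose file uses:

```shell
# Container name "ollama" is assumed -- match it to your compose service.
docker exec ollama ollama pull llama3.2:1b
docker exec ollama ollama pull qwen3-embedding:0.6b
```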
RAG without external dependencies
One of the main goals here was to make RAG actually practical locally.
The flow looks like this:
1. Upload documents in Open WebUI
2. Embed locally using Ollama
3. Store vectors in Qdrant
4. Retrieve context per question
5. Inject it into the prompt
6. Return responses with citations
No cloud. No external services. No “we send your documents somewhere.”
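Conceptually, the retrieve-and-inject steps are just nearest-neighbour search plus prompt assembly. A toy Python sketch of that shape, where bag-of-words overlap stands in for real embeddings and a plain list stands in for Qdrant (everything here is illustrative):

```python
# Toy sketch of the RAG retrieval step. The real stack uses Ollama
# embeddings and Qdrant; sets and Jaccard overlap stand in for both.

def embed(text: str) -> set[str]:
    """Stand-in for a real embedding model (e.g. qwen3-embedding)."""
    return set(text.lower().replace("?", "").split())

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap standing in for cosine similarity on vectors."""
    return len(a & b) / len(a | b) if a | b else 0.0

# "Vector store": in the real stack this lives in Qdrant.
docs = [
    "Ollama serves local models over an HTTP API",
    "Qdrant stores and searches embedding vectors",
    "Nginx terminates HTTPS in front of the stack",
]
store = [(doc, embed(doc)) for doc in docs]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank stored documents against the question, return the top k."""
    qv = embed(question)
    ranked = sorted(store, key=lambda pair: similarity(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# The retrieved context gets injected into the prompt before generation.
context = retrieve("where are embedding vectors stored?")[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: where are embedding vectors stored?"
```

The real pipeline swaps `embed` for Ollama's embedding endpoint and `store`/`retrieve` for Qdrant queries, but the flow is exactly this.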
Built to survive rebuilds
The stack stores persistent data outside the repo, so you can rebuild containers, update configs, and iterate without wiping out:
- models
- documents
- embeddings
- WebUI state
- configs
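In Compose terms, that typically means bind mounts pointing outside the repo. A hedged sketch (the host paths and service names here are illustrative, though the in-container paths are the usual defaults for these images):

```yaml
# Illustrative bind mounts; host paths are placeholders.
services:
  ollama:
    volumes:
      - /srv/marvinos/ollama:/root/.ollama          # models
  open-webui:
    volumes:
      - /srv/marvinos/open-webui:/app/backend/data  # documents, WebUI state
  qdrant:
    volumes:
      - /srv/marvinos/qdrant:/qdrant/storage        # vectors/embeddings
```

Rebuilding or recreating the containers then leaves everything under the data directory untouched.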
Getting started (3 commands)
If you already have Docker + Compose installed, it’s a simple clone-and-run setup:
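Assuming a standard clone-plus-Compose workflow, the three commands look roughly like this (the repository URL below is a placeholder, not the real one):

```shell
# Placeholder URL -- substitute the actual repository.
git clone https://example.com/marvinos-local-ai-stack.git
cd marvinos-local-ai-stack
docker compose up -d
```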
Once it’s running, you’ll be able to access the stack locally and start experimenting immediately.
Who this is for
This stack is meant for people who want local AI without the usual friction:
- homelab builders
- IT pros testing LLM workflows
- developers building RAG pipelines
- offline / air-gapped environments
- MarvinOS development
If you’ve been curious about local LLMs but thought you needed a GPU to even begin, this is your on-ramp.
What’s next
This is the foundation.
From here, it’s easy to extend into:
- better ingestion pipelines
- larger models (if you have the CPU/RAM)
- internal tool calling
- multi-user setups
- full MarvinOS integrations
If you spin it up and build something cool with it, I’d love to see what you do.
