
Do you want to experiment with a local LLM but don’t have a GPU?

Good news: you don’t need one.

I’ve been building MarvinOS Local AI Stack (CPU-Only), a fully self-hosted AI environment that runs entirely on regular CPU hardware, with no cloud dependency and no “sign up for an API key” step just to get started.

If you’ve wanted to explore local LLMs, build a private knowledge base, or run AI tools in an offline lab environment, this stack is made for exactly that.


What it is

MarvinOS Local AI Stack (CPU-Only) is a Docker-based, rebuild-safe AI stack that gives you:

  • a local LLM

  • a full web chat UI

  • RAG (Retrieval-Augmented Generation) with a vector database

  • private web search grounding

  • local text-to-speech

  • HTTPS ingress with Nginx

…and it all runs locally, on CPU.


Why CPU-only matters

Not everyone has a spare RTX card lying around.

And even if you do, sometimes you want something that can run on:

  • a small server

  • a dev workstation

  • a home lab box

  • a “boring” corporate machine

  • an offline environment

This stack is designed to make local AI accessible, repeatable, and easy to rebuild without losing your data.


What’s included

This stack is made up of a few solid components that work well together:

  • Ollama (LLM runtime + embeddings)

  • Open WebUI (chat UI + RAG orchestration)

  • Qdrant (vector database)

  • SearXNG (private meta-search)

  • Nginx (HTTPS ingress + reverse proxy)

  • openai-edge-tts (local OpenAI-compatible TTS)

Everything is wired together using Docker Compose and designed to survive rebuilds.


The models (fast + CPU-friendly)

This stack is tuned around small models that run well without GPU acceleration:

LLM

  • llama3.2:1b

Embeddings (RAG)

  • qwen3-embedding:0.6b

That combination keeps things lightweight while still giving you real RAG, real document ingestion, and a real usable local chat environment.
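
To give a feel for what talking to the small model looks like outside the chat UI, here is a minimal sketch that sends one prompt to Ollama's `/api/generate` endpoint. It assumes the Ollama API is reachable at `http://localhost:11434` (Ollama's default port); whether this stack publishes that port to the host is an assumption, so adjust `OLLAMA_URL` to match your setup. The helper names `build_generate_request` and `ask` are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama's default port, reachable from wherever this script runs.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = "llama3.2:1b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return the text of its reply."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the stack running, `print(ask("Explain what a vector database does in one sentence."))` should print a short answer. The generous timeout is deliberate: CPU-only generation can take a while, even with a 1B model.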


RAG without external dependencies

One of the main goals here was to make RAG actually practical locally.

The flow looks like this:

  1. Upload documents in Open WebUI

  2. Embed locally using Ollama

  3. Store vectors in Qdrant

  4. Retrieve context per question

  5. Inject into the prompt

  6. Return responses with citations
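
To make steps 2 through 5 concrete, here is a toy version of the retrieve-and-inject loop. It is not how Open WebUI implements it: a bag-of-words counter stands in for the Ollama embedding model, an in-memory list stands in for Qdrant, and every name (`toy_embed`, `InMemoryStore`, `build_prompt`) and sample document is hypothetical. The shape of the flow is the point, not the components.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedder: a bag-of-words count vector (the real stack uses an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Stand-in for the vector database: keeps (source, text, vector) rows in a list."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []

    def add(self, source: str, text: str) -> None:
        # Embed the chunk once at ingestion time and store it with its source.
        self.rows.append((source, text, self.embed(text)))

    def search(self, question: str, k: int = 2):
        # Embed the question and return the k most similar chunks.
        q = self.embed(question)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(src, txt) for src, txt, _ in ranked[:k]]


def build_prompt(question: str, hits) -> str:
    """Inject retrieved chunks into the prompt, numbered so the answer can cite them."""
    context = "\n".join(f"[{i}] ({src}) {txt}" for i, (src, txt) in enumerate(hits, start=1))
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


store = InMemoryStore()
store.add("backup.md", "Backups run nightly at 02:00 and are kept for 30 days.")
store.add("network.md", "The lab network uses the 10.0.0.0/24 subnet.")
store.add("tts.md", "Text to speech is served by a separate container.")

question = "When do backups run?"
prompt = build_prompt(question, store.search(question, k=1))
print(prompt)
```

The printed prompt is what would be handed to the LLM in step 5. Numbering the chunks is one simple way to let the model point back at its sources for step 6.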

No cloud. No external services. No “we send your documents somewhere.”


Built to survive rebuilds

The stack stores persistent data outside the repo, so you can rebuild containers, update configs, and iterate without wiping out:

  • models

  • documents

  • embeddings

  • WebUI state

  • configs


Getting started (3 commands)

If you already have Docker + Compose installed, it’s a simple clone-and-run setup:

git clone https://github.com/MarvinOS-online/fitlab-core.tiny.cpu-only.git
cd fitlab-core.tiny.cpu-only
docker compose up

Once it’s running, you’ll be able to access the stack locally and start experimenting immediately.
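
On CPU-only hardware the first start can take a while, so it can help to wait for a service to answer before opening the browser. The helper below is a small sketch of that idea. The function name `wait_for` and whatever URL you pass it are assumptions, since the ports this stack exposes are not listed here. It uses default certificate verification, so an HTTPS endpoint with a self-signed certificate would be reported as not ready.

```python
import time
import urllib.error
import urllib.request


def wait_for(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll url until it returns any HTTP response, or give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: not ready yet, try again shortly.
            time.sleep(interval)
    return False
```

For example, `wait_for("http://localhost:11434/")` would wait for Ollama if its default port is published on the host.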


Who this is for

This stack is meant for people who want local AI without the usual friction:

  • homelab builders

  • IT pros testing LLM workflows

  • developers building RAG pipelines

  • offline / air-gapped environments

  • MarvinOS development

If you’ve been curious about local LLMs but thought you needed a GPU to even begin, this is your on-ramp.


What’s next

This is the foundation.

From here, it’s easy to extend into:

  • better ingestion pipelines

  • larger models (if you have the CPU/RAM)

  • internal tool calling

  • multi-user setups

  • full MarvinOS integrations

If you spin it up and build something cool with it, I’d love to see what you do.
