Do you want to experiment with a local LLM but don’t have a GPU?

Good news: you don’t need one.

I’ve been building the MarvinOS Local AI Stack (CPU-Only): a fully self-hosted AI environment that runs entirely on ordinary CPU hardware, with no cloud dependency and no “sign up for an API key” step just to get started.

If you’ve wanted to explore local LLMs, build a private knowledge base, or run AI tools in an offline lab environment, this stack is made for exactly that.


What it is

MarvinOS Local AI Stack (CPU-Only) is a Docker-based, rebuild-safe AI stack that gives you:

  • a local LLM

  • a full web chat UI

  • RAG (Retrieval-Augmented Generation) with a vector database

  • private web search grounding

  • local text-to-speech

  • HTTPS ingress with Nginx

…and it all runs locally, on CPU.


Why CPU-only matters

Not everyone has a spare RTX card lying around.

And even if you do, sometimes you want something that can run on:

  • a small server

  • a dev workstation

  • a home lab box

  • a “boring” corporate machine

  • an offline environment

This stack is designed to make local AI accessible, repeatable, and easy to rebuild without losing your data.


What’s included

This stack is made up of a few solid components that work well together:

  • Ollama (LLM runtime + embeddings)

  • Open WebUI (chat UI + RAG orchestration)

  • Qdrant (vector database)

  • SearXNG (private meta-search)

  • Nginx (HTTPS ingress + reverse proxy)

  • openai-edge-tts (local OpenAI-compatible TTS)

Everything is wired together using Docker Compose and designed to survive rebuilds.
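To make the wiring concrete, here is a rough sketch of how the services might fit together in a Compose file. This is illustrative only — service names, images, ports, and environment variables are assumptions, not the repo’s actual compose file:

```yaml
# Hypothetical topology sketch — not the repo's actual docker-compose.yml.
services:
  ollama:
    image: ollama/ollama            # LLM runtime + embeddings
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434  # WebUI talks to Ollama over the internal network
    depends_on:
      - ollama
  qdrant:
    image: qdrant/qdrant            # vector database for RAG
  searxng:
    image: searxng/searxng          # private meta-search for grounding
  nginx:
    image: nginx                    # HTTPS ingress in front of everything
    ports:
      - "443:443"
    depends_on:
      - open-webui
```

The key design point is that only Nginx publishes a port to the host; the other services talk to each other over Docker’s internal network.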


The models (fast + CPU-friendly)

This stack is tuned around small models that run well without GPU acceleration:

LLM

  • llama3.2:1b

Embeddings (RAG)

  • qwen3-embedding:0.6b

That combination keeps things lightweight while still giving you real RAG, real document ingestion, and a genuinely usable local chat environment.


RAG without external dependencies

One of the main goals here was to make RAG actually practical locally.

The flow looks like this:

  1. Upload documents in Open WebUI

  2. Embed locally using Ollama

  3. Store vectors in Qdrant

  4. Retrieve context per question

  5. Inject into the prompt

  6. Return responses with citations

No cloud. No external services. No “we send your documents somewhere.”
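The retrieval steps (3–5) boil down to nearest-neighbor search over embedding vectors. Here is a minimal in-memory sketch of that idea in Python — the real stack uses Ollama for embeddings and Qdrant for storage; the hard-coded vectors below are purely illustrative stand-ins:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, store, k=2):
    # Rank stored (text, vector) pairs by similarity to the query vector
    # and return the top-k chunks to inject into the prompt.
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "embeddings": in the real stack these come from qwen3-embedding via Ollama.
store = [
    ("Docker Compose wires the services together.", [0.9, 0.1, 0.0]),
    ("Qdrant stores the document vectors.",         [0.1, 0.9, 0.1]),
    ("Nginx terminates HTTPS.",                     [0.0, 0.2, 0.9]),
]

context = retrieve([0.85, 0.15, 0.05], store, k=1)
print(context[0])  # the most similar chunk becomes the injected context
```

Qdrant does exactly this ranking at scale, with indexing so it stays fast as your document set grows.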


Built to survive rebuilds

The stack stores persistent data outside the repo, so you can rebuild containers, update configs, and iterate without wiping out:

  • models

  • documents

  • embeddings

  • WebUI state

  • configs
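In Compose terms, this pattern usually means bind-mounting host directories into the containers, so recreating or rebuilding containers never touches the data. The paths below are illustrative, not the repo’s actual layout:

```yaml
# Illustrative only: state lives in host directories outside the containers,
# so "docker compose down" + rebuild leaves it intact.
services:
  ollama:
    volumes:
      - ./data/ollama:/root/.ollama          # pulled models
  open-webui:
    volumes:
      - ./data/open-webui:/app/backend/data  # documents + WebUI state
  qdrant:
    volumes:
      - ./data/qdrant:/qdrant/storage        # vectors / embeddings
```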


Getting started (3 commands)

If you already have Docker + Compose installed, it’s a simple clone-and-run setup:

git clone https://github.com/MarvinOS-online/fitlab-core.tiny.cpu-only.git
cd fitlab-core.tiny.cpu-only
docker compose up

Once it’s running, you’ll be able to access the stack locally and start experimenting immediately.


Who this is for

This stack is meant for people who want local AI without the usual friction:

  • homelab builders

  • IT pros testing LLM workflows

  • developers building RAG pipelines

  • teams working in offline / air-gapped environments

  • MarvinOS development

If you’ve been curious about local LLMs but thought you needed a GPU to even begin, this is your on-ramp.


What’s next

This is the foundation.

From here, it’s easy to extend into:

  • better ingestion pipelines

  • larger models (if you have the CPU/RAM)

  • internal tool calling

  • multi-user setups

  • full MarvinOS integrations

If you spin it up and build something cool with it, I’d love to see what you do.
