Running OpenClaw on local models is one of those ideas that sounds like a hobby project until you do it once and realize how much “agent work” is basically glue tasks. Log summaries. JSON cleanup. Routing messages. Cron reports. All the stuff you don’t want to pay per token for.
Ollama is the easiest way to get there. It’s a local LLM runtime that downloads model weights and serves them over HTTP. Once weights are on disk you can run fully offline and your prompts do not leave your network.
This guide focuses on OpenClaw + Ollama. I’ll also cover what breaks most often, why the API mode matters, how to set up a hybrid local-plus-cloud flow, and how to stop local models from being painfully verbose.
If you’re already hosting OpenClaw on a VPS and you want to keep the gateway online all the time, you’ll probably also want to host OpenClaw securely on a VPS. For model choices and provider tradeoffs there’s also OpenClaw model choice: Claude vs OpenAI, which is still useful even if your “provider” becomes Ollama.
OpenClaw and Ollama overview
OpenClaw can talk to Ollama in two main ways:
- Native Ollama API using /api/chat on port 11434
- OpenAI-compatible API using a /v1 endpoint for chat completions
Ollama documents the native chat endpoint at docs.ollama.com/api/chat. The default local base URL is http://localhost:11434 and their API introduction calls this out directly. If you’ve ever curled it you know it’s the least surprising thing in the world, which is why it’s great.
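For reference, a minimal native chat request looks roughly like this. It assumes Ollama is running locally and llama3.3 has been pulled; model, messages and stream are the documented request fields:

```shell
# Build a native /api/chat request payload; stream=false asks for a
# single JSON response instead of a stream of chunks.
PAYLOAD='{"model": "llama3.3", "messages": [{"role": "user", "content": "Say hi in one word."}], "stream": false}'

# Sanity-check that the payload is valid JSON before sending it.
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# Send it (requires a running Ollama server; the reply text comes back
# under the "message" field of the response JSON):
# curl -s http://localhost:11434/api/chat -d "$PAYLOAD"
```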
OpenClaw’s provider docs also include an Ollama quick start and the key detail that trips people up: OpenClaw expects an API key value to exist even though Ollama itself does not validate it. The OpenClaw Ollama provider page spells out that “any value works” and shows OLLAMA_API_KEY="ollama-local". You can read that here: docs.openclaw.ai/providers/ollama.
Why run OpenClaw locally with Ollama
Zero token costs
If you route half your agent calls into a local model, your bill drops. The exact percentage depends on how you use OpenClaw but the pattern is consistent: cheap repetitive operations dominate request count.
I like the boring framing: local models buy you predictable cost. It doesn’t matter if you had a chatty day or if your cron jobs spiked.
Privacy and data control
This is not just theoretical. In 2025 Harmonic Security published numbers showing sensitive corporate data regularly ends up in prompts and uploads to GenAI tools. Axios summarized their findings with hard percentages for prompts and uploaded files containing sensitive info. If your OpenClaw agent touches internal logs, invoices, tickets, customer emails, payroll exports or “oops that’s a private URL” material then a local model is a real risk reduction. You can read that Axios writeup here: workers are spilling secrets to chatbots.
Even if you trust your cloud provider, you still have policy questions: retention, audit, legal discovery and internal controls. Running local keeps the blast radius smaller.
Offline operation
Once weights are downloaded you can run without internet. That’s useful in labs, during outages and on machines that simply should not talk to third-party APIs.
Lower latency for small tasks
For short interactions a local GPU can often deliver a first token in under a second. The “no network round trip” effect is noticeable for agents that do lots of little tool steps.
What local models are good at and where they struggle
Let’s set expectations because this is where people get annoyed and blame OpenClaw when the real limitation is model capability.
Local models are strong at
- tool calling for simple actions (run a command, read a file, parse output)
- format conversions, JSON extraction and cleanup
- short summaries of logs, tickets and chat threads
- routing and classification (is this urgent, is this billing, is this abuse, is this support)
- basic code generation in common languages
Local models struggle with
- long multi-step reasoning that needs careful planning across many tool calls
- high precision outputs when you need “exactly this format, no deviations” over long text
- very large context windows if you don’t have VRAM headroom
- some multilingual output quality depending on model and quantization
My practical takeaway is simple: use a hybrid setup. Let local do the cheap stuff. Use a cloud model for the “thinking hard” pieces and long-form writing. OpenClaw supports per-agent and per-task routing so you’re not locked into one provider.
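That routing decision can be sketched in a few lines. Everything below is illustrative: the route_task function and its keyword buckets are made up for this sketch, and OpenClaw does the real routing through per-agent config, not shell:

```shell
# Hypothetical router: cheap, repetitive work goes to a local model,
# hard reasoning and long-form writing go to a cloud model.
route_task() {
  case "$1" in
    summarize|classify|extract-json) echo "ollama/llama3.3" ;;
    plan|long-form-writing)          echo "anthropic/claude-sonnet-4-20250514" ;;
    *)                               echo "ollama/llama3.3" ;;  # default local
  esac
}

route_task summarize   # local model
route_task plan        # cloud model
```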
If you want to understand why context and memory matter so much for agents, OpenClaw memory explained is relevant here. Local models feel worse when memory gets large and messy.
Hardware requirements for running Ollama well
People ask “what GPU do I need” and the honest answer is “it depends on which model you want to run and what context length you want”. Still, there are good rules of thumb.
VRAM matters more than raw GPU speed
- 8 GB VRAM is enough for many 7B models in useful quantizations
- 16 GB to 24 GB VRAM is where 14B to 32B models become comfortable
- 48 GB VRAM is where big models and large context start to feel realistic
Ollama has its own recommendations for context length defaults based on VRAM tiers. Their context-length page lists the default behavior and it explicitly calls out that agent tasks benefit from at least 64k tokens. Read it here: docs.ollama.com/context-length.
RAM and storage
RAM helps because weights are memory-mapped and because long contexts have a real CPU and RAM footprint. Storage is also not optional. Model files are big and you want SSD storage unless you enjoy slow cold starts.
One setting that helps memory usage
Ollama documents Flash Attention and the exact environment variable to enable it: OLLAMA_FLASH_ATTENTION=1. It can reduce memory usage as context grows. The Ollama FAQ calls that out here: docs.ollama.com/faq.
Step 1: Install Ollama
macOS
brew install ollama
ollama serve &
Linux (Debian or Ubuntu)
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
Ollama serves locally on port 11434 by default. Their API introduction lists the default base URL and makes it easy to verify. See docs.ollama.com/api/introduction.
Quick verification
curl http://localhost:11434/api/tags
If it returns JSON then the server is running. If it returns an empty list then that just means you haven’t pulled any models yet.
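If you want to check which models are present without eyeballing raw JSON, the tags response can be filtered. The sample payload below is abridged for illustration; the real response carries more fields per model:

```shell
# Abridged example of what /api/tags returns once models are pulled.
# In practice you would use: TAGS=$(curl -s http://localhost:11434/api/tags)
TAGS='{"models":[{"name":"llama3.3:latest"},{"name":"qwen2.5-coder:32b"}]}'

# Extract just the model names, one per line.
echo "$TAGS" | grep -o '"name":"[^"]*"' | cut -d'"' -f4
```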
Step 2: Download models with Ollama
Model choice is personal and a little political. You will see endless “best model” lists. I’m not doing that. I’m giving you a set that tends to behave well with tool calling and agent prompts.
Good starting points for local use
- llama3.3 for general use
- qwen2.5-coder:32b for code-heavy workflows
- gpt-oss:20b as a decent middle ground for agents
- deepseek-r1:32b if you specifically want a reasoning style model locally
OpenClaw’s Ollama provider docs use several of these exact IDs as examples. That’s useful because it keeps you aligned with what their config expects. See docs.openclaw.ai/providers/ollama.
Pull examples
ollama pull gpt-oss:20b
ollama pull llama3.3
ollama pull qwen2.5-coder:32b
Smoke test
ollama run llama3.3 "Why does the sky look blue?"
If that works then Ollama is fine and any OpenClaw issue is probably config, API mode or tool permissions.
Step 3: Connect OpenClaw to Ollama
You have three realistic setup paths. Pick one. Mixing them is how people end up in config purgatory.
Method A: Ollama launcher for OpenClaw
Ollama ships an OpenClaw integration page and it documents the launcher command directly. It also lists what it does: install OpenClaw via npm if needed, show a security notice, pick a model then configure and start the gateway. That’s all on their OpenClaw integration page: docs.ollama.com/integrations/openclaw.
ollama launch openclaw
Configuration only without starting the service:
ollama launch openclaw --config
If you like “one command and I’m done” this is the path.
Method B: OpenClaw auto-discovery using OLLAMA_API_KEY
OpenClaw’s provider docs show the simplest enablement pattern: set an env var and OpenClaw can use it as the provider key value. It does not need to be real. It just needs to exist.
export OLLAMA_API_KEY="ollama-local"
If you prefer config rather than environment variables:
openclaw config set models.providers.ollama.apiKey "ollama-local"
Then set your default agent model to an Ollama model ID:
agents: {
defaults: {
model: { primary: "ollama/gpt-oss:20b" }
}
}
This path is nice because adding models is just ollama pull then openclaw models list.
Method C: Explicit provider config for full control
Use this if Ollama runs on another host, you need custom context settings or you want to define models that do not advertise tool support cleanly.
models: {
providers: {
ollama: {
baseUrl: "http://127.0.0.1:11434",
apiKey: "ollama-local",
api: "ollama",
models: [
{
id: "gpt-oss:20b",
name: "GPT-OSS 20B",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 8192,
maxTokens: 8192
}
]
}
}
}
Then set defaults and fallbacks:
agents: {
defaults: {
model: {
primary: "ollama/gpt-oss:20b",
fallbacks: ["ollama/llama3.3", "ollama/qwen2.5-coder:32b"]
}
}
}
Note: if you define an explicit provider entry you lose some of the “it just discovers models” convenience. That’s fine. Just do it deliberately.
API mode selection and the most common failure
This topic deserves its own section because it’s the reason most “it runs but replies are empty” bugs happen.
Native mode
Native mode uses Ollama’s /api/chat. Ollama documents that endpoint and it supports streaming responses. See docs.ollama.com/api/chat.
In OpenClaw config this is usually:
api: "ollama"
baseUrl: "http://127.0.0.1:11434"
OpenAI-compatible mode
Ollama also supports OpenAI compatibility on /v1. Their OpenAI compatibility page documents supported endpoints and request fields. See docs.ollama.com/api/openai-compatibility.
In OpenClaw that usually means:
api: "openai-completions"
baseUrl: "http://127.0.0.1:11434/v1"
One nuance: Ollama’s OpenAI compatibility includes a Responses API implementation. OpenClaw has its own “openai-responses” mode which is meant for OpenAI’s newer Responses API. People mix these up and end up with weird behavior. If you use OpenAI-compatible mode with Ollama then test it early with a simple chat and tool call before you build anything on top.
If you get empty replies or broken parsing then go back to native mode first. Native is the happy path for Ollama and it is the one their docs lead with.
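A quick way to tell which mode produced a response is the shape of the JSON. Native /api/chat replies carry the text under message.content, while the OpenAI-compatible /v1 endpoint wraps it in a choices array. The two samples below are abridged for illustration:

```shell
# Abridged native /api/chat response: the reply lives under "message".
NATIVE='{"model":"llama3.3","message":{"role":"assistant","content":"hi"},"done":true}'

# Abridged OpenAI-compatible /v1/chat/completions response: the reply
# lives under "choices[0].message".
OPENAI='{"object":"chat.completion","choices":[{"message":{"role":"assistant","content":"hi"}}]}'

echo "$NATIVE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["message"]["content"])'
echo "$OPENAI" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```

If a client expects one shape and gets the other, you see exactly the “it runs but replies are empty” symptom described above.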
Tool permissions for local models
Local models tend to be more literal about tools and permissions. If OpenClaw’s tool access is locked down then the model will keep talking about what it wants to do rather than doing it.
The OpenClaw gateway security docs explain how tool execution and sandboxing works and why approvals matter. Read the official doc here: docs.openclaw.ai/gateway/security.
A practical config example looks like this:
tools: {
profile: "coding",
allow: ["read", "exec", "write", "edit"],
exec: {
host: "gateway",
ask: "off",
security: "full"
}
}
There’s a cost to this configuration. It is permissive. It assumes the machine is dedicated to OpenClaw and that you control access to the gateway and messaging channels. If that’s not your situation then tighten it.
If you want a deeper hardening checklist use OpenClaw security best practices.
Context length setup for Ollama and OpenClaw
Agents eat context. System prompt, skills, memory, chat history and tool outputs all live in the same window. If your context is small your agent becomes forgetful and weird.
Ollama’s context-length page gives two useful pieces of guidance:
- it defaults context length based on VRAM
- agent tasks benefit from at least 64k tokens
You can set server-side context length via an environment variable before starting Ollama. The exact variable name is documented in Ollama’s context-length docs. See docs.ollama.com/context-length.
Example for a systemd environment:
OLLAMA_CONTEXT_LENGTH=16384
OLLAMA_FLASH_ATTENTION=1
Do not blindly crank context to huge numbers. VRAM usage goes up and performance can drop. If you want “fast daily operations” you might prefer a smaller context for those agents and a larger context only for deep analysis agents.
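A rough budget check helps when picking a context length. The per-item numbers below are placeholder estimates for one agent turn, not measurements:

```shell
# Placeholder token estimates for a single agent turn.
CONTEXT=16384       # matches OLLAMA_CONTEXT_LENGTH above
SYSTEM_PROMPT=2000  # system prompt + skills
HISTORY=6000        # chat history + memory
TOOL_OUTPUT=4000    # tool results injected this turn

REMAINING=$((CONTEXT - SYSTEM_PROMPT - HISTORY - TOOL_OUTPUT))
echo "tokens left for the reply: $REMAINING"
```

If REMAINING gets close to zero, the agent starts forgetting things mid-task, which is when a larger context (or a leaner prompt) pays off.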
Hybrid local plus cloud model routing
This is where OpenClaw becomes genuinely useful instead of just “local chat UI”. You can run local models for cheap operations and still keep a cloud model available for the hard bits.
Per-agent override
agents: {
defaults: {
model: { primary: "ollama/gpt-oss:20b" }
},
overrides: {
"deep-analysis": {
model: { primary: "anthropic/claude-sonnet-4-20250514" }
}
}
}
Fallback chain with a cloud escape hatch
agents: {
defaults: {
model: {
primary: "ollama/llama3.3",
fallbacks: ["ollama/qwen2.5-coder:32b", "anthropic/claude-sonnet-4-20250514"]
}
}
}
Why I like fallbacks: you can keep your day-to-day free and still avoid a full outage when the local model hits a wall.
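Conceptually the chain is just “try each model in order until one answers”. The sketch below is illustrative only: ask_model is a made-up stand-in that fails for the first model to simulate a local outage, and this is not OpenClaw’s actual implementation:

```shell
# Made-up stand-in for a model call; pretend the local primary is down.
ask_model() {
  [ "$1" = "ollama/llama3.3" ] && return 1   # simulate local failure
  echo "answered by $1"
}

# Walk the chain in order: primary first, then the fallbacks.
for model in "ollama/llama3.3" "ollama/qwen2.5-coder:32b" "anthropic/claude-sonnet-4-20250514"; do
  if ask_model "$model"; then
    break
  fi
done
```

With the simulated failure, the second model in the chain picks up the request and the cloud model never gets called.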
If you are already using a proxy layer for providers then OpenClaw API proxy setup is worth reading. A proxy can normalize endpoints when you want to swap between local backends like Ollama and something OpenAI-compatible.
Reducing verbosity for local models
Local models often over-explain and they love dumping JSON. You can fight that in two places: SOUL.md and the model system prompt.
SOUL.md brevity rules
## Brevity rules
Be concise.
Do not dump skill documentation.
Do not print raw JSON responses.
Do not explain what you will do in detail.
When a task succeeds, confirm in one short sentence.
This won’t make every model suddenly terse but it helps more than people expect.
Custom Modelfile for tool behavior
Some models will describe tools instead of using them or they ask permission for every action. A custom Ollama Modelfile lets you bake in “use tools directly” behavior.
Ollama supports creating models from a Modelfile and their docs cover this workflow. If you haven’t used it before start with their main docs hub and search for “Modelfile”. See docs.ollama.com.
Example Modelfile template:
FROM qwen2.5-coder:32b
SYSTEM """You are a helpful assistant with access to tools.
Tool behavior:
- Use available tools when needed without asking for permission
- Do not describe the tool call in advance
- Summarize results instead of outputting raw JSON
- If required input is missing ask a direct question
Keep answers short unless the user asks for deep detail."""
Then build it:
ollama create qwen-agentic -f qwen-agentic.Modelfile
Important detail: don’t override a model’s tool calling template unless you know what you’re doing. You can break tool formatting in ways that look like “OpenClaw is broken” when it’s actually your prompt.
Running Ollama on a separate machine
You don’t have to run Ollama on the same box as OpenClaw. A common pattern is:
- OpenClaw gateway runs on a small always-on server
- Ollama runs on a LAN GPU box
In config that’s basically swapping the baseUrl:
models: {
providers: {
ollama: {
baseUrl: "http://192.168.1.50:11434",
apiKey: "ollama-local",
api: "ollama"
}
}
}
If you do this, treat your LAN like a hostile environment anyway. If someone can hit your Ollama endpoint they can send prompts to it. Add network controls and consider TLS termination if it crosses untrusted networks.
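For Ollama to accept LAN connections at all it has to bind beyond localhost, which Ollama controls through the OLLAMA_HOST variable. A systemd override file is the usual place for it; this is a sketch, assuming the standard Linux install:

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
```

After a systemctl daemon-reload and a service restart, pair this with firewall rules so only the OpenClaw host can reach port 11434.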
Docker Compose example for Ollama and OpenClaw
This is a minimal sketch that people adapt. The point is not “here is the one true compose file” but “these are the knobs you actually end up setting”.
services:
ollama:
image: ollama/ollama
environment:
- OLLAMA_CONTEXT_LENGTH=16384
- OLLAMA_FLASH_ATTENTION=1
volumes:
- ollama-data:/root/.ollama
ports:
- "11434:11434"
openclaw:
image: openclaw/gateway
environment:
- OLLAMA_API_KEY=ollama-local
depends_on:
- ollama
If you’re doing GPU passthrough you’ll add your runtime and device mappings based on your host. That part differs too much between setups to pretend there’s one snippet that always works.
Common troubleshooting
OpenClaw can reach Ollama but responses are empty
Start by checking API mode. If you are using OpenAI-compatible mode then temporarily switch to native api: "ollama" with baseUrl pointing at http://host:11434. Confirm a basic chat works. Then confirm tool calls.
Ollama not detected
Confirm Ollama is running then confirm OpenClaw sees a provider key value. The OpenClaw Ollama provider doc shows the env var and the config command. See docs.openclaw.ai/providers/ollama.
No models available in OpenClaw
Run:
ollama list
openclaw models list
If models exist in Ollama but don’t appear in OpenClaw then use explicit provider config and define the model list manually. This is annoying but it gets you unstuck.
Tool calls fail or the model “talks about tools” only
Check tool permissions and consider a Modelfile prompt adjustment. Also confirm you installed the skill you expect. If you’re fuzzy on how skills are installed and exposed, OpenClaw skills guide is the right refresher.
Gateway keeps asking for approval even with ask off
This can happen due to security policy and session state. If it starts happening mid-session, a gateway restart often clears it. If you want to understand the security model and the tradeoffs, read docs.openclaw.ai/gateway/security and compare it to your current config.
Verification checklist
If you want a clean “it works” moment, do these in order:
# 1) Ollama running
curl http://localhost:11434/api/tags
# 2) Pull a model
ollama pull llama3.3
# 3) OpenClaw can see models
openclaw models list
# 4) Start gateway
openclaw gateway start
Then in your OpenClaw chat or TUI ask:
- “What time is it?” (should run a time lookup or command depending on your tool setup)
- “List files in the current directory” (should use exec if allowed)
- “Summarize this JSON” then paste a small JSON object
If those work you’re operational.
Other local backends you can use with OpenClaw
Ollama is not the only option. If you want throughput or you’re running a GPU server as a service you might look at:
- vLLM for high-throughput OpenAI-compatible serving
- llama.cpp for minimal dependencies and small deployments
- LM Studio for a GUI-first local setup that also exposes a /v1 endpoint
OpenClaw can connect to OpenAI-compatible backends using its “openai-completions” provider mode. Our own overview of free model setups touches this and includes example patterns: free AI models for OpenClaw.