Running OpenClaw on local models is one of those ideas that sounds like a hobby project until you do it once and realize how much “agent work” is basically glue tasks. Log summaries. JSON cleanup. Routing messages. Cron reports. All the stuff you don’t want to pay per token for.
Ollama is the easiest way to get there. It’s a local LLM runtime that downloads model weights and serves them over HTTP. Once weights are on disk you can run fully offline and your prompts do not leave your network.
This guide focuses on OpenClaw + Ollama. I’ll also cover what breaks most often, why the API mode matters, how to set up a hybrid local-plus-cloud flow, and how to stop local models from being painfully verbose.
If you’re already hosting OpenClaw on a VPS and you want to keep the gateway online all the time, you’ll probably also want to host OpenClaw securely on a VPS. For model choices and provider tradeoffs there’s also OpenClaw model choice: Claude vs OpenAI, which is still useful even if your “provider” becomes Ollama.
OpenClaw and Ollama overview
OpenClaw can talk to Ollama in two main ways:
- Native Ollama API using /api/chat on port 11434
- OpenAI-compatible API using a /v1 endpoint for chat completions
Ollama documents the native chat endpoint at docs.ollama.com/api/chat. The default local base URL is http://localhost:11434 and their API introduction calls this out directly. If you’ve ever curled it you know it’s the least surprising thing in the world, which is why it’s great.
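For reference, a minimal native chat request looks roughly like this. It assumes Ollama is running locally and llama3.3 has been pulled; model, messages and stream are the documented request fields:

```shell
# Build a native /api/chat request payload; stream=false asks for a
# single JSON response instead of a stream of chunks.
PAYLOAD='{"model": "llama3.3", "messages": [{"role": "user", "content": "Say hi in one word."}], "stream": false}'

# Sanity-check that the payload is valid JSON before sending it.
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# Send it (requires a running Ollama server; the reply text comes back
# under the "message" field of the response JSON):
# curl -s http://localhost:11434/api/chat -d "$PAYLOAD"
```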
OpenClaw’s provider docs also include an Ollama quick start and the key detail that trips people up: OpenClaw expects an API key value to exist even though Ollama itself does not validate it. The OpenClaw Ollama provider page spells out that “any value works” and shows OLLAMA_API_KEY="ollama-local". You can read that here: docs.openclaw.ai/providers/ollama.
Why run OpenClaw locally with Ollama
Zero token costs
If you route half your agent calls into a local model, your bill drops. The exact percentage depends on how you use OpenClaw but the pattern is consistent: cheap repetitive operations dominate request count.
I like the boring framing: local models buy you predictable cost. It doesn’t matter if you had a chatty day or if your cron jobs spiked.
Privacy and data control
This is not just theoretical. In 2025 Harmonic Security published numbers showing sensitive corporate data regularly ends up in prompts and uploads to GenAI tools. Axios summarized their findings with hard percentages for prompts and uploaded files containing sensitive info. If your OpenClaw agent touches internal logs, invoices, tickets, customer emails, payroll exports or “oops that’s a private URL” material then a local model is a real risk reduction. You can read that Axios writeup here: workers are spilling secrets to chatbots.
Even if you trust your cloud provider, you still have policy questions: retention, audit, legal discovery and internal controls. Running local keeps the blast radius smaller.
Offline operation
Once weights are downloaded you can run without internet. That’s useful in labs, during outages and on machines that simply should not talk to third-party APIs.
Lower latency for small tasks
For short interactions a local GPU can often deliver a first token in under a second. The “no network round trip” effect is noticeable for agents that do lots of little tool steps.
What local models are good at and where they struggle
Let’s set expectations because this is where people get annoyed and blame OpenClaw when the real limitation is model capability.
Local models are strong at
- tool calling for simple actions (run a command, read a file, parse output)
- format conversions, JSON extraction and cleanup
- short summaries of logs, tickets and chat threads
- routing and classification (is this urgent, is this billing, is this abuse, is this support)
- basic code generation in common languages
Local models struggle with
- long multi-step reasoning that needs careful planning across many tool calls
- high precision outputs when you need “exactly this format, no deviations” over long text
- very large context windows if you don’t have VRAM headroom
- some multilingual output quality depending on model and quantization
My practical takeaway is simple: use a hybrid setup. Let local do the cheap stuff. Use a cloud model for the “thinking hard” pieces and long-form writing. OpenClaw supports per-agent and per-task routing so you’re not locked into one provider.
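That routing decision can be sketched in a few lines. Everything below is illustrative: the route_task function and its keyword buckets are made up for this sketch, and OpenClaw does the real routing through per-agent config, not shell:

```shell
# Hypothetical router: cheap, repetitive work goes to a local model,
# hard reasoning and long-form writing go to a cloud model.
route_task() {
  case "$1" in
    summarize|classify|extract-json) echo "ollama/llama3.3" ;;
    plan|long-form-writing)          echo "anthropic/claude-sonnet-4-20250514" ;;
    *)                               echo "ollama/llama3.3" ;;  # default local
  esac
}

route_task summarize   # local model
route_task plan        # cloud model
```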
If you want to understand why context and memory matter so much for agents, OpenClaw memory explained is relevant here. Local models feel worse when memory gets large and messy.
Hardware requirements for running Ollama well
People ask “what GPU do I need” and the honest answer is “it depends on which model you want to run and what context length you want”. Still, there are good rules of thumb.
VRAM matters more than raw GPU speed
- 8 GB VRAM is enough for many 7B models in useful quantizations
- 16 GB to 24 GB VRAM is where 14B to 32B models become comfortable
- 48 GB VRAM is where big models and large context start to feel realistic
Ollama has its own recommendations for context length defaults based on VRAM tiers. Their context-length page lists the default behavior and it explicitly calls out that agent tasks benefit from at least 64k tokens. Read it here: docs.ollama.com/context-length.
RAM and storage
RAM helps because weights are memory-mapped and because long contexts have a real CPU and RAM footprint. Storage is also not optional. Model files are big and you want SSD storage unless you enjoy slow cold starts.
One setting that helps memory usage
Ollama documents Flash Attention and the exact environment variable to enable it: OLLAMA_FLASH_ATTENTION=1. It can reduce memory usage as context grows. The Ollama FAQ calls that out here: docs.ollama.com/faq.
Step 1: Install Ollama
macOS
brew install ollama
ollama serve &
Linux (Debian or Ubuntu)
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
Ollama serves locally on port 11434 by default. Their API introduction lists the default base URL and makes it easy to verify. See docs.ollama.com/api/introduction.
Quick verification
curl http://localhost:11434/api/tags
If it returns JSON then the server is running. If it returns an empty list then that just means you haven’t pulled any models yet.
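If you want to check which models are present without eyeballing raw JSON, the tags response can be filtered. The sample payload below is abridged for illustration; the real response carries more fields per model:

```shell
# Abridged example of what /api/tags returns once models are pulled.
# In practice you would use: TAGS=$(curl -s http://localhost:11434/api/tags)
TAGS='{"models":[{"name":"llama3.3:latest"},{"name":"qwen2.5-coder:32b"}]}'

# Extract just the model names, one per line.
echo "$TAGS" | grep -o '"name":"[^"]*"' | cut -d'"' -f4
```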
Step 2: Download models with Ollama
Model choice is personal and a little political. You will see endless “best model” lists. I’m not doing that. I’m giving you a set that tends to behave well with tool calling and agent prompts.
Good starting points for local use
- llama3.3 for general use
- qwen2.5-coder:32b for code-heavy workflows
- gpt-oss:20b as a decent middle ground for agents
- deepseek-r1:32b if you specifically want a reasoning style model locally
OpenClaw’s Ollama provider docs use several of these exact IDs as examples. That’s useful because it keeps you aligned with what their config expects. See docs.openclaw.ai/providers/ollama.
Pull examples
ollama pull gpt-oss:20b
ollama pull llama3.3
ollama pull qwen2.5-coder:32b
Smoke test
ollama run llama3.3 "Why does the sky look blue?"
If that works then Ollama is fine and any OpenClaw issue is probably config, API mode or tool permissions.
Step 3: Connect OpenClaw to Ollama
You have three realistic setup paths. Pick one. Mixing them is how people end up in config purgatory.
Method A: Ollama launcher for OpenClaw
Ollama ships an OpenClaw integration page and it documents the launcher command directly. It also lists what it does: install OpenClaw via npm if needed, show a security notice, pick a model then configure and start the gateway. That’s all on their OpenClaw integration page: docs.ollama.com/integrations/openclaw.
ollama launch openclaw
Configuration only without starting the service:
ollama launch openclaw --config
If you like “one command and I’m done” this is the path.
Method B: OpenClaw auto-discovery using OLLAMA_API_KEY
OpenClaw’s provider docs show the simplest enablement pattern: set an env var and OpenClaw can use it as the provider key value. It does not need to be real. It just needs to exist.
export OLLAMA_API_KEY="ollama-local"
If you prefer config rather than environment variables:
openclaw config set models.providers.ollama.apiKey "ollama-local"
Then set your default agent model to an Ollama model ID:
agents: {
defaults: {
model: { primary: "ollama/gpt-oss:20b" }
}
}
This path is nice because adding models is just ollama pull then openclaw models list.
Method C: Explicit provider config for full control
Use this if Ollama runs on another host, you need custom context settings or you want to define models that do not advertise tool support cleanly.
models: {
providers: {
ollama: {
baseUrl: "http://127.0.0.1:11434",
apiKey: "ollama-local",
api: "ollama",
models: [
{
id: "gpt-oss:20b",
name: "GPT-OSS 20B",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 8192,
maxTokens: 8192
}
]
}
}
}
Then set defaults and fallbacks:
agents: {
defaults: {
model: {
primary: "ollama/gpt-oss:20b",
fallbacks: ["ollama/llama3.3", "ollama/qwen2.5-coder:32b"]
}
}
}
Note: if you define an explicit provider entry you lose some of the “it just discovers models” convenience. That’s fine. Just do it deliberately.
API mode selection and the most common failure
This topic deserves its own section because it’s the reason most “it runs but replies are empty” bugs happen.
Native mode
Native mode uses Ollama’s /api/chat. Ollama documents that endpoint and it supports streaming responses. See docs.ollama.com/api/chat.
In OpenClaw config this is usually:
api: "ollama"
baseUrl: "http://127.0.0.1:11434"
OpenAI-compatible mode
Ollama also supports OpenAI compatibility on /v1. Their OpenAI compatibility page documents supported endpoints and request fields. See docs.ollama.com/api/openai-compatibility.
In OpenClaw that usually means:
api: "openai-completions"
baseUrl: "http://127.0.0.1:11434/v1"
One nuance: Ollama’s OpenAI compatibility includes a Responses API implementation. OpenClaw has its own “openai-responses” mode which is meant for OpenAI’s newer Responses API. People mix these up and end up with weird behavior. If you use OpenAI-compatible mode with Ollama then test it early with a simple chat and tool call before you build anything on top.
If you get empty replies or broken parsing then go back to native mode first. Native is the happy path for Ollama and it is the one their docs lead with.
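A quick way to tell which mode produced a response is the shape of the JSON. Native /api/chat replies carry the text under message.content, while the OpenAI-compatible /v1 endpoint wraps it in a choices array. The two samples below are abridged for illustration:

```shell
# Abridged native /api/chat response: the reply lives under "message".
NATIVE='{"model":"llama3.3","message":{"role":"assistant","content":"hi"},"done":true}'

# Abridged OpenAI-compatible /v1/chat/completions response: the reply
# lives under "choices[0].message".
OPENAI='{"object":"chat.completion","choices":[{"message":{"role":"assistant","content":"hi"}}]}'

echo "$NATIVE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["message"]["content"])'
echo "$OPENAI" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```

If a client expects one shape and gets the other, you see exactly the “it runs but replies are empty” symptom described above.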
Tool permissions for local models
Local models tend to be more literal about tools and permissions. If OpenClaw’s tool access is locked down then the model will keep talking about what it wants to do rather than doing it.
The OpenClaw gateway security docs explain how tool execution and sandboxing works and why approvals matter. Read the official doc here: docs.openclaw.ai/gateway/security.
A practical config example looks like this:
tools: {
profile: "coding",
allow: ["read", "exec", "write", "edit"],
exec: {
host: "gateway",
ask: "off",
security: "full"
}
}
There’s a cost to this configuration. It is permissive. It assumes the machine is dedicated to OpenClaw and that you control access to the gateway and messaging channels. If that’s not your situation then tighten it.
If you want a deeper hardening checklist use OpenClaw security best practices.
Context length setup for Ollama and OpenClaw
Agents eat context. System prompt, skills, memory, chat history and tool outputs all live in the same window. If your context is small your agent becomes forgetful and weird.
Ollama’s context-length page gives two useful pieces of guidance:
- it defaults context length based on VRAM
- agent tasks benefit from at least 64k tokens
You can set server-side context length via an environment variable before starting Ollama. The exact variable name is documented in Ollama’s context-length docs. See docs.ollama.com/context-length.
Example for a systemd environment:
OLLAMA_CONTEXT_LENGTH=16384
OLLAMA_FLASH_ATTENTION=1
Do not blindly crank context to huge numbers. VRAM usage goes up and performance can drop. If you want “fast daily operations” you might prefer a smaller context for those agents and a larger context only for deep analysis agents.
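A rough budget check helps when picking a context length. The per-item numbers below are placeholder estimates for one agent turn, not measurements:

```shell
# Placeholder token estimates for a single agent turn.
CONTEXT=16384       # matches OLLAMA_CONTEXT_LENGTH above
SYSTEM_PROMPT=2000  # system prompt + skills
HISTORY=6000        # chat history + memory
TOOL_OUTPUT=4000    # tool results injected this turn

REMAINING=$((CONTEXT - SYSTEM_PROMPT - HISTORY - TOOL_OUTPUT))
echo "tokens left for the reply: $REMAINING"
```

If REMAINING gets close to zero, the agent starts forgetting things mid-task, which is when a larger context (or a leaner prompt) pays off.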
Hybrid local plus cloud model routing
This is where OpenClaw becomes genuinely useful instead of just “local chat UI”. You can run local models for cheap operations and still keep a cloud model available for the hard bits.
Per-agent override
agents: {
defaults: {
model: { primary: "ollama/gpt-oss:20b" }
},
overrides: {
"deep-analysis": {
model: { primary: "anthropic/claude-sonnet-4-20250514" }
}
}
}
Fallback chain with a cloud escape hatch
agents: {
defaults: {
model: {
primary: "ollama/llama3.3",
fallbacks: ["ollama/qwen2.5-coder:32b", "anthropic/claude-sonnet-4-20250514"]
}
}
}
Why I like fallbacks: you can keep your day-to-day free and still avoid a full outage when the local model hits a wall.
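Conceptually the chain is just “try each model in order until one answers”. The sketch below is illustrative only: ask_model is a made-up stand-in that fails for the first model to simulate a local outage, and this is not OpenClaw’s actual implementation:

```shell
# Made-up stand-in for a model call; pretend the local primary is down.
ask_model() {
  [ "$1" = "ollama/llama3.3" ] && return 1   # simulate local failure
  echo "answered by $1"
}

# Walk the chain in order: primary first, then the fallbacks.
for model in "ollama/llama3.3" "ollama/qwen2.5-coder:32b" "anthropic/claude-sonnet-4-20250514"; do
  if ask_model "$model"; then
    break
  fi
done
```

With the simulated failure, the second model in the chain picks up the request and the cloud model never gets called.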
If you are already using a proxy layer for providers then OpenClaw API proxy setup is worth reading. A proxy can normalize endpoints when you want to swap between local backends like Ollama and something OpenAI-compatible.
Reducing verbosity for local models
Local models often over-explain and they love dumping JSON. You can fight that in two places: SOUL.md and the model system prompt.
SOUL.md brevity rules
## Brevity rules
Be concise.
Do not dump skill documentation.
Do not print raw JSON responses.
Do not explain what you will do in detail.
When a task succeeds, confirm in one short sentence.
This won’t make every model suddenly terse but it helps more than people expect.
Custom Modelfile for tool behavior
Some models will describe tools instead of using them or they ask permission for every action. A custom Ollama Modelfile lets you bake in “use tools directly” behavior.
Ollama supports creating models from a Modelfile and their docs cover this workflow. If you haven’t used it before start with their main docs hub and search for “Modelfile”. See docs.ollama.com.
Example Modelfile template:
FROM qwen2.5-coder:32b
SYSTEM """You are a helpful assistant with access to tools.
Tool behavior:
- Use available tools when needed without asking for permission
- Do not describe the tool call in advance
- Summarize results instead of outputting raw JSON
- If required input is missing ask a direct question
Keep answers short unless the user asks for deep detail."""
Then build it:
ollama create qwen-agentic -f qwen-agentic.Modelfile
Important detail: don’t override a model’s tool calling template unless you know what you’re doing. You can break tool formatting in ways that look like “OpenClaw is broken” when it’s actually your prompt.
Running Ollama on a separate machine
You don’t have to run Ollama on the same box as OpenClaw. A common pattern is:
- OpenClaw gateway runs on a small always-on server
- Ollama runs on a LAN GPU box
In config that’s basically swapping the baseUrl:
models: {
providers: {
ollama: {
baseUrl: "http://192.168.1.50:11434",
apiKey: "ollama-local",
api: "ollama"
}
}
}
If you do this, treat your LAN like a hostile environment anyway. If someone can hit your Ollama endpoint they can send prompts to it. Add network controls and consider TLS termination if it crosses untrusted networks.
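For Ollama to accept LAN connections at all it has to bind beyond localhost, which Ollama controls through the OLLAMA_HOST variable. A systemd override file is the usual place for it; this is a sketch, assuming the standard Linux install:

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
```

After a systemctl daemon-reload and a service restart, pair this with firewall rules so only the OpenClaw host can reach port 11434.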
Docker Compose example for Ollama and OpenClaw
This is a minimal sketch that people adapt. The point is not “here is the one true compose file” but “these are the knobs you actually end up setting”.
services:
ollama:
image: ollama/ollama
environment:
- OLLAMA_CONTEXT_LENGTH=16384
- OLLAMA_FLASH_ATTENTION=1
volumes:
- ollama-data:/root/.ollama
ports:
- "11434:11434"
openclaw:
image: openclaw/gateway
environment:
- OLLAMA_API_KEY=ollama-local
depends_on:
- ollama
If you’re doing GPU passthrough you’ll add your runtime and device mappings based on your host. That part differs too much between setups to pretend there’s one snippet that always works.
Common troubleshooting
OpenClaw can reach Ollama but responses are empty
Start by checking API mode. If you are using OpenAI-compatible mode then temporarily switch to native api: "ollama" with baseUrl pointing at http://host:11434. Confirm a basic chat works. Then confirm tool calls.
Ollama not detected
Confirm Ollama is running then confirm OpenClaw sees a provider key value. The OpenClaw Ollama provider doc shows the env var and the config command. See docs.openclaw.ai/providers/ollama.
No models available in OpenClaw
Run:
ollama list
openclaw models list
If models exist in Ollama but don’t appear in OpenClaw then use explicit provider config and define the model list manually. This is annoying but it gets you unstuck.
Tool calls fail or the model “talks about tools” only
Check tool permissions and consider a Modelfile prompt adjustment. Also confirm you installed the skill you expect. If you’re fuzzy on how skills are installed and exposed, OpenClaw skills guide is the right refresher.
Gateway keeps asking for approval even with ask off
This can happen due to security policy and session state. If it starts happening mid-session, a gateway restart often clears it. If you want to understand the security model and the tradeoffs, read docs.openclaw.ai/gateway/security and compare it to your current config.
Verification checklist
If you want a clean “it works” moment, do these in order:
# 1) Ollama running
curl http://localhost:11434/api/tags
# 2) Pull a model
ollama pull llama3.3
# 3) OpenClaw can see models
openclaw models list
# 4) Start gateway
openclaw gateway start
Then in your OpenClaw chat or TUI ask:
- “What time is it?” (should run a time lookup or command depending on your tool setup)
- “List files in the current directory” (should use exec if allowed)
- “Summarize this JSON” then paste a small JSON object
If those work you’re operational.
Other local backends you can use with OpenClaw
Ollama is not the only option. If you want throughput or you’re running a GPU server as a service you might look at:
- vLLM for high-throughput OpenAI-compatible serving
- llama.cpp for minimal dependencies and small deployments
- LM Studio for a GUI-first local setup that also exposes a /v1 endpoint
OpenClaw can connect to OpenAI-compatible backends using its “openai-completions” provider mode. Our own overview of free model setups touches this and includes example patterns: free AI models for OpenClaw.