Run ZeroClaw with Ollama for free local AI

Alex

03/03/2026

Every message you send through a cloud API costs money. For Claude or GPT-4o it's a fraction of a cent per message, but that adds up if you're running automations, cron jobs or multi-agent workflows that generate hundreds of requests a day. The alternative is to run models locally, on the same VPS as ZeroClaw, using Ollama, which serves open-weight LLMs through a clean local API that ZeroClaw speaks natively.
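To put rough numbers on "it adds up", here is a back-of-the-envelope monthly estimate. The $0.005 per request is a hypothetical placeholder, not any provider's actual price (real pricing is per token and varies by model), so substitute your own figures.

```shell
#!/bin/sh
# Back-of-the-envelope monthly API spend.
# Arguments: requests per day, cost per request in USD, days (default 30).
# The example price below is a hypothetical placeholder, not a real quote.
monthly_api_cost() {
  awk -v reqs="$1" -v price="$2" -v days="${3:-30}" \
    'BEGIN { printf "%.2f\n", reqs * price * days }'
}

# 500 requests a day at an assumed $0.005 each, over 30 days.
monthly_api_cost 500 0.005 30   # prints 75.00
```

A local model swaps that per-request cost for the fixed price of a VPS large enough to run it, which is why the RAM requirements matter.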

There's a tradeoff here, and it's worth being upfront about it. Local models cost nothing per request, but they're not as capable as Claude Sonnet or GPT-4o. For simple tasks like summarizing text, drafting emails, answering questions about documents and managing your calendar, a local 8B-parameter model handles things fine. For complex multi-step reasoning or code generation, you'll notice the gap. The right approach for most people is to start local, see where it falls short and upgrade to a cloud provider only for the tasks that need it.

VPS requirements for local models

ZeroClaw's own footprint is negligible here: it uses about 4 MB of RAM. The model is what eats resources. Here's a rough guide based on the models people actually run:

4 GB RAM VPS — You can run smaller models like Phi-3 Mini (3.8B parameters) or Qwen2.5:3b. Expect slower responses and limited context windows. It works, but you're at the edge of comfortable.

8 GB RAM VPS — The sweet spot for most users. Llama 3.1:8b runs well here with room to spare for the OS and ZeroClaw. Response times are reasonable and the model is capable enough for real work.

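16 GB RAM or more — Opens up larger models. At 16 GB you can comfortably run 14B-class models such as Qwen2.5:14b; Llama 3.1:70b and Mixtral need far more, roughly 40 GB and 26 GB respectively even at 4-bit quantization. If you need something that approaches cloud-tier quality, this is where you start getting it, though inference will still be slower than an API call to a data center full of GPUs.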
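If you're not sure which tier a server falls into, a small script can read the total memory from /proc/meminfo (Linux only) and map it onto the tiers above. The cut-off values are assumptions: a plan sold as "8 GB" usually reports a little less to the kernel, so the thresholds sit below the nominal sizes. phi3:mini, qwen2.5:3b and llama3.1:8b are the Ollama tags for the models named above.

```shell
#!/bin/sh
# Map total RAM in kB (the MemTotal figure from /proc/meminfo) onto the rough
# tiers described above. The thresholds are assumptions that sit below the
# advertised plan sizes, because the kernel reports slightly less than the
# nominal amount.
suggest_tier() {
  if [ "$1" -lt 3500000 ]; then
    echo "under 4 GB: below the smallest tier above"
  elif [ "$1" -lt 7000000 ]; then
    echo "4 GB tier: phi3:mini or qwen2.5:3b"
  elif [ "$1" -lt 15000000 ]; then
    echo "8 GB tier: llama3.1:8b"
  else
    echo "16 GB+ tier: larger models"
  fi
}

# On Linux, feed in this machine's MemTotal.
if [ -r /proc/meminfo ]; then
  suggest_tier "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
fi
```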

Install Ollama on your VPS

SSH into your server and run the official install script:

curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama and sets it up as a systemd service that starts automatically. Verify it's running:

systemctl status ollama

Now pull the model you want. For an 8 GB VPS, start with Llama 3.1:8b:

ollama pull llama3.1:8b

The download is around 4.7 GB. Once it finishes you can test the model directly:

ollama run llama3.1:8b "Explain what a reverse proxy does in two sentences"

If you get a coherent answer, Ollama is working. If your VPS runs out of memory during inference, you need a smaller model or a bigger VPS. There isn't really a middle ground with local LLMs: the model either fits in RAM or it doesn't.
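Before pulling a model, you can compare its download size against the memory that is actually free. This sketch uses MemAvailable from /proc/meminfo and adds a headroom margin. The function name and the default 1024 MB headroom are assumptions for illustration, chosen to roughly match the "about 5-6 GB free" guidance for a Llama 3.1:8b download of around 4.7 GB; they are not figures published by Ollama.

```shell
#!/bin/sh
# Rough check: does a model of SIZE_MB (its download size) plus a headroom
# margin fit into AVAIL_KB of available memory?
# Arguments: size in MB, available memory in kB, headroom in MB (default 1024,
# an assumed margin for context and runtime overhead).
model_fits() {
  size_mb=$1
  avail_kb=$2
  headroom_mb=${3:-1024}
  need_kb=$(( (size_mb + headroom_mb) * 1024 ))
  if [ "$avail_kb" -ge "$need_kb" ]; then
    echo "fits: need ~${need_kb} kB, have ${avail_kb} kB"
    return 0
  fi
  echo "too big: need ~${need_kb} kB, have ${avail_kb} kB"
  return 1
}
```

For example, to check a download of roughly 4.7 GB (about 4800 MB) against what the server has free right now:

model_fits 4800 "$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)"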

Point ZeroClaw at Ollama

Open ZeroClaw's config file:

nano ~/.zeroclaw/config.toml

Set the provider to Ollama and specify the model. The critical detail here is that the model name in config.toml must exactly match the tag from ollama list:

default_provider = "ollama"
default_model = "llama3.1:8b"

By default ZeroClaw expects Ollama at http://localhost:11434, which is where Ollama listens after a standard install. If you've changed the port or you're running Ollama on a different machine, add the endpoint explicitly, replacing the address below with your own:

[providers.ollama]
url = "http://localhost:11434"
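Before restarting ZeroClaw, you can confirm that the URL in config.toml actually answers. Ollama's HTTP API exposes GET /api/tags, which lists the locally installed models, so a successful request there means the endpoint is reachable. The function name and the 5-second timeout are arbitrary choices for this sketch. Keep in mind that Ollama binds to 127.0.0.1 by default, so for the different-machine case the Ollama host has to be configured to listen on another interface through the OLLAMA_HOST environment variable, and since the API has no authentication, that port shouldn't be exposed to the open internet.

```shell
#!/bin/sh
# Check that an Ollama endpoint answers by requesting GET /api/tags,
# which lists the models installed on that Ollama instance.
# Usage: ollama_reachable [URL]   (defaults to the standard local address)
ollama_reachable() {
  url=${1:-http://localhost:11434}
  # -f: fail on HTTP errors, -sS: quiet but still show errors,
  # --max-time: give up after 5 seconds.
  if curl -fsS --max-time 5 "$url/api/tags" >/dev/null; then
    echo "Ollama is answering at $url"
    return 0
  fi
  echo "No answer from Ollama at $url" >&2
  return 1
}
```

Call it with the same address you put in config.toml, for example ollama_reachable http://localhost:11434.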

Restart ZeroClaw to pick up the changes:

zeroclaw service restart

Test the full stack

Run a quick message through ZeroClaw to confirm the Ollama integration is working end to end:

zeroclaw agent -m "List three things you can help me with"

You should see a response generated by your local model. It'll be a bit slower than a cloud API, usually 2-8 seconds for a short reply on an 8B model with a CPU-only VPS. That's normal. If you get no response or an error, run diagnostics:

zeroclaw doctor

The most common problem is a model name mismatch. Run ollama list and compare the model tag against what's in your config.toml. They need to be identical, including the tag after the colon.
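That comparison can be scripted. This sketch pulls default_model out of config.toml with a simple pattern match (it assumes the key sits on its own line in the form shown earlier; it is not a full TOML parser) and then looks for an exact match in the NAME column of ollama list. The function names are hypothetical helpers, not part of ZeroClaw or Ollama.

```shell
#!/bin/sh
# Print the value of default_model from a ZeroClaw config.toml.
# Simple pattern match, not a TOML parser: assumes a line like
#   default_model = "llama3.1:8b"
config_model() {
  sed -n 's/^[[:space:]]*default_model[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed if MODEL appears exactly in the NAME column of `ollama list`
# output read from stdin (the first line is the header row).
model_installed() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Compare the configured model against what Ollama reports.
check_model_name() {
  model=$(config_model "${1:-$HOME/.zeroclaw/config.toml}")
  if [ -z "$model" ]; then
    echo "default_model not found in config" >&2
    return 1
  fi
  if ollama list | model_installed "$model"; then
    echo "OK: $model is installed"
  else
    echo "Mismatch: '$model' is not in 'ollama list'" >&2
    return 1
  fi
}
```

Run check_model_name after editing config.toml, or pass a different config path as its argument. A model pulled without an explicit tag shows up as name:latest, which this exact comparison will flag.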

Picking the right model for your hardware

Not all small models perform the same. Based on what people actually report running with ZeroClaw on modest VPS hardware, here are the most reliable choices:

Llama 3.1:8b — The default recommendation. Good general reasoning, solid instruction following. Works well for chat, summarization and basic coding tasks. Needs about 5-6 GB of free RAM.

Qwen2.5:7b — Slightly smaller memory footprint than Llama 3.1:8b. Strong on multilingual tasks if your conversations aren't English-only. Performance is close to Llama 3.1 for most use cases.

Phi-3 Mini — Microsoft's compact model at 3.8B parameters. Fits on 4 GB VPS plans. Noticeably less capable than the 7-8B options but responsive and good enough for simple Q&A, reminders and summaries.

Mistral 7B — An older model but still popular because it runs fast and handles structured outputs well. If your ZeroClaw setup relies heavily on tool calling, Mistral tends to produce cleaner JSON than some alternatives at this size.
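Reports from other users are a starting point, but the most reliable comparison is a quick benchmark on your own VPS. ollama run --verbose prints timing statistics after the reply, including an eval rate in tokens per second. This sketch runs the same prompt through each model you name and extracts that figure. The helper names are hypothetical, and the exact layout of the statistics lines is an assumption based on current Ollama output, so check it against your version.

```shell
#!/bin/sh
# Extract the generation speed from `ollama run --verbose` statistics read on
# stdin. Assumes a line such as "eval rate:   12.34 tokens/s"; anchoring on
# ^eval skips the separate "prompt eval rate" line.
eval_rate() {
  awk '/^eval rate:/ { print $3; exit }'
}

# Run the same prompt through each model given as an argument and report
# tokens per second. stdout and stderr are merged so the statistics are
# captured wherever the CLI prints them. Note that `ollama run` downloads
# any model that isn't installed yet, which can mean several GB.
compare_models() {
  prompt="Explain what a reverse proxy does in two sentences"
  for model in "$@"; do
    rate=$(ollama run --verbose "$model" "$prompt" 2>&1 | eval_rate)
    echo "$model: ${rate:-unknown} tokens/s"
  done
}
```

For example, compare_models llama3.1:8b qwen2.5:7b phi3:mini mistral:7b. Run it while the VPS is otherwise idle, since other load skews the numbers.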

If you want to explore free AI model options beyond Ollama (including free-tier cloud APIs), we've covered that topic in the context of OpenClaw, and most of the provider setup translates directly to ZeroClaw since they share the same provider configuration format.

Performance tips for CPU-only VPS

Most VPS plans don't include a GPU, which means inference runs on CPU. A few things help:

Use quantized models. Ollama's default tags for most models are 4-bit quantizations (Q4_K_M for most recent models, Q4_0 for some older ones), which is a good balance between quality and speed. If you're on very tight RAM, try a Q2_K variant, though expect some quality degradation.

Don't run other memory-hungry services on the same VPS. If you've got a database, a web server and Ollama all fighting over 8 GB of RAM, everyone loses. ZeroClaw itself is fine since it barely touches memory, but Ollama needs all the headroom it can get.

Consider AMD EPYC or Ryzen-based VPS plans. These processors handle the matrix math in LLM inference better than some older Intel Xeon setups. The difference in tokens-per-second is noticeable, especially on longer responses.
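The three tips above can be checked from the shell. This sketch bundles three small hypothetical helpers: one reads the quantization from ollama show output, one summarises total and available memory from /proc/meminfo, and one reports which SIMD extensions the CPU advertises in /proc/cpuinfo. The ollama show layout (a line starting with quantization) is an assumption based on recent Ollama releases. The SIMD flags are only a rough indicator: the ggml code Ollama builds on has AVX2 and AVX-512 paths, but memory bandwidth and vCPU count matter too.

```shell
#!/bin/sh
# 1. Quantization: read `ollama show <model>` output on stdin and print the
#    value of the "quantization" line (e.g. Q4_K_M). Layout is assumed.
quantization_of() {
  awk '$1 == "quantization" { print $2; exit }'
}

# 2. Memory: summarise total vs. available memory in MB from a
#    meminfo-style file (default /proc/meminfo, Linux only).
#    MemAvailable is the kernel's estimate of memory free for new workloads.
memory_summary() {
  awk '/^MemTotal:/ { t = $2 }
       /^MemAvailable:/ { a = $2 }
       END { printf "total %d MB, available %d MB (%d%%)\n",
                    int(t / 1024), int(a / 1024), (t ? int(a * 100 / t) : 0) }' \
    "${1:-/proc/meminfo}"
}

# 3. CPU: report SIMD extensions relevant to CPU inference found in a
#    cpuinfo-style file (default /proc/cpuinfo). -w stops "avx" matching "avx2".
simd_flags() {
  for flag in avx avx2 fma avx512f; do
    if grep -qw "$flag" "${1:-/proc/cpuinfo}"; then
      echo "$flag: yes"
    else
      echo "$flag: no"
    fi
  done
}
```

Typical calls are ollama show llama3.1:8b | quantization_of, then memory_summary and simd_flags with no arguments. If you switch to a lower-bit tag, default_model in config.toml has to match the new tag exactly.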

FAQ

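Can I switch between Ollama and a cloud API without reinstalling?

Yes. Ollama and ZeroClaw both stay installed: change default_provider and default_model in ~/.zeroclaw/config.toml (and add whatever credentials the cloud provider needs), then run zeroclaw service restart. Switching back works the same way.

Do I need a GPU VPS to run local models?

No. Most VPS plans are CPU-only, and everything in this guide runs on CPU. A GPU makes inference much faster, but on CPU an 8B model still answers short prompts in a few seconds.

How do I update the Ollama model after a new version comes out?

Run ollama pull again with the same tag, for example ollama pull llama3.1:8b. If the tag name doesn't change, config.toml doesn't need editing; if you move to a new tag, update default_model to match it exactly and restart ZeroClaw.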
