Hermes Agent subagent delegation for parallel tasks

Alex

13/05/2026


Subagent delegation is the Hermes Agent feature most people don't try until they have a specific reason. Then it changes how they use the agent for complex tasks. The idea: instead of one agent thread chewing through a multi-step problem in series, the main agent spawns helper subagents, each working on one piece in parallel, then merges results.

Useful when the task naturally decomposes. Not useful when steps depend on each other. The trick is knowing the difference.

The mental model

Think of the main agent as a tech lead - the boss. It receives the task, decides it's big enough to split, breaks it into independent pieces, hands each piece to a subagent (a separate agent instance with its own context window), waits for the subagents to finish, then synthesises the combined result. The user only sees the main agent. The subagents are an implementation detail.
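To make the fan-out and fan-in shape concrete, here is a minimal Python sketch of the pattern described above. It is not Hermes code: `run_subagent` and `delegate` are hypothetical names, and the short sleep stands in for the minutes of model work a real subagent would do in its own context.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Stand-in for one subagent; a real one would call a model with its own context."""
    await asyncio.sleep(0.01)  # placeholder for the real work
    return f"summary of: {task}"


async def delegate(tasks: list[str]) -> str:
    """Fan out one subagent per task, wait for all of them, then synthesise."""
    # gather() runs the coroutines concurrently and returns results in task order
    reports = await asyncio.gather(*(run_subagent(t) for t in tasks))
    # Synthesis can only start once the slowest subagent has finished
    return "\n".join(reports)


result = asyncio.run(
    delegate(["AWS RDS pricing", "Azure SQL pricing", "Google Cloud SQL pricing"])
)
print(result)
```

The caller only ever sees `result`, which mirrors the point that the subagents are an implementation detail.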

What this buys you:

Parallel execution (3 subagents doing 5 minutes of work each take 5 minutes total, not 15)
Isolated context per subagent (each one has its own SOUL.md/USER.md scope; one subagent's noisy intermediate work doesn't bloat the main agent's context)
Clean failure isolation (one subagent erroring doesn't kill the rest)

What it costs you:

More tokens (each subagent has its own system context, so you're paying ~13K of overhead per subagent on top of the main one)
Synthesis latency (the main agent has to wait for the slowest subagent before it can compose the final answer)
Debugging complexity when something goes wrong

Net: subagents win when you have 3+ parallel pieces, each taking real time. They lose when you have only 2 pieces or when each piece is short.

How to invoke subagents

The simplest pattern is to ask the main agent directly. If you have the subagent capability enabled in your profile, the agent will decide when to use it based on the request shape:

Research the latest pricing for AWS RDS, Azure SQL, and Google Cloud SQL.
Use one subagent per provider so we can run them in parallel.

The agent spawns three subagents, each tasked with one cloud provider, waits for all three, returns a comparison table. The natural-language hint ("use one subagent per provider") triggers the parallel pattern.

For programmatic use:

hermes subagent dispatch \
  --tasks "research AWS RDS pricing" "research Azure SQL pricing" "research GCP SQL pricing" \
  --synthesis "compare the three results and recommend the best option for a small startup"

This form is cleaner if you want repeatable scripts or want the synthesis prompt to be explicit.

What tasks split well

Two heuristics I use.

Heuristic 1: each piece is self-contained

A research task on three different topics (cloud providers, programming languages, libraries) splits cleanly. The subagents don't need to talk to each other. Each one produces a chunk that the main agent stitches together.

A task where step 2 needs the output of step 1 doesn't split. Subagent 2 can't start until subagent 1 finishes. You've added all the overhead of subagents with none of the parallelism win.
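One way to check this before dispatching is to write down which pieces depend on which and group them into "waves" that could run at the same time. The sketch below uses Python's standard `graphlib` module; the task names and the `parallel_waves` helper are illustrative assumptions, not anything Hermes provides.

```python
from graphlib import TopologicalSorter


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave can run at the same time."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose prerequisites are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Each task maps to the set of tasks it needs finished first
independent = {"aws": set(), "azure": set(), "gcp": set()}
chained = {"step1": set(), "step2": {"step1"}, "step3": {"step2"}}

print(parallel_waves(independent))  # one wave of three: worth delegating
print(parallel_waves(chained))      # three waves of one: nothing to parallelise
```

If every wave has a single task, subagents add overhead and nothing else.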

Heuristic 2: each piece is substantial

If each piece takes 30 seconds, the subagent overhead (spinning up, context initialisation, synthesis on the main agent) eats the parallelism savings. Subagents pay back when each piece takes a few minutes.
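A rough wall-clock model shows where the break-even sits. This is a sketch under stated assumptions: the 35-second fixed overhead is a made-up figure (chosen so that a 30-second serial job comes out at 45 seconds when split), and real dispatch overhead will vary.

```python
def serial_seconds(pieces: list[float]) -> float:
    """One agent works through the pieces one after another."""
    return sum(pieces)


def parallel_seconds(pieces: list[float], overhead: float) -> float:
    """Subagents run side by side; the dispatch ends with the slowest piece, plus overhead."""
    return overhead + max(pieces)


OVERHEAD = 35.0  # hypothetical: spin-up, context initialisation, synthesis

short_pieces = [10.0, 10.0, 10.0]    # three 10-second pieces
long_pieces = [300.0, 300.0, 300.0]  # three 5-minute pieces

print(serial_seconds(short_pieces), parallel_seconds(short_pieces, OVERHEAD))  # 30.0 45.0
print(serial_seconds(long_pieces), parallel_seconds(long_pieces, OVERHEAD))    # 900.0 335.0
```

With short pieces the split is slower than doing the work in series. With multi-minute pieces the same overhead barely registers.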

"Summarise these three tweets" is a one-agent task even though it has three inputs. "Read these three long articles and pull out arguments" is a subagent task because each article is real work.

A worked example: parallel codebase analysis

Real task from my own work. I had three repositories that allegedly implemented the same algorithm. I wanted to know if they really did and where they differed.

Without subagents: the main agent would read repo 1 entirely, then repo 2, then repo 3, then compare. Long session, lots of context bloat, slow.

With subagents: three subagents in parallel, one per repo, each producing a summary of the algorithm as implemented. Main agent receives three summaries, generates the comparison. Five minutes total instead of fifteen.

hermes subagent dispatch \
  --tasks \
    "read ~/code/repo-a and summarise the algorithm in algorithm.py: inputs, outputs, key transformations" \
    "read ~/code/repo-b and summarise the algorithm in core/main.go: inputs, outputs, key transformations" \
    "read ~/code/repo-c and summarise the algorithm in src/algo.rs: inputs, outputs, key transformations" \
  --synthesis "compare the three algorithm implementations. Where do they differ in inputs, transformations or outputs? Which one would I trust most?"

The synthesis step is where the value comes from. Each subagent's summary alone wouldn't tell me the comparison. The main agent's job is to spot the differences across three structured reports.

Tool access for subagents

Subagents inherit the tools enabled in your profile by default. The shell tool, filesystem tool, browse tool: all available to each subagent. If you want to restrict which tools a subagent can use, pass --tool-allowlist:

hermes subagent dispatch \
  --tasks "..." \
  --tool-allowlist "filesystem,web_browse" \
  --synthesis "..."

Why restrict: if your task is research-only, you don't want a subagent making any shell calls. Tighter scope = less surprise.
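Conceptually, an allowlist is an intersection between what the profile enables and what the dispatch asks for. The sketch below models that idea only; `effective_tools` is a hypothetical helper, the tool identifier `shell` is assumed, and how Hermes resolves the flag internally is not something the sketch claims to reproduce.

```python
from typing import Optional


def effective_tools(profile_tools: set[str], allowlist: Optional[str] = None) -> set[str]:
    """Return the tools a subagent may use, given a comma-separated allowlist."""
    if allowlist is None:
        return set(profile_tools)  # default: inherit everything the profile enables
    requested = {name.strip() for name in allowlist.split(",") if name.strip()}
    # A tool must be both enabled in the profile and named in the allowlist
    return profile_tools & requested


profile = {"shell", "filesystem", "web_browse"}  # "shell" is an assumed identifier
print(sorted(effective_tools(profile)))
print(sorted(effective_tools(profile, "filesystem,web_browse")))
```

With the allowlist applied, the shell tool drops out, which is the "tighter scope" the research-only case wants.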

Cost reality check

Subagents multiply token cost. A task that costs 50K tokens with one agent costs roughly 50K + (N × 13K) with N subagents, because each subagent carries its own system context.

For 3 subagents, that's ~90K tokens. For 5 subagents, ~115K. At Sonnet pricing, the difference between 50K and 115K is real money.
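The arithmetic behind those figures, as a small Python helper. The 50K base and 13K per-subagent overhead are the article's own estimates; the function name is just for illustration.

```python
BASE_TOKENS = 50_000            # the single-agent cost used in the example above
PER_SUBAGENT_OVERHEAD = 13_000  # approximate system context per subagent


def dispatch_tokens(n_subagents: int,
                    base: int = BASE_TOKENS,
                    overhead: int = PER_SUBAGENT_OVERHEAD) -> int:
    """Estimated total tokens: the base task plus one system context per subagent."""
    return base + n_subagents * overhead


print(dispatch_tokens(3))  # 89000, i.e. roughly 90K
print(dispatch_tokens(5))  # 115000
```

Swap in your own base and overhead numbers to see how quickly a wide dispatch adds up.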

Mitigation: route subagents to a cheaper model. Each subagent's job is usually focused (read this, summarise that) and doesn't need top-tier reasoning. The main agent does the heavy lifting on synthesis.

hermes subagent dispatch \
  --tasks "..." \
  --subagent-model "anthropic/claude-haiku-4-5" \
  --main-model "anthropic/claude-sonnet-4-6" \
  --synthesis "..."

Haiku for subagents, Sonnet for the main thread. Cost drops by ~70% with minimal accuracy hit because the subagents are doing well-defined extraction tasks.

Cost tracking more generally is covered in our Hermes cost tracking and budgets piece.

Debugging when a subagent goes wrong

The main pain point. When you ask one agent to do something and it errors, you see the error. When five subagents are running and one of them errors, the main agent sometimes papers over the failure ("here's a comparison based on two of the three repos") without flagging that something broke.

Always enable subagent logging

hermes config set subagent_log_each true

Now each subagent's full conversation is logged to ~/.hermes/logs/subagents/<dispatch-id>/<subagent-n>.log. When something looks weird, the logs tell you which subagent went off course and why.
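With several log files per dispatch, a quick scan can point you at the one worth reading first. The sketch below is an assumption-heavy illustration: the log contents, the `subagent-1.log` style file names and the idea that a failure shows up as the word "error" are all hypothetical, and the demo writes its own sample files to a temporary directory instead of touching `~/.hermes`.

```python
import tempfile
from pathlib import Path


def suspicious_logs(dispatch_dir: Path, needle: str = "error") -> list[str]:
    """Return the names of log files in a dispatch directory that mention `needle`."""
    hits = []
    for log in sorted(dispatch_dir.glob("*.log")):
        text = log.read_text(errors="replace").lower()  # case-insensitive match
        if needle in text:
            hits.append(log.name)
    return hits


# Demo on made-up log files (hypothetical names and contents)
with tempfile.TemporaryDirectory() as tmp:
    dispatch_dir = Path(tmp)
    (dispatch_dir / "subagent-1.log").write_text("read repo-a\nsummary written\n")
    (dispatch_dir / "subagent-2.log").write_text("read repo-b\nERROR: file not found\n")
    flagged = suspicious_logs(dispatch_dir)

print(flagged)
```

Pointed at a real dispatch directory, the same function would need its `needle` adjusted to whatever the actual logs contain.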

Force failure surfacing

hermes subagent dispatch \
  --tasks "..." \
  --on-subagent-failure surface \
  --synthesis "..."

With --on-subagent-failure surface, a single subagent failure halts the whole dispatch and tells you exactly which one failed. Better than silent papering for important work. Default is "continue with what we have", which is fine for low-stakes research and bad for anything you'll really rely on.
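The difference between the two policies can be modelled in a few lines of Python. This is a sketch of the behaviour, not Hermes internals: `run_subagent`, `dispatch` and the simulated failure are hypothetical, and unlike a true halt this version lets the remaining subagents finish before raising.

```python
import asyncio


async def run_subagent(name: str) -> str:
    """Stand-in subagent; the one for repo-b is made to fail for the demo."""
    await asyncio.sleep(0.01)
    if name == "repo-b":
        raise RuntimeError("could not read repository")
    return f"summary of {name}"


async def dispatch(names: list[str], on_failure: str = "continue"):
    # return_exceptions=True keeps one failure from cancelling the other subagents
    results = await asyncio.gather(
        *(run_subagent(n) for n in names), return_exceptions=True
    )
    failed = sorted(n for n, r in zip(names, results) if isinstance(r, Exception))
    if failed and on_failure == "surface":
        # Refuse to synthesise from partial results; name the culprits instead
        raise RuntimeError(f"subagents failed: {failed}")
    reports = [r for r in results if not isinstance(r, Exception)]
    return reports, failed


names = ["repo-a", "repo-b", "repo-c"]

# "continue": synthesis would proceed with two of three reports
reports, failed = asyncio.run(dispatch(names))

# "surface": the failure is raised instead of papered over
try:
    asyncio.run(dispatch(names, on_failure="surface"))
    surfaced = None
except RuntimeError as err:
    surfaced = str(err)

print(reports, failed, surfaced)
```

In the "continue" case nothing in `reports` reveals that a third of the input is missing, which is the silent papering the surface option is meant to prevent.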

Subagents vs the Kanban feature

Different patterns for different jobs.

Subagents are for one-shot parallel work where you have a defined set of subtasks and want a synthesised result. The subagents finish, the dispatch ends, you get your answer.

The Kanban feature (newer, v0.13+) is for ongoing multi-agent coordination where multiple agents work on a shared board, taking items, completing them, dropping them back, indefinitely. More overhead, more useful for sustained work.

For a research task: subagents. For a "team of agents working on issues forever": Kanban. They aren't competitors; they solve different shapes of problem.

What subagents don't help with

Two real anti-patterns I've watched people fall into.

"Make Hermes faster". Subagents add latency through dispatch and synthesis overhead. If your single-agent flow takes 30 seconds, splitting into subagents might end up taking 45. Parallelism only wins when the parallel pieces each take real time.

"Avoid the context limit". True that each subagent has its own context window. But the main agent receives all the subagent outputs at synthesis time. If the combined output is huge, the main agent still hits its context limit. Subagents don't magically expand the working set.

Permission and approval modes with subagents

If you have approval mode enabled in your profile, each subagent inherits it. That means a dispatch with 5 subagents that each want to run a shell command produces 5 approval prompts. Annoying.

For dispatch-heavy use, either run subagents on a profile with looser approval settings or use the --auto-approve-subagent-tools flag (sets a per-dispatch override). Security trade-off: same as letting the agent run tools without approval. Don't combine with broad tool access on production data.

Approval and sandbox patterns more broadly are covered in our Docker sandbox and SSRF protection tutorial.

When I reach for subagents

Honestly, less than once a week. Most days my Hermes use is single-threaded conversation. Subagents come out for:

Multi-source research (3+ articles, 3+ repos, 3+ docs)
Bulk processing (read these 10 PR descriptions, group by theme)
Comparison work (analyse 4 implementations of X, recommend best)

Each of those takes real time per piece, has clean independence between pieces and benefits from synthesis at the end. That's the sweet spot.

Hosting subagent-heavy work on a VPS

Subagents are token-heavy but not particularly CPU-heavy on the Hermes side (the work happens at the provider). What matters is reliable network throughput and consistent uptime so dispatches don't get interrupted mid-flight. The LumaDock Hermes Agent template covers both, with unmetered bandwidth and no setup fees. The setup walkthrough is in our Hermes Agent complete guide.



