Sequential agent execution is a bottleneck. Army Mode fans out tasks to thousands of parallel agents, collects verified results, and returns aggregated output. Turn hours of serial work into minutes.
A single agent processing 1,000 research queries takes hours. Ten agents take a tenth of the time; a thousand agents finish in roughly the time of a single query. The bottleneck in most agent workloads is not intelligence -- it is throughput. Army Mode removes that bottleneck.
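The throughput argument can be sketched with a plain thread pool. This is a stand-in for Army Mode's fan-out, not its implementation; the task function and timings are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_query(q):
    # Stand-in for one agent task (e.g. a research query).
    time.sleep(0.01)
    return f"result:{q}"

queries = [f"q{i}" for i in range(100)]

# Serial: wall time is roughly n * per-task latency.
start = time.perf_counter()
serial = [run_query(q) for q in queries]
serial_s = time.perf_counter() - start

# Fanned out across 50 workers: wall time shrinks toward a single task's latency.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=50) as pool:
    parallel = list(pool.map(run_query, queries))
parallel_s = time.perf_counter() - start

assert serial == parallel  # same results, far less wall time
```

The same results come back in a fraction of the wall time; only the degree of parallelism changed.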
Massive parallelism fits any workload that decomposes into many independent tasks -- bulk research queries are one common example.
Parallel execution is useless if you cannot trust the results. Army Mode includes built-in verification: define a JSON schema for expected output, and each result is validated before inclusion. Results that fail validation are retried automatically. You get a final report showing success rate, failures, and retry counts.
This is critical for production workloads where you need every result to be usable, not just most of them.
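The verify-and-retry loop described above can be sketched as follows. The schema shape, task function, and retry limit are illustrative stand-ins, not the real Army Mode API:

```python
import random

# Expected result shape (illustrative; Army Mode uses a JSON schema).
SCHEMA = {"title": str, "score": float}

def validate(result: dict) -> bool:
    # Accept only results whose fields exist and have the expected types.
    return all(isinstance(result.get(k), t) for k, t in SCHEMA.items())

def run_task(task_id: int) -> dict:
    # Stand-in agent call; occasionally returns a malformed result.
    if random.random() < 0.3:
        return {"title": f"doc-{task_id}"}  # missing "score" -> fails validation
    return {"title": f"doc-{task_id}", "score": random.random()}

def run_with_retries(task_id: int, max_retries: int = 3):
    # Retry failed validations automatically, tracking the attempt count.
    for attempt in range(1 + max_retries):
        result = run_task(task_id)
        if validate(result):
            return result, attempt
    return None, max_retries

results = [run_with_retries(i) for i in range(20)]
ok = [r for r, _ in results if r is not None]
print(f"success rate: {len(ok)}/{len(results)}")
```

Only validated results make it into the final set; the attempt counts feed the success-rate and retry reporting.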
For long-running Army deployments, you can poll for progress. The API returns completed count, pending count, failed count, and partial results as they come in. Build dashboards, set up alerts, or stream results to downstream systems in real time.
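A polling loop against that status endpoint might look like the sketch below. The client class, method, and field names are assumptions for illustration; a fake in-memory client stands in for the real service:

```python
import time
from dataclasses import dataclass, field

@dataclass
class DeploymentStatus:
    # Fields mirror the counts the API is described as returning.
    completed: int = 0
    pending: int = 0
    failed: int = 0
    partial_results: list = field(default_factory=list)

class FakeArmyClient:
    """Stand-in for a hypothetical Army Mode client (names are illustrative)."""
    def __init__(self, total_tasks: int):
        self._total = total_tasks
        self._done = 0

    def get_status(self, deployment_id: str) -> DeploymentStatus:
        # Simulate a batch of tasks finishing between polls.
        self._done = min(self._total, self._done + 25)
        return DeploymentStatus(
            completed=self._done,
            pending=self._total - self._done,
            failed=0,
            partial_results=[f"r{i}" for i in range(self._done)],
        )

client = FakeArmyClient(total_tasks=100)
while True:
    status = client.get_status("dep-123")
    print(f"{status.completed}/{status.completed + status.pending} done")
    if status.pending == 0:
        break
    time.sleep(0.01)  # back off between polls
```

The partial results arriving on each poll are what make real-time dashboards and downstream streaming possible.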
Each parallel agent consumes credits for the tools it calls. Memory operations remain free even at scale. You can set a total credit budget for the Army deployment -- if the budget is exhausted, remaining tasks are paused rather than failing silently. This prevents runaway costs on large deployments. See pricing details.
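The pause-on-exhaustion behavior can be sketched as a simple budget check before each task; the function and numbers here are illustrative, not the billing implementation:

```python
def run_with_budget(tasks, budget, cost_per_task):
    # Run tasks until the budget cannot cover another one,
    # then pause the remainder instead of failing silently.
    completed, paused = [], []
    remaining = budget
    for t in tasks:
        if remaining < cost_per_task:
            paused.append(t)
            continue
        remaining -= cost_per_task
        completed.append(t)
    return completed, paused, remaining

# 35 credits at 5 credits/task covers 7 of 10 tasks; 3 are paused, not lost.
done, paused, left = run_with_budget(list(range(10)), budget=35, cost_per_task=5)
```

Paused tasks remain intact, so the deployment can resume once the budget is topped up.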
For most workloads, the per-task cost is 1-5 credits, making even a 1,000-task deployment affordable. Compare that to running 1,000 LLM calls through OpenAI -- the tool execution cost is a fraction of the inference cost.
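Concretely, at the quoted 1-5 credits per task:

```python
tasks = 1_000
low, high = 1, 5  # credits per task, from the range quoted above
print(f"deployment cost: {tasks * low}-{tasks * high} credits")
```

A full 1,000-task deployment therefore lands between 1,000 and 5,000 credits total.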
New accounts start with 500 credits. Army Mode is available on all plans.