Intent
What problem does this pattern solve?
It solves the latency bottleneck inherent in sequential LLM-based systems. By running independent agents (e.g., Insights, Plans, and Visualizations) concurrently, it reduces the total execution time from the sum of all agent calls to the duration of the longest single call.
Context
When and where does this pattern appear?
This pattern is triggered during the “Intelligence Generation” phase of the Runner pipeline. It emerged when the system reached a level of complexity where generating a full coaching report (Insights + Plan + Charts) sequentially was taking 10-20 seconds, impacting the responsiveness of the UI and integration hooks.
Agentic profile
Describe the agentic architecture shape explicitly.
- System shape: multi-agent
- Orchestration mode: parallel (hybrid with sequential)
Agent-to-agent interaction
- Present: true
- Mechanism: shared-state (writes to independent fields in context)
- Evidence: pipeline step
Tool protocols
- MCP: absent
- Tool calling: present
- Evidence: intelligent steps and insights
Optimisation target
- Primary: latency
- Secondary: quality
- Notes: Prioritizes fast delivery of intelligence by fanning out I/O bound LLM calls.
Simplicity vs autonomy
- Position: balanced
- Rationale: Agents are kept simple and focused on single tasks, while the orchestrator manages the complexity of concurrency and synchronization.
Forces
List the tensions or constraints:
- Latency vs State Safety: Running tasks in parallel increases speed but risks race conditions if agents write to the same memory/context fields.
- Resource Constraints: Parallel execution increases the burst rate of API calls to LLM providers, potentially hitting rate limits.
- Error Handling: A failure in one parallel branch must be handled without necessarily crashing the entire pipeline.
Solution
Describe the structural move.
The orchestrator identifies “parallel blocks” in the execution plan. Independent agent tasks are extracted as awaitables and wrapped in an asyncio.gather (or equivalent) call. This introduces a “Fan-out/Fan-in” boundary where parallel agents are prohibited from depending on each other’s results, ensuring they can be executed safely in any order.
Implementation signals
Observable signs this pattern exists:
- Use of
asyncio.gather,asyncio.wait, orTaskGroupsin the orchestration layer. - Methods like
run_parallel_agentsthat return lists of tasks instead of executing them immediately. - Clearly separated “Output” fields in the state object that are only written to by a single agent.
Evidence
Reference specific artefacts:
- Pipeline synchronization point.
- Logic to identify and return independent agent tasks.
- An independent step executed concurrently with intelligence agents.
Consequences
Benefits
- Reduced Latency: Significant performance gains in multi-agent workflows.
- Better Resource Utilization: Makes better use of the asynchronous nature of I/O bound LLM calls.
Costs
- Complexity: Harder to trace and debug interleaved logs from multiple agents.
- State Integrity Risk: Requires strict discipline in ensuring agents don’t have write collisions.
Failure modes
- Race Conditions: Occurs if two parallel agents try to modify the same context field or database record simultaneously.
- Partial Failure: If one agent fails, the orchestrator must decide whether to continue with partial data or fail the whole run.
Reuse notes
How can this pattern be reused safely?
- Where it fits: High-latency, I/O bound tasks that are computationally independent.
- Where it does NOT fit: When Agent B requires the output of Agent A to function (Sequential Dependency).
- Minimal version: A simple
asyncio.gatheraround two independent LLM calls. - Preconditions: Requires a non-blocking (async) runtime and a state management system that supports concurrent writes to different fields.
Confidence
- High → The pattern is explicitly architected into the core
RunnerPipelineandIntelligenceStep, with clear evidence of task extraction and synchronization.