Skip to content

Agent Loop & Streaming

The Agent class (src/core/agent.ts) implements the core agentic loop. Each call to agent.send(message) runs up to 20 iterations (configurable via maxIterations in AgentConfig).

send(userMessage)
messages.push({ role: 'user', content: userMessage })
┌── _loop() ─────────────────────────────────────┐
│ │
│ for (i = 0; i < maxIterations; i++) { │
│ response = _callAPI() // with retry │
│ messages.push(assistant content) │
│ │
│ if stop_reason == "end_turn": │
│ memory.maybeUpdate(text) │
│ return text │
│ │
│ if stop_reason == "tool_use": │
│ results = _dispatchTools(content) │
│ messages.push(tool_results) │
│ continue │
│ │
│ if stop_reason == "max_tokens": │
│ if continuationPrompt: │
│ continue (auto-recurse) │
│ } │
│ │
│ if continuationPrompt && continuations < 10: │
│ auto-recurse with continuation prompt │
│ │
└─────────────────────────────────────────────────┘

When continuationPrompt is set (applied through Session):

  • Iteration limit exceeded: Agent auto-recurses with the continuation prompt
  • max_tokens stop reason: Falls through to continuation logic (not silently truncated)
  • Hard cap: MAX_CONTINUATIONS per model (Opus: 20, Sonnet: 10, Haiku: 5) prevents infinite loops regardless of mode

Without continuationPrompt, the loop simply returns after maxIterations.

_callAPI() implements exponential backoff for transient errors:

  • Retries: Up to 3 attempts
  • Base delay: 2s (2s → 4s → 8s)
  • Retryable errors: 429 (rate limit), 529 (overloaded), 5xx (server error)
  • Network errors: ECONNRESET, ETIMEDOUT, fetch failed
  • Non-retryable: 400, 401, 403, 404, 422 — thrown immediately
  • Retry progress: Emitted as error stream events so CLI can display status

By default, NODYN uses adaptive thinking:

thinking: { type: 'adaptive' }

Claude decides the reasoning depth per request. This replaces static budget_tokens allocation and is the recommended mode.

Alternatively, explicit thinking can be configured:

thinking: { type: 'enabled', budget_tokens: 10000 }

The effort parameter controls global reasoning depth:

LevelDescription
lowMinimal reasoning
mediumModerate depth
highThorough (default)
maxMaximum depth

Set via AgentConfig.effort or /thinking command in the CLI.

StreamProcessor (src/core/stream.ts) is a pure stream transformer with no dependencies on other modules. It processes raw SDK events into StreamEvents:

SDK EventAction
message_startCaptures initial usage (includes cache fields)
content_block_startInitializes new content block
content_block_deltaAppends text/thinking/JSON deltas, emits events
content_block_stopParses accumulated tool input JSON, emits tool_call
message_deltaMerges usage, emits turn_end
type StreamEvent =
| { type: 'text'; text: string; agent: string }
| { type: 'thinking'; thinking: string; agent: string }
| { type: 'tool_call'; name: string; input: unknown; agent: string }
| { type: 'tool_result'; name: string; result: string; agent: string }
| { type: 'spawn'; agents: string[]; agent: string }
| { type: 'turn_end'; stop_reason: string; usage: BetaUsage; agent: string }
| { type: 'error'; message: string; agent: string }
| { type: 'trigger'; trigger: string; agent: string }
| { type: 'cost_warning'; snapshot: CostSnapshot; agent: string }
| { type: 'continuation'; iteration: number; agent: string }

When the API returns stop_reason: "tool_use", all tool calls in the response are dispatched in parallel using Promise.allSettled:

const settled = await Promise.allSettled(
toolCalls.map(tc => this._executeOne(tc)),
);

This means:

  • Multiple tool calls execute concurrently
  • A single failure doesn’t block other tools
  • Failed tools return is_error: true results to the model
  • The model sees all results and can recover from individual failures

The Session accumulates token usage across turns:

usage: {
input_tokens: number;
output_tokens: number;
cache_creation_input_tokens: number;
cache_read_input_tokens: number;
}

The CLI footer shows a context usage bar (green < 50%, yellow < 80%, red >= 80%) based on CONTEXT_WINDOW[modelId] (Opus: 1M, Sonnet/Haiku: 200K).

Multi-layered guards prevent context overflow and token waste:

Tool result truncation — oversized tool results are truncated at execution time (DEFAULT_MAX_TOOL_RESULT_CHARS = 80,000, configurable via max_tool_result_chars). Publishes contentTruncation diagnostic event.

Knowledge context budgetformatContext() enforces DEFAULT_MAX_KNOWLEDGE_CONTEXT_CHARS = 12,000. When exceeded, drops lowest-scored memories (preserves semantic integrity of remaining entries).

Briefing capMAX_BRIEFING_CHARS = 8,000. Manifest diff trimmed first (lowest priority), run history preserved intact. Briefing is one-shot (cleared after turn 1).

History truncation_truncateHistory() runs before every _callAPI() call:

  • Token estimate: JSON.stringify(messages).length / CHARS_PER_TOKEN where CHARS_PER_TOKEN = 3.5
  • Threshold: 85% of CONTEXT_WINDOW[model] (Opus: ~850K, Sonnet/Haiku: ~170K tokens)
  • Strategy: Keep first message (original task) + last N messages (scaled by context window: Opus keeps up to 100, Sonnet/Haiku up to 20); replace the dropped range with a single placeholder message
  • Content truncation: Second pass truncates oversized content blocks (Opus: 40K chars, Sonnet/Haiku: 8K chars per message)

Context budget observabilitycontext_budget stream event emitted when usage exceeds 70%, with per-block breakdown (system/tool/message tokens).

Prompt caching is GA (no beta header needed). NODYN uses three cache control blocks:

  1. System promptcache_control: { type: 'ephemeral' } on the static system prompt
  2. Knowledge context or memory fallbackcache_control: { type: 'ephemeral' }. Primary path: Knowledge Graph retrieval (HyDE + vector + entity graph expansion + MMR) produces a <relevant_context> block with scope-grouped memories and entity subgraph. Legacy fallback: SQLite cosine search when knowledge_graph_enabled: false. Cold start fallback: full <memory> dump. Empty string = intentionally no block.
  3. Briefing blockcache_control: { type: 'ephemeral' } on session briefing + advisor recommendations

Opus requires 4,096+ tokens for caching to activate. System prompt + tools combined must exceed this threshold.

Top-level cache_control: { type: 'ephemeral' } on the API call auto-marks the last cacheable block.

  • cache_creation_input_tokens and cache_read_input_tokens arrive in message_start
  • message_delta only carries output_tokens
  • StreamProcessor merges delta into existing usage to preserve cache fields

Thinking block signatures are invalidated when responses pass through API proxies. The agent strips all type: 'thinking' blocks from message history before the next API call:

const contentForHistory = response.content.filter(
(b) => b.type !== 'thinking',
);
this.messages.push({ role: 'assistant', content: contentForHistory });

After each completed turn (stop_reason: "end_turn"), the agent fires off memory extraction:

if (this.memory) {
void this.memory.maybeUpdate(text); // fire-and-forget
}

This uses Claude Haiku to analyze the response and extract facts, skills, context, and error information. It never blocks the response — failures are silently ignored.

Each send() call creates a new AbortController. Calling agent.abort() triggers signal propagation to the streaming API call. On abort, the message history is rolled back to the pre-call snapshot.