Requirement: Appendix F: Under the Hood: Agents' Reasoning Engines
Source
- PDF: `Agentic_Design_Patterns.pdf`
- Section: Appendix F - Under the Hood: An Inside Look at the Agents' Reasoning Engines
- Page range: `383-396` logical pages from `docs/agentic-design-patterns-toc.md`
- Extraction note: the Appendix F title appears in the table of contents on PDF file page `1` / zero-based index `0`, where it is listed as a 14-page appendix. Direct extraction of the logical page range `383-396` from the PDF labels / file pages yields the tail of Appendix B, Appendix C, and the start of Appendix D, not Appendix F. Searching the extracted PDF text for `Under the Hood`, `Reasoning Engines`, and the Appendix F title found no body heading. The expected body position after Appendix E is also ambiguous: Appendix E appears at PDF file pages `399-403` / indexes `398-402`, but PDF file page `404` / index `403` starts Appendix G, and Appendix G appears duplicated at file pages `404-410` and `410-415` before the Conclusion at file page `416`. This document therefore records Appendix F requirements as a conservative reconstruction from the TOC title and the book's adjacent reasoning material, not from a cleanly extracted Appendix F body.
Pattern Summary
Appendix F is listed as an inside look at agents' reasoning engines. Since the canonical body text is missing or displaced in the extracted PDF, the safest implementation-oriented interpretation is to model the "reasoning engine" as the control layer that turns a user task into a bounded sequence of reasoning operations: task analysis, strategy selection, working-memory updates, optional tool actions, self-evaluation, revision, and finalization.
The LangGraph example should not duplicate Chapter 17's broad survey of reasoning techniques. Instead, it should expose the mechanics behind a reasoning engine: how state is initialized, how the graph chooses a strategy such as direct answer, decomposition, tool-assisted ReAct, or critique-and-revise, how each step is evaluated, and how the loop stops without leaking raw hidden chain-of-thought. The final output should include the answer, a concise user-safe reasoning summary, the selected strategy, step metadata, evidence or observations, and termination reason.
Because the Appendix F body is not available as extracted text, implementation should avoid product claims, external dependencies, and unsupported details. Treat the example as a local, testable reasoning-engine simulator grounded in agentic control-flow concepts already present elsewhere in the source PDF.
Pattern Explanation
Conceptual Overview
A reasoning engine is the part of an agent that decides how to think before it answers or acts. It is not just the language model call. It is the orchestration around the model: what context is prepared, which reasoning strategy is selected, whether tools are needed, how intermediate results are stored, how quality is checked, and when the system should stop.
For a learner, the key idea is that agent reasoning becomes more reliable when it is structured as explicit state transitions rather than a single uninspected prompt. For an engineer, the key implementation requirement is to make those transitions observable and bounded while keeping internal deliberation private and returning only concise rationale and metadata.
Problem
Single-pass agents can respond quickly, but they often fail when a task requires choosing an approach, maintaining intermediate state, checking whether evidence is sufficient, or recovering from an uncertain step. Without a reasoning engine, the application cannot easily tell whether the model answered directly, decomposed the problem, used a tool, revised a draft, or stopped because it ran out of budget.
This pattern solves the problem by making reasoning control explicit. The graph records the task profile, selected strategy, step summaries, observations, critiques, and stop conditions so the behavior can be tested and debugged.
When to Use
- Use this pattern when an agent must choose among reasoning modes instead of always using one prompt.
- Use it when the task may require decomposition, tool use, critique, verification, or multiple steps.
- Use it when observability matters and tests need to assert which reasoning path was taken.
- Use it when cost, latency, and step count must be capped by explicit budgets.
- Use it when the final answer should include a concise explanation of process without exposing raw hidden reasoning.
- Use it when a project needs a reusable control loop that other pattern examples can build on.
When Not to Use
- Avoid this pattern for simple deterministic transformations where a direct function or one model call is enough.
- Avoid it when the task has no meaningful strategy choice and a full graph would only add latency.
- Avoid it when the application cannot tolerate nondeterministic model routing decisions without deterministic guards.
- Avoid exposing raw chain-of-thought logs as user-facing output; return summaries, decisions, and evidence instead.
- Avoid unbounded reasoning loops, retries, or branch exploration without clear termination rules.
- Avoid treating the engine trace as proof of correctness; it is diagnostic metadata, not verification by itself.
How It Works
- The graph receives a user task and initializes a bounded reasoning budget, empty working memory, empty trace, and default status.
- A task analysis step classifies the request by complexity, required knowledge, tool needs, risk, and whether direct response is acceptable.
- A strategy selector chooses a reasoning mode such as `direct`, `decompose`, `react_tool_loop`, or `critique_and_revise`.
- The graph prepares working context from the input, constraints, available tools, and any local evidence fixtures.
- The engine proposes the next step, executes it through either an LLM-like reasoning node or an injectable local tool, and records a concise step summary.
- An evaluator checks whether the step improved the answer, found a gap, produced an error, or exhausted the configured budget.
- Conditional routing either loops for another step, switches strategy, drafts an answer, or finalizes a partial result.
- A critique node reviews the draft against the task, observations, and constraints, then revises unsupported or incomplete content.
- The final node returns the answer, strategy, user-safe reasoning summary, trace metadata, observations, errors, and termination reason.
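
The steps above can be sketched as a bounded loop in plain Python, without LangGraph. This is a minimal sketch under stated assumptions: `run_engine`, `fake_step`, the default of three steps, and the `0.8` confidence threshold are hypothetical names and values chosen for illustration, not taken from the source.

```python
from typing import Any, Callable

# Hypothetical step signature: takes the state, returns (summary, confidence).
StepFn = Callable[[dict[str, Any]], tuple[str, float]]


def run_engine(task: str, step_fn: StepFn, max_steps: int = 3,
               threshold: float = 0.8) -> dict[str, Any]:
    """Run a bounded execute/evaluate loop and report why it stopped."""
    state: dict[str, Any] = {
        "input": task.strip(),
        "step_trace": [],
        "confidence": 0.0,
        "termination_reason": None,
    }
    if not state["input"]:
        state["termination_reason"] = "invalid_input"
        return state

    for _ in range(max_steps):
        summary, confidence = step_fn(state)   # execute one reasoning step
        state["step_trace"].append(summary)    # store a concise summary only
        state["confidence"] = confidence
        if confidence >= threshold:            # evaluator: enough support to answer
            state["termination_reason"] = "answer_ready"
            return state

    # The loop ran out of steps before the threshold was reached.
    state["termination_reason"] = "budget_exhausted"
    return state


def fake_step(state: dict[str, Any]) -> tuple[str, float]:
    """Deterministic stand-in for a model call: confidence grows 0.3 per step."""
    n = len(state["step_trace"]) + 1
    return f"step {n}", 0.3 * n
```

Because the loop is a `for` over a fixed range, it stops even if the step function never reports enough confidence, which is the property the rest of this document asks the graph to preserve.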
Trade-offs
| Benefit | Cost or Risk |
|---|---|
| Makes agent reasoning observable as state transitions and route choices. | Adds graph complexity beyond a direct model call. |
| Supports different reasoning modes for different task types. | Strategy selection can be wrong unless tested with clear fixtures. |
| Bounded loops reduce runaway cost and latency. | Strict budgets can stop before the best answer is reached. |
| Step summaries help debugging and monitoring. | Summaries can create false confidence if not grounded in observations. |
| Tool and critique nodes make reasoning more reliable for external or uncertain tasks. | Tool failures and shallow critiques must be handled explicitly. |
| User-safe summaries avoid exposing raw hidden deliberation. | Engineers still need internal observability for tests and debugging. |
Minimal Example
Input:
"Estimate whether a small team can ship a CSV import feature in two weeks.
Constraints: two engineers, existing parser library, tests required."
Flow:
prepare_task
analyze_task -> complex planning question, no external network needed
select_reasoning_strategy -> decompose + critique_and_revise
propose_next_step -> identify workstreams
execute_reasoning_step -> parse, validation, UI, tests, release risk
evaluate_progress -> enough structure, but missing risk check
propose_next_step -> assess risks and assumptions
execute_reasoning_step -> schema stability and file size marked as key assumptions
evaluate_progress -> ready to draft
draft_answer
critique_answer -> mark uncertainty and remove unsupported estimates
finalize_response
Output:
answer: qualified two-week feasibility assessment
strategy: decompose + critique_and_revise
reasoning_summary: concise process description
termination_reason: answer_ready
LangGraph Mapping
| Pattern Concept | LangGraph Element |
|---|---|
| User task | State fields input and normalized_input |
| Reasoning engine configuration | State fields engine_config, available_strategies, and reasoning_budget |
| Task profiling | Node analyze_task and state field task_profile |
| Strategy selection | Node select_reasoning_strategy and state field selected_strategy |
| Working memory | State field working_memory |
| Step proposal | Node propose_next_step and state field next_step |
| Reasoning step execution | Node execute_reasoning_step and state field step_trace |
| Tool-assisted action | Node run_tool_action and state fields tool_requests and observations |
| Step evaluation | Node evaluate_progress and state fields knowledge_gaps, confidence, and termination_reason |
| Strategy switch or loop | Conditional edges from evaluate_progress and state field route_decision |
| Draft and critique | Nodes draft_answer, critique_answer, and revise_answer |
| User-safe reasoning report | State field reasoning_summary, not raw hidden chain-of-thought |
| Final result | Node finalize_response and state field final_output |
LangGraph Implementation Goal
Build a LangGraph example of a reasoning engine controller. The user provides a task, optional constraints, optional local evidence, optional allowed tools, and optional budget overrides. The graph profiles the task, selects a reasoning strategy, runs a bounded reasoning loop, optionally calls local injectable tools, evaluates progress, critiques the draft, and returns a final response with diagnostic metadata.
The example should be deterministic under tests. Model calls should be wrapped so tests can inject fake outputs. Tool behavior should use local fixtures, not network access. The graph should demonstrate how an agent's reasoning engine controls state and routing, rather than claiming that any one prompt reveals the model's true internal reasoning.
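
One way to make model calls injectable is to pass the model in as a plain callable and give tests a scripted fake. The sketch below is illustrative only: `ScriptedModel` and `call_model` are hypothetical names, and the choice to repeat the last scripted output once the script runs out is an assumption made so that a fake can keep asking for more steps.

```python
from typing import Callable, Optional


class ScriptedModel:
    """Hypothetical fake model: replays fixed outputs so tests are deterministic."""

    def __init__(self, outputs: list[str]) -> None:
        if not outputs:
            raise ValueError("ScriptedModel needs at least one output")
        self._outputs = list(outputs)
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        # Repeat the last scripted output once the script runs out.
        index = min(self.calls, len(self._outputs) - 1)
        self.calls += 1
        return self._outputs[index]


def call_model(model: Callable[[str], str], prompt: str,
               budget_used: dict[str, int],
               reasoning_budget: dict[str, int]) -> Optional[str]:
    """Call the injected model unless the model-call budget is already spent.

    Note: this sketch updates `budget_used` in place.
    """
    if budget_used["model_calls"] >= reasoning_budget["model_calls"]:
        return None  # the caller should route to a partial result
    budget_used["model_calls"] += 1
    return model(prompt)
```

A real model client can be swapped in for `ScriptedModel` as long as it is wrapped in a callable with the same one-argument shape.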
Expected workflow outcome:
- Simple requests can route through `direct` and finalize quickly.
- Multi-step requests can route through `decompose` and complete after one or more evaluated steps.
- Tool-requiring requests can route through `react_tool_loop` only when a matching local tool is allowed.
- Low-confidence or unsupported drafts can route through critique and revision before finalization.
- Budget exhaustion returns a partial result with gaps and termination metadata.
- The final output includes concise reasoning metadata, not raw hidden chain-of-thought.
State Shape
List the state fields the graph needs.
| Field | Type | Purpose |
|---|---|---|
| `input` | `str` | Original user task or question. |
| `normalized_input` | `str` | Trimmed and normalized task text used by prompts and tools. |
| `constraints` | `dict[str, Any]` | User-provided scope, quality, tool, budget, or output constraints. |
| `engine_config` | `dict[str, Any]` | Runtime settings such as allowed strategies, default budgets, tool policy, and confidence thresholds. |
| `available_strategies` | `list[str]` | Supported reasoning modes such as direct, decompose, react_tool_loop, and critique_and_revise. |
| `selected_strategy` | `str \| None` | Strategy chosen for the current task. |
| `strategy_history` | `list[dict[str, Any]]` | Ordered record of strategy choices, switches, and rationales. |
| `task_profile` | `dict[str, Any]` | Classification of complexity, domain, risk, tool needs, evidence needs, and expected output type. |
| `reasoning_budget` | `dict[str, int]` | Caps for steps, tool calls, critique rounds, strategy switches, and model calls. |
| `budget_used` | `dict[str, int]` | Counters for steps, tool calls, critique rounds, strategy switches, and model calls. |
| `working_memory` | `dict[str, Any]` | Structured scratch state containing subgoals, assumptions, partial conclusions, and open decisions. |
| `next_step` | `dict[str, Any] \| None` | Proposed next reasoning or tool step. |
| `step_trace` | `list[dict[str, Any]]` | Concise, user-safe summaries of executed steps, inputs, outputs, and route decisions. |
| `tool_requests` | `list[dict[str, Any]]` | Tool calls proposed by the reasoning loop. |
| `observations` | `list[dict[str, Any]]` | Results from local tools, evidence fixtures, or validation checks. |
| `knowledge_gaps` | `list[str]` | Missing information or unresolved assumptions discovered during evaluation. |
| `errors` | `list[str]` | Validation, strategy, tool, parsing, or evaluation errors. |
| `confidence` | `float` | Current confidence score from evaluation, bounded from 0.0 to 1.0. |
| `route_decision` | `str \| None` | Next route such as continue, use_tool, switch_strategy, draft, partial, or fail. |
| `draft_answer` | `str \| None` | Initial answer synthesized from working memory and observations. |
| `critique` | `dict[str, Any] \| None` | Review of the draft for support, completeness, constraint satisfaction, and clarity. |
| `revised_answer` | `str \| None` | Answer after critique-driven revision. |
| `reasoning_summary` | `str \| None` | Concise user-facing process summary that omits raw hidden deliberation. |
| `termination_reason` | `str \| None` | Stop condition such as answer_ready, budget_exhausted, invalid_input, tool_unavailable, or insufficient_evidence. |
| `status` | `str` | Overall status such as ok, partial, invalid_input, blocked, or failed. |
| `final_output` | `dict[str, Any] \| None` | Final structured result returned by the graph. |
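
The state table can be expressed as a typed dictionary with an initializer that merges budget overrides. This is a hedged sketch covering only a subset of the fields: `EngineState`, `initial_state`, and the numeric values in `DEFAULT_BUDGET` are assumptions, since the source names the budget keys but not their defaults.

```python
from typing import Any, Optional, TypedDict

# Hypothetical default caps; the source lists the keys but not the values.
DEFAULT_BUDGET = {"steps": 4, "tool_calls": 2, "critique_rounds": 1,
                  "strategy_switches": 1, "model_calls": 8}


class EngineState(TypedDict, total=False):
    # Subset of the state table; the remaining fields follow the same pattern.
    input: str
    normalized_input: str
    constraints: dict[str, Any]
    selected_strategy: Optional[str]  # equivalent to `str | None`
    reasoning_budget: dict[str, int]
    budget_used: dict[str, int]
    working_memory: dict[str, Any]
    step_trace: list[dict[str, Any]]
    knowledge_gaps: list[str]
    errors: list[str]
    confidence: float
    status: str


def initial_state(task: str,
                  budget_overrides: Optional[dict[str, int]] = None) -> EngineState:
    """Build a fresh state with merged budgets and zeroed usage counters."""
    budget = dict(DEFAULT_BUDGET)
    for key, value in (budget_overrides or {}).items():
        if key in budget:  # ignore unknown budget names
            budget[key] = max(0, int(value))
    return EngineState(
        input=task,
        normalized_input=" ".join(task.split()),  # collapse runs of whitespace
        constraints={},
        selected_strategy=None,
        reasoning_budget=budget,
        budget_used={key: 0 for key in budget},
        working_memory={},
        step_trace=[],
        knowledge_gaps=[],
        errors=[],
        confidence=0.0,
        status="ok",
    )
```

Keeping `reasoning_budget` and `budget_used` keyed identically makes it straightforward for later nodes to compare a counter against its cap.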
Nodes
| Node | Responsibility |
|---|---|
| `prepare_task` | Validate input, normalize fields, set default config and budgets, and initialize empty state artifacts. |
| `analyze_task` | Classify complexity, risk, evidence needs, tool needs, and likely output type. |
| `select_reasoning_strategy` | Choose the initial strategy from available_strategies based on task_profile and constraints. |
| `prepare_working_memory` | Seed subgoals, assumptions, open questions, and any local evidence into structured working memory. |
| `propose_next_step` | Decide the next reasoning or tool step given strategy, gaps, confidence, and remaining budget. |
| `execute_reasoning_step` | Run an injectable model-like reasoning operation and append a concise step summary to step_trace. |
| `run_tool_action` | Execute an allowed local tool or fixture lookup, append an observation, and handle tool errors. |
| `evaluate_progress` | Check whether the graph has enough support to draft, needs another step, should switch strategy, or must stop. |
| `switch_strategy` | Change strategy when progress stalls and the switch budget allows it. |
| `draft_answer` | Synthesize an answer from working memory, observations, and supported partial conclusions. |
| `critique_answer` | Review the draft against the original task, constraints, observations, and gaps. |
| `revise_answer` | Remove unsupported content, resolve critique findings, and produce the final answer candidate. |
| `finalize_response` | Build final_output with status, answer, reasoning summary, strategy metadata, trace summary, gaps, errors, and budget usage. |
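
As an illustration of what one of these nodes might look like, the sketch below implements `select_reasoning_strategy` as a pure function that reads the state and returns only the fields it changes. The rule order and the `needs_tool`, `complexity`, and `allowed_tools` keys are assumptions made for this example; the source does not define the contents of `task_profile` or `constraints`.

```python
from typing import Any

DEFAULT_STRATEGIES = ["direct", "decompose", "react_tool_loop", "critique_and_revise"]


def select_reasoning_strategy(state: dict[str, Any]) -> dict[str, Any]:
    """Pick a strategy from the task profile and return only the changed fields."""
    profile = state.get("task_profile", {})
    available = state.get("available_strategies") or DEFAULT_STRATEGIES
    allowed_tools = state.get("constraints", {}).get("allowed_tools", [])
    tool = profile.get("needs_tool")                     # hypothetical profile key
    is_complex = profile.get("complexity") == "complex"  # hypothetical profile key

    if tool and tool in allowed_tools and "react_tool_loop" in available:
        choice, reason = "react_tool_loop", f"The task needs the allowed tool {tool!r}."
    elif is_complex and "decompose" in available:
        choice, reason = "decompose", "The task requires multi-step analysis."
    elif not is_complex and "direct" in available:
        choice, reason = "direct", "The task is simple enough for a direct answer."
    else:
        # No safe choice: a complex task must not silently fall back to direct.
        return {"selected_strategy": None, "status": "failed",
                "errors": state.get("errors", []) + ["no usable strategy"]}

    history = state.get("strategy_history", []) + [{"strategy": choice, "reason": reason}]
    return {"selected_strategy": choice, "strategy_history": history}
```

Returning a partial update rather than mutating the input keeps the node easy to test in isolation with small fixture dictionaries.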
Edges
Describe the graph flow, including conditional branches.
START
-> prepare_task
-> analyze_task
-> select_reasoning_strategy
-> prepare_working_memory
-> propose_next_step
prepare_task -> finalize_response
propose_next_step -> execute_reasoning_step -> evaluate_progress
propose_next_step -> run_tool_action -> evaluate_progress
evaluate_progress -> propose_next_step
evaluate_progress -> switch_strategy -> propose_next_step
evaluate_progress -> draft_answer
evaluate_progress -> finalize_response
draft_answer -> critique_answer
critique_answer -> revise_answer
critique_answer -> finalize_response
revise_answer -> finalize_response
finalize_response -> END
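
A cheap structural check on this flow is to write the edges down as an adjacency map and confirm that every node can reach `END`. The sketch below is plain Python, not LangGraph wiring; `EDGES` and `reaches_end` are hypothetical names, and the `prepare_task -> finalize_response` entry comes from the blank-input rule in the conditional edge requirements.

```python
EDGES: dict[str, list[str]] = {
    "START": ["prepare_task"],
    "prepare_task": ["analyze_task", "finalize_response"],  # second target: blank input
    "analyze_task": ["select_reasoning_strategy"],
    "select_reasoning_strategy": ["prepare_working_memory"],
    "prepare_working_memory": ["propose_next_step"],
    "propose_next_step": ["execute_reasoning_step", "run_tool_action"],
    "execute_reasoning_step": ["evaluate_progress"],
    "run_tool_action": ["evaluate_progress"],
    "evaluate_progress": ["propose_next_step", "switch_strategy",
                          "draft_answer", "finalize_response"],
    "switch_strategy": ["propose_next_step"],
    "draft_answer": ["critique_answer"],
    "critique_answer": ["revise_answer", "finalize_response"],
    "revise_answer": ["finalize_response"],
    "finalize_response": ["END"],
}


def reaches_end(node: str, edges: dict[str, list[str]]) -> bool:
    """Depth-first search: can `node` reach END by following the edges?"""
    stack, seen = [node], set()
    while stack:
        current = stack.pop()
        if current == "END":
            return True
        if current in seen:
            continue  # already explored; avoids spinning on the loop edges
        seen.add(current)
        stack.extend(edges.get(current, []))
    return False
```

This only shows that an exit path exists from every node. It does not prove termination, because the loop through `evaluate_progress` still depends on the budget checks.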
Conditional edge requirements:
- Route from `prepare_task` to `finalize_response` with `status: "invalid_input"` when the input is blank.
- Route from `propose_next_step` to `run_tool_action` only when the next step requests an allowed local tool and budget remains.
- Route from `propose_next_step` to `execute_reasoning_step` for decomposition, analysis, comparison, or synthesis preparation steps.
- Route from `evaluate_progress` back to `propose_next_step` when gaps remain and step budget remains.
- Route from `evaluate_progress` to `switch_strategy` when confidence stalls, the selected strategy is mismatched, and switch budget remains.
- Route from `evaluate_progress` to `draft_answer` when confidence meets the configured threshold or no more useful steps are needed.
- Route from `evaluate_progress` to `finalize_response` with `status: "partial"` when the budget is exhausted before the answer is fully supported.
- Route from `critique_answer` to `revise_answer` when the critique finds fixable unsupported claims, missing constraints, or clarity issues.
- Route from `critique_answer` directly to `finalize_response` when the draft is acceptable or the critique budget is exhausted.
- The graph must never exceed configured step, tool, critique, strategy-switch, or model-call budgets.
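
The routing rules out of `evaluate_progress` can be sketched as a single pure function that returns the name of the next node. In LangGraph a function of this shape would typically be registered through `add_conditional_edges`; here it is shown standalone so it can be tested without the library. The `confidence_threshold` config key, its `0.75` default, and the `stalled` flag are assumptions for illustration.

```python
from typing import Any


def route_after_evaluate(state: dict[str, Any]) -> str:
    """Return the name of the node that should run after evaluate_progress."""
    used, budget = state["budget_used"], state["reasoning_budget"]
    # Hypothetical config key and default threshold.
    threshold = state.get("engine_config", {}).get("confidence_threshold", 0.75)

    if state["confidence"] >= threshold or not state["knowledge_gaps"]:
        return "draft_answer"        # enough support to draft
    if used["steps"] >= budget["steps"]:
        return "finalize_response"   # budget spent: return a partial result
    if state.get("stalled") and used["strategy_switches"] < budget["strategy_switches"]:
        return "switch_strategy"     # `stalled` is a hypothetical flag
    return "propose_next_step"       # gaps remain and step budget allows
```

Checking the budget before the switch and continue branches means an exhausted step budget always wins over a request for more work.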
Inputs and Outputs
- Input: a natural-language task, optional constraints, optional local evidence fixtures, optional allowed tools, optional `available_strategies`, and optional budget overrides.
- Output: `final_output`, including `status`, `answer`, `reasoning_summary`, `selected_strategy`, `strategy_history`, `termination_reason`, `confidence`, `knowledge_gaps`, `trace_summary`, `observations`, `budget_used`, and `errors`.
- Intermediate artifacts: task profile, working memory, next step, step trace, tool requests, observations, draft answer, critique, and revised answer.
Example successful output shape:
{
  "status": "ok",
  "answer": "A two-week CSV import feature is feasible only if the scope is limited to a known schema, server-side validation, error reporting, and automated tests. Custom mapping, background processing, or large-file optimization should be deferred.",
  "reasoning_summary": "The engine classified the request as a bounded planning task, decomposed the work into implementation and risk areas, checked assumptions, and revised the answer to mark unsupported scope as deferred.",
  "selected_strategy": "decompose",
  "strategy_history": [
    {
      "strategy": "decompose",
      "reason": "The task requires multi-step feasibility analysis."
    }
  ],
  "termination_reason": "answer_ready",
  "confidence": 0.82,
  "knowledge_gaps": [],
  "trace_summary": [
    {
      "step": "identify_workstreams",
      "result": "Found parser integration, validation, UI, tests, and release risks."
    },
    {
      "step": "check_assumptions",
      "result": "Marked schema stability and file size as key assumptions."
    }
  ],
  "budget_used": {
    "steps": 2,
    "tool_calls": 0,
    "critique_rounds": 1,
    "strategy_switches": 0,
    "model_calls": 4
  },
  "errors": []
}
Example partial output shape:
{
  "status": "partial",
  "answer": "I can outline the likely approach, but the configured budget ended before the graph could verify file-size and schema assumptions.",
  "reasoning_summary": "The engine decomposed the task and found unresolved assumptions before budget exhaustion.",
  "selected_strategy": "decompose",
  "termination_reason": "budget_exhausted",
  "confidence": 0.48,
  "knowledge_gaps": [
    "maximum expected CSV file size",
    "whether custom column mapping is in scope"
  ],
  "budget_used": {
    "steps": 1,
    "tool_calls": 0,
    "critique_rounds": 0,
    "strategy_switches": 0,
    "model_calls": 2
  },
  "errors": []
}
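
Output shapes like the ones above could be assembled by a `finalize_response` node that copies only user-safe fields out of the state. The following is a minimal sketch, assuming each `step_trace` entry is a dictionary with `step` and `result` keys; the `raw_reasoning` key mentioned in the comment is a hypothetical example of something that must not leak.

```python
from typing import Any


def finalize_response(state: dict[str, Any]) -> dict[str, Any]:
    """Assemble final_output from user-safe fields only."""
    # Keep only the summary fields of each step. Anything else in a trace
    # entry (for example a hypothetical "raw_reasoning" key) is dropped.
    trace_summary = [
        {"step": entry.get("step"), "result": entry.get("result")}
        for entry in state.get("step_trace", [])
    ]
    final_output = {
        "status": state.get("status", "ok"),
        # Prefer the revised answer, then the draft, then an empty string.
        "answer": state.get("revised_answer") or state.get("draft_answer") or "",
        "reasoning_summary": state.get("reasoning_summary") or "",
        "selected_strategy": state.get("selected_strategy"),
        "strategy_history": state.get("strategy_history", []),
        "termination_reason": state.get("termination_reason"),
        # Clamp confidence into the documented 0.0 to 1.0 range.
        "confidence": min(1.0, max(0.0, state.get("confidence", 0.0))),
        "knowledge_gaps": state.get("knowledge_gaps", []),
        "trace_summary": trace_summary,
        "observations": state.get("observations", []),
        "budget_used": state.get("budget_used", {}),
        "errors": state.get("errors", []),
    }
    return {"final_output": final_output}
```

Building the output from an explicit list of keys, rather than copying the whole state, is one way to keep working memory and other internal fields out of the response by construction.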
Failure Cases
Document expected failures, retries, fallback behavior, and human-review points.
- Blank input should stop in `prepare_task` with `status` `invalid_input`.
- Unsupported strategy names should be ignored with an error or replaced by the default strategy.
- Strategy selection failure should fall back to `direct` only for simple tasks; complex tasks should return `partial` or `failed`.
- Tool requests for unavailable or disallowed tools should not execute and should set `termination_reason` to `tool_unavailable` when no alternative path exists.
- Tool exceptions should be captured in `errors`, appended to the trace, and counted against tool budget only for attempted calls.
- Repeated low-confidence evaluations should trigger `switch_strategy` only while switch budget remains.
- Budget exhaustion should return a partial answer with gaps, not fabricate missing information.
- Critique failures should not erase the draft; finalization should include a warning and lower confidence.
- Raw hidden chain-of-thought should never be placed in `final_output`; use concise step summaries and evidence metadata instead.
- The graph should terminate deterministically even when fake model outputs request more steps forever.
- If the source ambiguity for Appendix F remains unresolved, implementation should keep Appendix F-specific claims minimal and test behavior around the generic reasoning-engine control loop.
Test Ideas
- Verify a simple factual or formatting request routes through `direct` and uses at most one reasoning step.
- Verify a multi-step planning task routes through `decompose`, records multiple step summaries, and finalizes with `status` `ok`.
- Verify a tool-requiring task routes to `run_tool_action` only when the requested tool is allowed.
- Verify a disallowed tool request returns `partial` or `failed` without executing the tool.
- Verify low confidence can trigger one strategy switch and that switch budget is enforced.
- Verify budget exhaustion stops the loop and includes `termination_reason: "budget_exhausted"`.
- Verify critique removes or flags unsupported claims before final output.
- Verify `final_output` contains `answer`, `reasoning_summary`, `selected_strategy`, `termination_reason`, `budget_used`, and `errors`.
- Verify `final_output` does not include raw hidden chain-of-thought fields.
- Verify fake model outputs that repeatedly request another step cannot create an infinite loop.
- Verify source ambiguity is represented in documentation and does not depend on external services or unavailable PDF text.
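
The check that `final_output` carries no raw chain-of-thought fields could be written as a recursive key scan plus a pytest-style test. This is a sketch only: the names in `FORBIDDEN_KEYS` are hypothetical, and a real project would list whatever internal field names its own nodes use.

```python
from typing import Any

# Hypothetical field names that would indicate leaked raw deliberation.
FORBIDDEN_KEYS = {"chain_of_thought", "raw_reasoning", "scratchpad"}


def find_forbidden_keys(value: Any, path: str = "final_output") -> list[str]:
    """Return the paths of any forbidden keys nested inside dicts and lists."""
    found: list[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                found.append(child_path)
            found.extend(find_forbidden_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_forbidden_keys(child, f"{path}[{index}]"))
    return found


def test_final_output_has_no_raw_reasoning() -> None:
    final_output = {
        "answer": "Feasible with a limited scope.",
        "trace_summary": [{"step": "identify_workstreams", "result": "Found five areas."}],
    }
    assert find_forbidden_keys(final_output) == []
```

A key-name scan catches structural leaks only. It cannot tell whether a permitted field such as `reasoning_summary` contains text that should have stayed internal, so it complements rather than replaces review of the summaries themselves.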
Open Questions
- The Appendix F body text could not be located in the extracted PDF. The TOC lists Appendix F as logical pages `383-396`, but those PDF labels / file pages contain Appendix B, Appendix C, and Appendix D material.
- The expected continuation after Appendix E is inconsistent. Appendix E appears at PDF file pages `399-403`, but Appendix G starts immediately at file page `404`, where Appendix F would be expected if the TOC order were followed.
- Appendix G appears duplicated before the Conclusion, suggesting that a PDF assembly or extraction error may have replaced or displaced Appendix F.
- Later Glossary pages `433-436` / indexes `432-435` briefly define reasoning capabilities such as Chain of Thought, Tree of Thoughts, ReAct, Planning, Deep Research, and Critique Model. Those pages are labeled Glossary, not Appendix F, so they were not treated as the authoritative Appendix F body.
- If a corrected PDF or separate Appendix F source is provided, this requirement should be revised against that source before implementation is considered complete.