Requirement: Chapter 9: Learning and Adaptation
Source
- PDF: Agentic_Design_Patterns.pdf
- Section: Chapter 9: Learning and Adaptation
- Page range: 142-153 logical pages from docs/agentic-design-patterns-toc.md
- Extraction note: the visible Chapter 9 heading was found at PDF page label 154 / zero-based index 153, and Chapter 10 starts at PDF page label 167 / zero-based index 166. The extracted Chapter 9 span is therefore PDF indexes 153-165, file pages 154-166, with chapter-local page counters 1-13. This is ambiguous because the TOC logical range 142-153 covers 12 logical pages.
Pattern Summary
Learning and adaptation turn a static agent into a system that can improve its behavior, knowledge, or strategy from experience. The chapter frames learning as the process that changes an agent internally, and adaptation as the visible behavioral change that follows from learning.
The chapter surveys several learning modes: reinforcement learning, supervised learning, unsupervised learning, few-shot and zero-shot adaptation with LLMs, online learning, and memory-based learning. It also discusses PPO and DPO as model-training or alignment approaches, then uses SICA, AlphaEvolve, and OpenEvolve to illustrate feedback-driven self-improvement loops.
The first LangGraph example should not implement full model training, PPO, DPO, or unsafe source-code self-modification. Instead, it should implement a bounded adaptation loop: retrieve prior experience, choose a strategy, produce an output, evaluate the result, update a lightweight experience archive or strategy profile, and conditionally retry or route to review.
Pattern Explanation
Conceptual Overview
Learning and adaptation are the mechanisms that let an agent change over time. A learning agent does not only follow a fixed prompt or workflow. It observes outcomes, stores useful experience, and uses those lessons to make better future decisions.
In the chapter, this idea ranges from classical machine learning to LLM-based self-improvement systems. In a practical LangGraph workflow, the core idea can be represented as an observable feedback loop: every run creates evidence about what worked, and the next run can use that evidence when selecting a strategy.
Problem
Static agents can degrade when the environment changes, when user preferences evolve, or when a task differs from the examples anticipated during initial design. They may repeat the same bad strategy because there is no durable record of failures, successful approaches, or evaluation feedback.
Learning and adaptation solve this by adding a mechanism for experience capture, outcome evaluation, and future behavior adjustment.
When to Use
- Use this pattern when the agent handles repeated tasks where past outcomes can improve future behavior.
- Use it when the environment, data, tools, or user preferences change over time.
- Use it when there is a reliable feedback signal, score, benchmark, evaluator, or human review outcome.
- Use it when personalization matters, such as assistant behavior that should adapt to a user or team.
- Use it when failed attempts should produce reusable lessons instead of disappearing from the workflow.
- Use it when the system needs an archive of strategies, versions, or experiences for auditability.
When Not to Use
- Avoid this pattern for simple one-off tasks where persistence and feedback add no value.
- Avoid it when there is no trustworthy evaluation signal; bad feedback can teach the wrong behavior.
- Avoid autonomous self-modification in production without sandboxing, review, rollback, and monitoring.
- Avoid storing sensitive user data unless retention, consent, and deletion rules are clear.
- Avoid frequent adaptation when deterministic, stable behavior is more important than improvement.
- Avoid treating lightweight memory updates as equivalent to true model fine-tuning or reinforcement learning.
How It Works
- The workflow receives a task, user request, or environment observation.
- The agent retrieves relevant prior experiences, successful strategies, known failures, or user preferences.
- A strategy selector chooses how to handle the current task using the current input and retrieved lessons.
- The agent produces an output or action using the selected strategy.
- An evaluator scores the result against a rubric, benchmark, user feedback, or deterministic checks.
- The system records an adaptation event containing the input summary, selected strategy, output summary, score, failure reasons, and learned lesson.
- A conditional decision either finalizes, retries with a revised strategy, asks for clarification, or routes to human review.
- Future runs can retrieve the saved adaptation records and change their behavior accordingly.
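The steps above can be sketched as a plain-Python loop. This is a minimal sketch under stated assumptions, not the chapter's implementation: `run_adaptive_task`, the word-overlap retrieval, the record layout, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
from typing import Callable

# Hypothetical in-memory archive: each record is one adaptation event.
Archive = list[dict]


def retrieve(archive: Archive, task: str) -> list[dict]:
    """Return past records sharing at least one word with the task (toy retrieval)."""
    words = set(task.lower().split())
    return [r for r in archive if words & set(r["input"].lower().split())]


def select_strategy(matches: list[dict], tried: set[str]) -> str:
    """Prefer a previously successful strategy; otherwise use an untried fallback."""
    for record in matches:
        if record["outcome"] == "success" and record["strategy"] not in tried:
            return record["strategy"]
    for fallback in ("diagnostic_steps", "clarify_first"):
        if fallback not in tried:
            return fallback
    return "escalate"


def run_adaptive_task(
    task: str,
    archive: Archive,
    generate: Callable[[str, str], str],
    evaluate: Callable[[str], float],
    threshold: float = 0.7,  # assumed value, not taken from the chapter
    max_retries: int = 1,
) -> dict:
    tried: set[str] = set()
    for attempt in range(max_retries + 1):
        strategy = select_strategy(retrieve(archive, task), tried)
        tried.add(strategy)
        output = generate(task, strategy)
        score = evaluate(output)
        outcome = "success" if score >= threshold else "failure"
        # Record every attempt so failures become reusable lessons.
        archive.append({"input": task, "strategy": strategy, "score": score, "outcome": outcome})
        if outcome == "success":
            return {"status": "ok", "answer": output, "strategy": strategy, "attempts": attempt + 1}
    # Retries are exhausted: stop adapting and hand off to a human.
    return {"status": "needs_review", "answer": None, "strategy": strategy, "attempts": max_retries + 1}


# Usage with fake generator and evaluator callables.
archive: Archive = []
result = run_adaptive_task(
    "wifi drops after a few minutes",
    archive,
    generate=lambda task, strategy: f"[{strategy}] response to: {task}",
    evaluate=lambda output: 0.9 if "clarify_first" in output else 0.4,
)
```

In this run the first attempt fails, the failure is archived, and the single permitted retry picks a different strategy. The loop is bounded by `max_retries`, so it cannot adapt indefinitely.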
Trade-offs
| Benefit | Cost or Risk |
|---|---|
| Improves behavior across repeated tasks by using past outcomes. | Bad or biased feedback can reinforce poor behavior. |
| Supports personalization and changing environments. | Requires persistent state, retention policy, and observability. |
| Makes failures useful by turning them into lessons or strategy updates. | Poorly bounded adaptation can drift away from intended behavior. |
| Enables benchmark-driven improvement similar to SICA or AlphaEvolve at a workflow level. | True model training, self-modification, and reward optimization require much stronger safety controls. |
| Provides an auditable archive of strategy choices and scores. | More state and routing logic increase implementation and test complexity. |
Minimal Example
User support request
-> retrieve similar past cases and lessons
-> select response strategy
-> draft response
-> evaluate response against rubric
-> if score is high: store success lesson and finalize
-> if score is low and retries remain: revise strategy and try again
-> if still weak or unsafe: store failure lesson and request human review
LangGraph Mapping
| Pattern Concept | LangGraph Element |
|---|---|
| Current experience | State fields such as input, task_category, and normalized_input |
| Prior learning | State field experience_matches plus optional LangGraph store or injected memory repository |
| Strategy or policy | State fields selected_strategy, strategy_reason, and strategy_profile |
| Action using the strategy | Node generate_response |
| Reward, score, or feedback | Node evaluate_response and state field evaluation |
| Adaptation update | Node adapt_from_result writing adaptation_record and archive_update |
| Retry or review decision | Conditional edge after evaluate_response |
| Human safety boundary | Node mark_needs_review |
LangGraph Implementation Goal
Build a LangGraph example of an adaptive support assistant that improves from prior solved cases and evaluation feedback. The user provides a technical support or troubleshooting request. The graph retrieves similar prior experiences, selects a response strategy, drafts an answer, evaluates the answer, and updates a lightweight experience archive before returning the final result.
The example should demonstrate learning and adaptation at the workflow level rather than training model weights. It should be safe and testable: memory updates should be explicit state or store writes, evaluators should be mockable, and the graph should never modify its own source code. This maps to the chapter's memory-based learning, online learning, SICA-style archive of past versions and scores, and AlphaEvolve/OpenEvolve-style generate-evaluate-select loop.
Expected workflow outcome:
- The agent uses prior successful cases when they are relevant.
- The agent records both successes and failures as adaptation events.
- Low-scoring outputs trigger one bounded revision attempt.
- Repeated weak, unsupported, or unsafe outputs route to human review.
- The final output exposes what strategy was used and what adaptation record was created.
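Workflow-level learning can be as small as updating per-strategy statistics after each evaluation. The following is a sketch only: `update_strategy_profile` and the `attempts` / `successes` / `avg_score` layout are assumptions for illustration, chosen to match the "success counts, average scores" idea without touching model weights.

```python
from typing import Any


def update_strategy_profile(
    profile: dict[str, dict[str, Any]], strategy: str, score: float, passed: bool
) -> dict[str, dict[str, Any]]:
    """Return a new profile with the strategy's counts and running average updated."""
    stats = profile.get(strategy, {"attempts": 0, "successes": 0, "avg_score": 0.0})
    attempts = stats["attempts"] + 1
    updated = {
        "attempts": attempts,
        "successes": stats["successes"] + (1 if passed else 0),
        # Incremental mean: new_avg = old_avg + (score - old_avg) / n
        "avg_score": stats["avg_score"] + (score - stats["avg_score"]) / attempts,
    }
    # Copy instead of mutating so the state write stays explicit and inspectable.
    return {**profile, strategy: updated}


profile: dict[str, dict[str, Any]] = {}
profile = update_strategy_profile(profile, "diagnostic_steps", 0.8, True)
profile = update_strategy_profile(profile, "diagnostic_steps", 0.4, False)
```

Returning a new dictionary rather than mutating the old one keeps each adaptation step visible as a distinct state write, which makes the update easy to assert on in tests.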
State Shape
List the state fields the graph needs.
| Field | Type | Purpose |
|---|---|---|
| input | str | Original user support request or task description. |
| user_id | str \| None | Optional user identifier for scoped personalization or memory namespace. |
| normalized_input | str | Trimmed and normalized input used for retrieval and generation. |
| task_category | str \| None | Coarse category such as connectivity, account, software, hardware, or unknown. |
| experience_matches | list[dict] | Retrieved prior cases, lessons, strategy records, or user preferences relevant to the current task. |
| strategy_profile | dict[str, Any] | Lightweight policy data such as strategy success counts, average scores, and known pitfalls. |
| selected_strategy | str \| None | Chosen handling strategy, such as reuse_known_solution, diagnostic_steps, clarify_first, or escalate. |
| strategy_reason | str \| None | Short explanation of why the strategy was selected. |
| draft_output | str \| None | Candidate response produced by the graph before finalization. |
| evaluation | dict[str, Any] | Score, pass/fail flag, rubric notes, safety flags, and missing information from the evaluator. |
| adaptation_record | dict[str, Any] | Structured event summarizing the run, selected strategy, score, outcome, and learned lesson. |
| archive_update | dict[str, Any] \| None | Data to persist into the experience archive or LangGraph store after evaluation. |
| retry_count | int | Number of adaptation retries already attempted for the current request. |
| max_retries | int | Configured cap for revision attempts, initially 1 for a focused example. |
| needs_human_review | bool | Whether the graph should stop automatic adaptation and request review. |
| errors | list[str] | Recoverable validation, retrieval, model, evaluator, or persistence errors. |
| final_output | dict[str, Any] \| None | User-facing result containing the answer, status, strategy, score, and adaptation metadata. |
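The table above could be expressed as a `TypedDict`, which LangGraph accepts as a state schema. This is a sketch: the class name `AdaptiveSupportState` and the helper `initial_state` are hypothetical, and the defaults are assumptions rather than requirements.

```python
from typing import Any, Optional, TypedDict


class AdaptiveSupportState(TypedDict):
    input: str
    user_id: Optional[str]
    normalized_input: str
    task_category: Optional[str]
    experience_matches: list[dict]
    strategy_profile: dict[str, Any]
    selected_strategy: Optional[str]
    strategy_reason: Optional[str]
    draft_output: Optional[str]
    evaluation: dict[str, Any]
    adaptation_record: dict[str, Any]
    archive_update: Optional[dict[str, Any]]
    retry_count: int
    max_retries: int
    needs_human_review: bool
    errors: list[str]
    final_output: Optional[dict[str, Any]]


def initial_state(
    user_input: str, user_id: Optional[str] = None, max_retries: int = 1
) -> AdaptiveSupportState:
    """Build a fully populated starting state (hypothetical helper for tests)."""
    return AdaptiveSupportState(
        input=user_input,
        user_id=user_id,
        normalized_input="",
        task_category=None,
        experience_matches=[],
        strategy_profile={},
        selected_strategy=None,
        strategy_reason=None,
        draft_output=None,
        evaluation={},
        adaptation_record={},
        archive_update=None,
        retry_count=0,
        max_retries=max_retries,
        needs_human_review=False,
        errors=[],
        final_output=None,
    )


state = initial_state("My laptop drops Wi-Fi after a few minutes.", user_id="user-123")
```

Populating every field up front is one design option; it lets tests assert that the final state always contains the expected keys without special-casing missing ones.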
Nodes
| Node | Responsibility |
|---|---|
| preprocess_input | Validate non-empty input, normalize whitespace, initialize retries, errors, and default thresholds. |
| classify_task | Assign a coarse task category so retrieval and strategy selection can use a stable label. |
| retrieve_experience | Retrieve relevant prior cases, lessons, strategy outcomes, or user preferences from an injected memory source or LangGraph store. |
| select_strategy | Choose the response strategy using the current task, retrieved experience, and strategy profile. |
| generate_response | Produce a candidate support response using the selected strategy and relevant lessons. |
| evaluate_response | Score the draft with a deterministic rubric, fake evaluator, or LLM judge; identify gaps, unsupported claims, and safety concerns. |
| revise_strategy | When evaluation fails and retries remain, adjust selected_strategy, incorporate evaluator notes, and increment retry_count. |
| adapt_from_result | Create adaptation_record and archive_update from the outcome, including success, failure reason, score, and lesson. |
| persist_adaptation | Store the archive update if persistence is configured; otherwise keep it in final state for tests and inspection. |
| mark_needs_review | Set needs_human_review for unsafe, unsupported, or repeatedly low-scoring outputs. |
| finalize | Produce final_output with the final answer, status, score, strategy, and adaptation metadata. |
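Two of the nodes above can be sketched as plain functions that read the state and return only the fields they update, which is the shape LangGraph nodes conventionally take. The keyword table and the error message are assumptions for illustration; a real classifier could be an LLM call or a trained model.

```python
import re
from typing import Any

# Hypothetical keyword table used only for this sketch.
CATEGORY_KEYWORDS = {
    "connectivity": ("wi-fi", "wifi", "internet", "network", "router"),
    "account": ("password", "login", "sign in", "account"),
    "hardware": ("battery", "screen", "keyboard", "fan"),
}


def preprocess_input(state: dict[str, Any]) -> dict[str, Any]:
    """Validate and normalize the request; return only the fields this node updates."""
    normalized = re.sub(r"\s+", " ", state.get("input") or "").strip()
    update: dict[str, Any] = {
        "normalized_input": normalized,
        "retry_count": 0,
        "max_retries": state.get("max_retries", 1),
        "errors": [],
    }
    if not normalized:
        # Blank input is recorded here so the graph can stop before any model call.
        update["errors"] = ["input is blank"]
    return update


def classify_task(state: dict[str, Any]) -> dict[str, Any]:
    """Assign a coarse category by keyword match, falling back to 'unknown'."""
    text = state["normalized_input"].lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return {"task_category": category}
    return {"task_category": "unknown"}


# Usage: apply each node's partial update to a plain dict standing in for graph state.
state: dict[str, Any] = {"input": "  My laptop connects to Wi-Fi\n but the internet stops. "}
state.update(preprocess_input(state))
state.update(classify_task(state))
```

Because each node is a pure function of the state, it can be unit-tested without building or compiling a graph.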
Edges
Describe the graph flow, including conditional branches.
START
-> preprocess_input
-> classify_task
-> retrieve_experience
-> select_strategy
-> generate_response
-> evaluate_response
evaluate_response -> adapt_from_result -> persist_adaptation -> finalize -> END
evaluate_response -> revise_strategy -> generate_response
evaluate_response -> mark_needs_review -> adapt_from_result -> persist_adaptation -> finalize -> END
Conditional edge requirements:
- Route from evaluate_response to adapt_from_result when the score meets the configured threshold and no safety flag is present.
- Route from evaluate_response to revise_strategy when the score is below threshold, the issue appears recoverable, and retry_count < max_retries.
- Route from evaluate_response to mark_needs_review when the output is unsafe, unsupported by the input or retrieved experience, missing required context, or still below threshold after retries.
- revise_strategy must increment retry_count and must not loop beyond max_retries.
- persist_adaptation must not hide write failures. It should append an error and still allow finalize to return the response and in-state archive update.
- Retrieval and persistence should be injectable so tests can run without a network or external database.
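The routing rules above could be captured in a single function that returns the name of the next node. A function of this shape could be passed as the routing callable to LangGraph's `add_conditional_edges`. This is a sketch: the flag names `unsafe`, `unsupported`, `missing_context`, and `recoverable`, and the 0.7 threshold, are assumptions about the evaluator's output, not part of the requirement.

```python
from typing import Any


def route_after_evaluation(state: dict[str, Any], threshold: float = 0.7) -> str:
    """Return the name of the node to run after evaluate_response."""
    evaluation = state.get("evaluation") or {}
    score = evaluation.get("score")
    # Safety, unsupported-claim, and missing-context flags override any numeric score.
    blocking = any(evaluation.get(flag) for flag in ("unsafe", "unsupported", "missing_context"))
    # A missing or malformed score is treated as a reason for review in this sketch.
    if blocking or not isinstance(score, (int, float)):
        return "mark_needs_review"
    if score >= threshold:
        return "adapt_from_result"
    retries_remain = state.get("retry_count", 0) < state.get("max_retries", 1)
    if evaluation.get("recoverable", True) and retries_remain:
        return "revise_strategy"
    return "mark_needs_review"
```

Checking the blocking flags before the score means a high numeric score cannot mask an unsafe or unsupported answer.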
Inputs and Outputs
- Input: a natural-language technical support or troubleshooting request, plus optional user_id, prior feedback, or an injected in-memory archive for tests.
- Output: final_output, including the answer or review request, selected strategy, evaluation score, status, and adaptation metadata.
- Intermediate artifacts: normalized input, task category, retrieved experiences, strategy profile, selected strategy, draft response, evaluator notes, adaptation record, archive update, errors, and retry count.
Example successful output shape:
{
  "status": "ok",
  "answer": "Try forgetting the Wi-Fi network, restarting the router, and reconnecting. If the issue started after an update, also reset the network settings.",
  "selected_strategy": "diagnostic_steps",
  "strategy_reason": "Similar prior connectivity cases succeeded with a stepwise diagnostic checklist.",
  "evaluation": {
    "score": 0.86,
    "passed": true,
    "notes": ["clear steps", "no unsafe instruction"]
  },
  "adaptation_record": {
    "outcome": "success",
    "lesson": "For connectivity issues, start with reversible network diagnostics before escalation."
  }
}
Example input shape:
{
  "input": "My laptop connects to Wi-Fi but the internet stops working after a few minutes.",
  "user_id": "user-123"
}
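A finalize node that assembles the successful output shape shown above might look like the following sketch. The `needs_review` status string and the `errors` key in the output are assumptions; the source only shows the `ok` case.

```python
from typing import Any


def finalize(state: dict[str, Any]) -> dict[str, Any]:
    """Assemble the user-facing result from fields written by earlier nodes."""
    needs_review = state.get("needs_human_review", False)
    final_output = {
        # "needs_review" is an assumed status value for the human-review path.
        "status": "needs_review" if needs_review else "ok",
        "answer": None if needs_review else state.get("draft_output"),
        "selected_strategy": state.get("selected_strategy"),
        "strategy_reason": state.get("strategy_reason"),
        "evaluation": state.get("evaluation", {}),
        "adaptation_record": state.get("adaptation_record", {}),
        # Surface recoverable errors (for example, a failed archive write).
        "errors": list(state.get("errors", [])),
    }
    return {"final_output": final_output}


example_state = {
    "draft_output": "Try forgetting the Wi-Fi network and reconnecting.",
    "selected_strategy": "diagnostic_steps",
    "strategy_reason": "Similar prior connectivity cases succeeded.",
    "evaluation": {"score": 0.86, "passed": True},
    "adaptation_record": {"outcome": "success"},
    "needs_human_review": False,
    "errors": [],
}
result = finalize(example_state)["final_output"]
```

Withholding the draft answer on the review path is one possible choice; an implementation could instead return the draft alongside a review notice.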
Failure Cases
Document expected failures, retries, fallback behavior, and human-review points.
- Blank input should fail in preprocess_input without invoking the model.
- No relevant prior experience should not block the graph; select_strategy should fall back to a general diagnostic or clarification strategy.
- Retrieved experience may be stale, irrelevant, or contradictory. The graph should treat it as guidance, not as ground truth.
- The generator may produce unsupported claims. evaluate_response should flag unsupported or overconfident content and route to revision or review.
- Evaluator output may be malformed or missing a score. The graph should append an error and either retry evaluation once or route to review.
- Low score after the retry limit should end in mark_needs_review, not an unbounded adaptation loop.
- Archive persistence may fail. The graph should preserve archive_update in state and expose the persistence error in final_output.
- Sensitive or personal data should not be stored in the archive unless the implementation has a retention policy and scoped namespace.
- Self-modifying behavior is out of scope for the first graph. The graph may update strategy records or memory, but it must not edit repository files, prompts on disk, or model weights.
- Reward hacking risk should be acknowledged: optimizing only for evaluator score can produce responses that satisfy the rubric while failing the user. Tests should include at least one case where a high-scoring but unsupported answer is rejected.
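The persistence failure case above can be illustrated with a node built around an injected writer. This is a sketch under assumptions: `make_persist_adaptation`, the `ArchiveWriter` signature, and the error message format are hypothetical, and the writer stands in for whatever store the implementation chooses.

```python
from typing import Any, Callable, Optional

# Hypothetical writer signature: takes the archive update and raises on failure.
ArchiveWriter = Callable[[dict[str, Any]], None]


def make_persist_adaptation(
    writer: Optional[ArchiveWriter],
) -> Callable[[dict[str, Any]], dict[str, Any]]:
    """Build a persist_adaptation node around an injected writer."""

    def persist_adaptation(state: dict[str, Any]) -> dict[str, Any]:
        update = state.get("archive_update")
        if writer is None or update is None:
            # Nothing configured or nothing to write; archive_update stays in state.
            return {}
        try:
            writer(update)
        except Exception as exc:
            # Surface the failure instead of hiding it; do not drop archive_update.
            errors = list(state.get("errors", []))
            errors.append(f"persistence failed: {exc}")
            return {"errors": errors}
        return {}

    return persist_adaptation


def failing_writer(update: dict[str, Any]) -> None:
    raise OSError("disk full")


state: dict[str, Any] = {
    "archive_update": {"lesson": "start with reversible diagnostics"},
    "errors": [],
}
state.update(make_persist_adaptation(failing_writer)(state))
```

Because the node returns only an `errors` update on failure, the in-state `archive_update` is untouched and remains available to `finalize`.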
Test Ideas
- Verify the happy path where a relevant prior case is retrieved, diagnostic_steps is selected, the evaluator passes the draft, and an adaptation record is produced.
- Verify the cold-start path where no memory exists and the graph still produces a general response with an archive update.
- Verify a low-scoring draft routes through revise_strategy, increments retry_count, and then finalizes after the revised response passes.
- Verify a second low score after max_retries routes to mark_needs_review.
- Verify unsafe or unsupported generated content routes to human review even if a numeric score is present.
- Verify persistence failure appends an error while preserving archive_update in final state.
- Verify blank input stops before retrieval or model generation.
- Verify tests can inject a fake retriever, fake generator, fake evaluator, and fake persistence layer.
- Verify final state always includes selected_strategy, evaluation, adaptation_record, retry_count, errors, and final_output.
- Verify retrieved memories are scoped by user_id or a documented default namespace when personalization is enabled.
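As one illustration of the test ideas above, the namespace-scoping check could be written against a fake in-memory archive that needs no network or database. This is a sketch: `InMemoryArchive`, its `write` and `search` methods, and the `shared` default namespace are all hypothetical names.

```python
from collections import defaultdict
from typing import Any, Optional

DEFAULT_NAMESPACE = "shared"  # assumed name for the documented default namespace


class InMemoryArchive:
    """Fake experience archive for tests; stores records per namespace in memory."""

    def __init__(self) -> None:
        self._records: dict[str, list[dict[str, Any]]] = defaultdict(list)

    def write(self, record: dict[str, Any], user_id: Optional[str] = None) -> None:
        self._records[user_id or DEFAULT_NAMESPACE].append(record)

    def search(self, category: str, user_id: Optional[str] = None) -> list[dict[str, Any]]:
        namespace = user_id or DEFAULT_NAMESPACE
        # .get avoids creating empty namespaces as a side effect of searching.
        return [r for r in self._records.get(namespace, []) if r.get("task_category") == category]


def test_memories_are_scoped_by_user_id() -> None:
    archive = InMemoryArchive()
    archive.write({"task_category": "connectivity", "lesson": "reset network settings"}, user_id="user-123")
    archive.write({"task_category": "connectivity", "lesson": "restart the router"})

    own = archive.search("connectivity", user_id="user-123")
    shared = archive.search("connectivity")

    assert [r["lesson"] for r in own] == ["reset network settings"]
    assert [r["lesson"] for r in shared] == ["restart the router"]
    # A different user must not see another user's lessons.
    assert archive.search("connectivity", user_id="user-999") == []


test_memories_are_scoped_by_user_id()
```

The same fake can be injected wherever the graph expects a retriever or persistence layer, so the other test ideas can reuse it.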
Open Questions
- docs/agentic-design-patterns-toc.md lists Chapter 9 as logical pages 142-153, but PDF text extraction shows the visible Chapter 9 section at page labels 154-166, zero-based indexes 153-165, with chapter-local page counters 1-13.
- Should the first implementation use LangGraph's persistent store/checkpointer for the experience archive, or keep an injected in-memory repository until the shared project scaffold is finalized?
- Should evaluation be implemented first as deterministic heuristics, an LLM judge, or a pluggable interface with deterministic fakes for tests?
- What retention and privacy policy should apply to user-specific adaptation records?
- What score threshold and retry cap should be standardized across later adaptive examples?