LLM AI Agents

The Complete Guide to 17 Agentic Reasoning & Planning Algorithms

A practical deep-dive into the algorithms powering modern AI agents - from Chain-of-Thought to automated workflow discovery. Each algorithm is explained with flow diagrams, simple examples, and Python pseudocode.

Introduction

Imagine asking an AI to plan your two-week trip to Japan. A basic LLM would dump a wall of text and call it a day. But an agentic AI would break the problem into pieces - research flights, check hotel availability, look up visa requirements, cross-reference your calendar, find restaurants near each hotel - and then stitch everything together into a coherent itinerary. The difference? The reasoning and planning algorithm running under the hood.

That's what this guide is about. Behind every capable AI agent - whether it's writing code, booking travel, debugging software, or orchestrating complex workflows - lies an algorithm that determines how the agent thinks and acts. Some think in chains. Others explore trees. A few build entire graphs of interconnected ideas. And the newest ones? They discover their own algorithms automatically.

This guide walks you through 17 foundational algorithms that form the backbone of agentic AI. For each one, you'll find a flow diagram, a plain-English example, Python pseudocode, the original research paper, and an honest assessment of trade-offs.

The algorithms naturally fall into four categories:

  • Reasoning Frameworks (1-4): How agents think step-by-step
  • Planning Architectures (5-8): How agents organize and execute multi-step tasks
  • Self-Improvement Loops (9-12): How agents learn from mistakes and refine outputs
  • Advanced Topologies (13-17): How agents structure complex reasoning beyond simple chains

Part 1: Reasoning Frameworks

1. ReAct: Reasoning + Acting

Paper: ReAct: Synergizing Reasoning and Acting in Language Models - Yao et al., ICLR 2023

ReAct interleaves reasoning (thinking) with acting (tool use) in a continuous loop. Before ReAct, researchers treated these as separate problems. ReAct's insight is that they strengthen each other: reasoning guides better actions, and real-world observations ground reasoning to reduce hallucinations.

Flow Diagram

                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  πŸ“₯ User Query  β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                             β–Ό
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚                         β”‚
          β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
          β”‚  πŸ’­ THOUGHT             β”‚    β”‚
          β”‚  LLM reasons about     β”‚    β”‚
          β”‚  what to do next       β”‚    β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
                       β”‚                β”‚
                       β–Ό                β”‚
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
          β”‚  ⚑ ACTION              β”‚    β”‚
          β”‚  Call a tool           β”‚    β”‚
          β”‚  (search, calculate)   β”‚    β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
                       β”‚                β”‚
                       β–Ό                β”‚
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
          β”‚  πŸ‘οΈ OBSERVATION         β”‚    β”‚
          β”‚  Get real-world result β”‚    β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
                       β”‚                β”‚
                       β–Ό                β”‚
                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
                 β”‚  Answer   │───Noβ”€β”€β”€β”€β”˜
                 β”‚  found?   β”‚
                 β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
                   Yes β”‚
                       β–Ό
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚ βœ… Final Answerβ”‚
               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: "Who painted the ceiling of the Sistine Chapel?"

Step  Type         Content
1     Thought      I need to search for who painted the Sistine Chapel ceiling.
2     Action       Search["Sistine Chapel ceiling painter"]
3     Observation  Michelangelo painted the ceiling between 1508-1512...
4     Thought      I have the answer - it was Michelangelo.
5     Answer       Michelangelo

Python Pseudocode

def react_agent(question: str, tools: dict, max_steps: int = 10) -> str:
    """ReAct Agent: Interleaves Thought, Action, and Observation."""
    trajectory = f"Question: {question}\n"

    for step in range(1, max_steps + 1):
        # THOUGHT: LLM reasons about what to do next
        response = llm.generate(REACT_PROMPT + trajectory)

        if "Final Answer:" in response:
            return extract_final_answer(response)

        # ACTION: Parse and execute tool call
        thought, action_name, action_input = parse_thought_and_action(response)
        trajectory += f"Thought {step}: {thought}\n"
        trajectory += f"Action {step}: {action_name}[{action_input}]\n"

        # OBSERVATION: Get real-world feedback
        observation = tools[action_name](action_input)
        trajectory += f"Observation {step}: {observation}\n"

    return "Max steps reached."

Key Insight: Reasoning guides better actions, and real-world observations ground reasoning - reducing hallucinations. ReAct is the foundation of modern agent frameworks (LangChain, LlamaIndex).

Trade-offs: High token usage (multiple LLM calls per query), 10-30s latency vs 800ms for single calls, overkill for simple tasks.
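To make the loop concrete, here is a runnable toy version of the pseudocode, with a scripted stand-in for the LLM and a stub Search tool. Both are illustrative placeholders, not a real API:

```python
import re

# Scripted stand-in for the LLM: two canned ReAct-style responses.
SCRIPTED = iter([
    "Thought: I need to search for the painter.\n"
    "Action: Search[Sistine Chapel ceiling painter]",
    "Thought: I have the answer.\nFinal Answer: Michelangelo",
])

def fake_llm(prompt: str) -> str:
    return next(SCRIPTED)

def search(query: str) -> str:  # stub tool with a canned result
    return "Michelangelo painted the ceiling between 1508 and 1512."

def react_loop(question: str, tools: dict, max_steps: int = 5) -> str:
    trajectory = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        response = fake_llm(trajectory)
        if "Final Answer:" in response:
            return response.split("Final Answer:")[1].strip()
        # Parse "Action: Tool[input]" and execute the tool
        tool_name, tool_input = re.search(r"Action: (\w+)\[(.*?)\]", response).groups()
        observation = tools[tool_name](tool_input)
        trajectory += f"{response}\nObservation {step}: {observation}\n"
    return "Max steps reached."

answer = react_loop("Who painted the Sistine Chapel ceiling?", {"Search": search})
print(answer)  # Michelangelo
```

Swapping fake_llm for a real model call and search for a real tool gives the production loop; everything else stays the same.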

2. Chain-of-Thought (CoT) Prompting

Paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models - Wei et al., NeurIPS 2022

Instead of asking an LLM to jump straight to an answer, you show it (or tell it) to think step-by-step. This simple technique unlocks powerful reasoning abilities that already exist in large models but aren't activated by default prompting.

Flow Diagram

          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚  πŸ“₯ Question                     β”‚
          β”‚  + "Let's think step by step"    β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚  πŸ’­ Step 1: Identify what's given β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚  πŸ’­ Step 2: Apply logic/formula   β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚  πŸ’­ Step 3: Compute result        β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚  πŸ’­ Step 4: Derive final answer   β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚  βœ… Final Answer   β”‚
               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: "A store has 23 apples. They sell 8, then get a delivery of 15. How many?"

Step    Reasoning
Step 1  Start with 23 apples.
Step 2  Sell 8: 23 - 8 = 15 apples.
Step 3  Delivery of 15: 15 + 15 = 30 apples.
Answer  30 apples

Python Pseudocode

def chain_of_thought(question: str, examples: list = None) -> str:
    """Chain-of-Thought: Encourage step-by-step reasoning."""
    if examples:
        # Few-shot CoT: provide worked examples
        prompt = "Solve by thinking step-by-step.\n\n"
        for ex in examples:
            prompt += f"Q: {ex['question']}\n"
            prompt += f"A: {ex['reasoning']} The answer is {ex['answer']}.\n\n"
        prompt += f"Q: {question}\nA:"
    else:
        # Zero-shot CoT: just add the magic phrase
        prompt = f"Q: {question}\nA: Let's think step by step."

    return llm.generate(prompt)

Key Insight: Just adding "Let's think step by step" can unlock reasoning in large models - no training needed. But CoT only works with large models (~100B+ params). Reasoning is an emergent property of scale.

Results: CoT with large models significantly outperforms standard prompting on math and reasoning tasks.

3. Self-Consistency CoT

Paper: Self-Consistency Improves Chain of Thought Reasoning in Language Models - Wang et al., ICLR 2023

Generate many reasoning paths and pick the answer that appears most often. If multiple independent paths converge on the same answer, it's probably correct.

Flow Diagram

                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚  πŸ“₯ Question     β”‚
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚                β”‚                β”‚
              β–Ό                β–Ό                β–Ό
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚  πŸ’­ Path 1    β”‚ β”‚  πŸ’­ Path 2    β”‚ β”‚  πŸ’­ Path 3    β”‚  ...
     β”‚  23-8=15     β”‚ β”‚  23+15=38    β”‚ β”‚  23+15-8     β”‚
     β”‚  15+15=30    β”‚ β”‚  38-8=30     β”‚ β”‚  = 30        β”‚
     β”‚  Ans: 30 βœ…  β”‚ β”‚  Ans: 30 βœ…  β”‚ β”‚  Ans: 30 βœ…  β”‚
     β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚                β”‚                β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                  β”‚  πŸ—³οΈ MAJORITY VOTE     β”‚
                  β”‚  "30" wins (4 of 5)  β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
                     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                     β”‚  βœ… Answer: 30    β”‚
                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: Same apple problem, 5 reasoning paths

Path    Reasoning                              Answer
Path 1  23 - 8 = 15, then 15 + 15 = 30         30 βœ…
Path 2  23 + 15 = 38, then 38 - 8 = 30         30 βœ…
Path 3  15 - 8 = 7, then 23 + 7 = 30           30 βœ…
Path 4  23 - 8 = 15, then 15 + 5 = 20 (error)  20 ❌
Path 5  23 + 15 - 8 = 30                       30 βœ…
Vote    "30" appears 4 of 5 times              30 wins

Python Pseudocode

from collections import Counter

def self_consistency_cot(question: str, k: int = 10, temperature: float = 0.7) -> str:
    """Self-Consistency: Sample multiple CoT paths, majority vote."""
    answers = []
    for _ in range(k):
        response = llm.generate(
            f"Q: {question}\nA: Let's think step by step.",
            temperature=temperature  # Higher temp = more diverse paths
        )
        answers.append(extract_answer(response))

    # Majority vote
    return Counter(answers).most_common(1)[0][0]

Key Insight: Diverse reasoning paths act like an ensemble - errors get outvoted. Consistently outperforms standard CoT across math and reasoning benchmarks.

Trade-off: kΓ— more compute (10 samples = 10Γ— cost), diminishing returns beyond ~20-40 samples.
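The voting step is simple enough to run end-to-end. Here is a sketch with five hard-coded "sampled" completions standing in for real temperature-sampled LLM outputs:

```python
from collections import Counter
import re

# Stand-ins for k=5 sampled CoT completions (one contains a reasoning error).
sampled = [
    "23 - 8 = 15, then 15 + 15 = 30. The answer is 30.",
    "23 + 15 = 38, then 38 - 8 = 30. The answer is 30.",
    "15 - 8 = 7, then 23 + 7 = 30. The answer is 30.",
    "23 - 8 = 15, then 15 + 5 = 20. The answer is 20.",  # error gets outvoted
    "23 + 15 - 8 = 30. The answer is 30.",
]

def extract_answer(text: str) -> str:
    """Pull the final number after 'The answer is'."""
    match = re.search(r"The answer is (\d+)", text)
    return match.group(1) if match else ""

votes = Counter(extract_answer(s) for s in sampled)
answer, count = votes.most_common(1)[0]
print(answer, count)  # 30 4
```

Answer extraction matters in practice: majority voting only works if the final answers can be parsed into comparable values.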

4. Tree of Thoughts (ToT)

Paper: Tree of Thoughts: Deliberate Problem Solving with Large Language Models - Yao et al., NeurIPS 2023

Explore multiple reasoning branches like a tree. At each step, generate candidates, evaluate them, prune bad ones, and backtrack when needed.

Flow Diagram

                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                 β”‚  πŸ“₯ Problem: Make 24 from [4, 5, 6, 3] β”‚
                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚                β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                └──────────────┐
              β”‚                                             β”‚
              β–Ό                                             β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  πŸ’­ Branch A       β”‚                       β”‚  πŸ’­ Branch B       β”‚
    β”‚  4 + 5 = 9        β”‚                       β”‚  5 Γ— 6 = 30       β”‚
    β”‚  Score: 7/10      β”‚                       β”‚  Score: 8/10 ⭐    β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚                                           β”‚
        β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”                                 β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
        β–Ό         β–Ό                                 β–Ό         β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”                       β”Œβ”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚9Γ—3=27 β”‚ β”‚9-3=6  β”‚                       β”‚30-3=27β”‚ β”‚5+3=8     β”‚
     β”‚27-6=21β”‚ β”‚6Γ—6=36 β”‚                       β”‚  ❌   β”‚ β”‚8-4=4     β”‚
     β”‚  ❌   β”‚ β”‚  ❌   β”‚                       β””β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚4 Γ— 6 = 24β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”˜                                 β”‚  βœ… !!!  β”‚
         β”‚         β”‚                                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β–Ό         β–Ό                                         β”‚
    ↩️ Backtrack  ↩️ Backtrack                                β–Ό
                                                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                                  β”‚ βœ… Answer: 24     β”‚
           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚ (5+3-4)Γ—6 = 4Γ—6  β”‚
          β”‚  πŸ’­ Branch C       β”‚                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚  4 - 3 = 1        β”‚
          β”‚  Score: 2/10 ❌    β”‚
          β”‚  πŸ—‘οΈ PRUNED         β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: Game of 24 - Make 24 from [4, 5, 6, 3]

Branch  Exploration                                 Result
A       4 + 5 = 9 β†’ 9 Γ— 3 = 27 β†’ 27 - 6 = 21       ❌ Dead end, backtrack
B       5 Γ— 6 = 30 β†’ backtrack β†’ (5+3-4) Γ— 6 = 24  βœ… Found it!
C       4 - 3 = 1 β†’ score too low                   πŸ—‘οΈ Pruned

Key Insight: CoT can't backtrack, so it struggles with combinatorial problems. ToT generates candidates, evaluates them, prunes, and backtracks - on Game of 24, the paper reports GPT-4's success rate jumping from 4% with CoT to 74% with ToT.

Python Pseudocode

def tree_of_thoughts(problem: str, breadth: int = 5, depth: int = 3) -> str:
    """Tree of Thoughts: BFS over reasoning tree with LLM evaluation."""
    current_states = [problem]

    for step in range(depth):
        candidates = []
        for state in current_states:
            thoughts = llm.generate_n(f"Next steps for:\n{state}", n=breadth)
            for thought in thoughts:
                new_state = state + "\n" + thought
                score = llm.evaluate(f"Rate this (0-10):\n{new_state}")
                candidates.append((new_state, score))

        candidates.sort(key=lambda x: x[1], reverse=True)
        current_states = [s for s, _ in candidates[:breadth]]

    return current_states[0]

Part 2: Planning Architectures

5. LATS: Language Agent Tree Search

Paper: Language Agent Tree Search Unifies Reasoning, Acting, and Planning - Zhou et al., ICML 2024

LATS combines Monte Carlo Tree Search (the algorithm behind AlphaGo) with LLM agents. It explores, evaluates, backtracks, and learns from self-reflection on failures.

Flow Diagram

        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚   πŸ“₯ Task       β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
                β–Ό
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚  1. SELECT           β”‚ ◄─────────────────┐
     β”‚  Pick best node      β”‚                   β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
                β”‚                               β”‚
                β–Ό                               β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚
     β”‚  2. EXPAND           β”‚                   β”‚
     β”‚  LLM generates       β”‚                   β”‚
     β”‚  possible actions    β”‚                   β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
                β”‚                               β”‚
                β–Ό                               β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚
     β”‚  3. EVALUATE         β”‚                   β”‚
     β”‚  Score the state     β”‚                   β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
                β”‚                               β”‚
                β–Ό                               β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚
     β”‚  4. BACKPROPAGATE    β”‚                   β”‚
     β”‚  Update tree scores  β”‚                   β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
                β”‚                               β”‚
           β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”                          β”‚
           β”‚ Success? β”‚                          β”‚
           β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜                          β”‚
         Yes β”‚     β”‚ No                         β”‚
             β”‚     β–Ό                            β”‚
             β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
             β”‚  β”‚ 5. SELF-REFLECT  β”‚            β”‚
             β”‚  β”‚ "Failed because  β”‚β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚  β”‚  ..." + retry    β”‚
             β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β–Ό
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚  βœ… Best Action      β”‚
     β”‚     Sequence         β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: Writing a function to reverse a linked list

Step        Action                                              Result
Select      Root β†’ most promising path                          -
Expand      Try iterative approach with prev/curr pointers      Code v1
Evaluate    Run tests β†’ 2/5 pass                                Score: 0.4
Reflect     "Failed on edge case: empty list. Need base case."  Stored
Expand 2    Add if not head: return None + fix pointers         Code v2
Evaluate 2  Run tests β†’ 5/5 pass!                               Score: 1.0 βœ…

Key Insight: Self-reflection prevents repeating the same mistakes. Strong improvements on code generation and interactive environments.

Python Pseudocode

import math

def lats(task: str, tools: dict, n_iterations: int = 50) -> str:
    """LATS: Monte Carlo Tree Search + LLM agent with self-reflection."""
    root = MCTSNode(state=task)
    reflections = []  # Memory of past failures

    for _ in range(n_iterations):
        # 1. SELECT: pick most promising node using UCB1
        node = select_node(root, exploration_weight=1.4)

        # 2. EXPAND: generate possible actions via LLM
        actions = llm.generate(
            f"Task: {node.state}\nPast failures: {reflections}\nSuggest next actions:"
        )
        children = [MCTSNode(state=apply(node.state, a)) for a in parse_actions(actions)]
        node.children.extend(children)

        # 3. EVALUATE: score the new state
        child = children[0]
        result = execute_with_tools(child.state, tools)
        score = evaluate_result(result)

        # 4. BACKPROPAGATE: update scores up the tree
        backpropagate(child, score)

        if score >= 1.0:  # Success!
            return extract_action_sequence(child)

        # 5. SELF-REFLECT on failure
        reflection = llm.generate(f"This approach failed: {result}\nWhy? How to improve?")
        reflections.append(reflection)

    return extract_best_path(root)
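The pseudocode leans on select_node and backpropagate without defining them. A minimal sketch of the standard MCTS versions (UCB1 selection, additive backpropagation) might look like this - the node fields here are illustrative assumptions, not the paper's exact data structure:

```python
import math

class MCTSNode:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # running sum of backpropagated scores

def ucb1(node, c=1.4):
    """Balance exploitation (average value) against exploration (visit count)."""
    if node.visits == 0:
        return float("inf")  # always try unvisited nodes first
    exploit = node.value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def select_node(root, exploration_weight=1.4):
    """Descend the tree, always taking the child with the best UCB1 score."""
    node = root
    while node.children:
        node = max(node.children, key=lambda n: ucb1(n, exploration_weight))
    return node

def backpropagate(node, score):
    """Update visit counts and value sums from a leaf back up to the root."""
    while node is not None:
        node.visits += 1
        node.value += score
        node = node.parent
```

The exploration weight (1.4 β‰ˆ √2, the classic UCB1 constant) controls how eagerly the search revisits promising branches versus probing neglected ones.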

6. Plan-and-Execute Agent

Paper: Plan-and-Solve Prompting - Wang et al., ACL 2023

First make a complete plan, then execute it step by step. Separates the "strategist" (planner) from the "worker" (executor) - can use different models for each.

Flow Diagram

       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚  πŸ“₯ Task: "Compare weather in             β”‚
       β”‚      Tokyo and London"                    β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚  🧠 PLANNER (GPT-4 - powerful model)      β”‚
       β”‚                                          β”‚
       β”‚  Plan:                                   β”‚
       β”‚    1. Search Tokyo weather               β”‚
       β”‚    2. Search London weather              β”‚
       β”‚    3. Compare and summarize              β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚  ⚑ EXECUTOR (GPT-3.5 - cheaper model)    β”‚
       β”‚                                          β”‚
       β”‚  Step 1: Search β†’ "Tokyo: 28Β°C, sunny"  β”‚
       β”‚         β”‚                                β”‚
       β”‚         β–Ό                                β”‚
       β”‚  Step 2: Search β†’ "London: 15Β°C, rainy" β”‚
       β”‚         β”‚                                β”‚
       β”‚         β–Ό                                β”‚
       β”‚  Step 3: Compare both results            β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚  βœ… "Tokyo: 28Β°C sunny. London: 15Β°C     β”‚
       β”‚      rainy. Tokyo is 13Β° warmer."        β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: "Compare weather in Tokyo and London"

Phase           Model    Action
Plan            GPT-4    1) Get Tokyo weather β†’ 2) Get London weather β†’ 3) Compare
Execute Step 1  GPT-3.5  Search β†’ "Tokyo: 28Β°C, sunny, humidity 65%"
Execute Step 2  GPT-3.5  Search β†’ "London: 15Β°C, rainy, humidity 82%"
Execute Step 3  GPT-3.5  Compare β†’ "Tokyo is 13Β°C warmer and less humid"

Key Insight: Use a powerful model for planning and a cheap model for execution - saves cost without sacrificing quality.

Python Pseudocode

def plan_and_execute(task: str, tools: dict) -> str:
    """Plan-and-Execute: Separate planning from execution."""
    # PLAN: Use a powerful model to create a step-by-step plan
    plan = planner_llm.generate(  # e.g., GPT-4
        f"Create a step-by-step plan to accomplish:\n{task}"
    )
    steps = parse_steps(plan)

    # EXECUTE: Use a cheaper model to carry out each step
    results = []
    for step in steps:
        result = executor_llm.generate(  # e.g., GPT-3.5
            f"Execute this step using available tools:\n{step}\n"
            f"Previous results: {results}"
        )
        results.append(execute_with_tools(result, tools))

    # SYNTHESIZE: Combine all results into a final answer
    return planner_llm.generate(
        f"Task: {task}\nStep results: {results}\nSynthesize final answer:"
    )

7. ReWOO: Reasoning Without Observation

Paper: ReWOO: Decoupling Reasoning from Observations - Xu et al., 2023

Plan ALL tool calls upfront, execute them all, then reason once. Saves ~5Γ— tokens vs ReAct.

Flow Diagram

  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚  πŸ“₯ "Who is older: the director of Titanic          β”‚
  β”‚      or the director of Avatar?"                    β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚  πŸ“‹ PLANNER (one LLM call)                           β”‚
  β”‚                                                     β”‚
  β”‚  #E1 = Search["director of Titanic"]                β”‚
  β”‚  #E2 = Search["director of Avatar"]                 β”‚
  β”‚  #E3 = Search["age of #E1"]                         β”‚
  β”‚  #E4 = Search["age of #E2"]                         β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚  ⚑ WORKER (execute all tool calls)                   β”‚
  β”‚                                                     β”‚
  β”‚  #E1 β†’ "James Cameron"                             β”‚
  β”‚  #E2 β†’ "James Cameron"                             β”‚
  β”‚  #E3 β†’ "Born 1954, age 71"                         β”‚
  β”‚  #E4 β†’ Same person!                                β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚  🧩 SOLVER (one LLM call)                            β”‚
  β”‚                                                     β”‚
  β”‚  "Both films were directed by James Cameron.        β”‚
  β”‚   Same person - the question is moot!"              β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Efficiency Comparison

Metric             ReAct     ReWOO
HotpotQA Accuracy  Baseline  Higher
Tokens Used        ~10,000   ~2,000
Token Efficiency   1Γ—        ~5Γ—

Key Insight: ReAct re-sends the entire conversation history at every step. ReWOO avoids this by planning upfront - 5Γ— fewer tokens, better accuracy.

Python Pseudocode

def rewoo(question: str, tools: dict) -> str:
    """ReWOO: Plan all tool calls upfront, execute, then reason once."""
    # PLANNER: One LLM call to create the full plan with variable references
    plan = planner_llm.generate(
        f"Plan tool calls to answer: {question}\n"
        f"Use #E1, #E2, etc. as variable placeholders.\n"
        f"Available tools: {list(tools.keys())}"
    )
    steps = parse_plan_with_vars(plan)  # e.g., [("#E1", "Search", "director of Titanic"), ...]

    # WORKER: Execute all tool calls, resolving variable references
    evidence = {}
    for var, tool_name, args in steps:
        # Replace variable references like #E1 with actual results
        resolved_args = resolve_variables(args, evidence)
        evidence[var] = tools[tool_name](resolved_args)

    # SOLVER: One LLM call to synthesize everything
    return solver_llm.generate(
        f"Question: {question}\nEvidence: {evidence}\nAnswer:"
    )

8. LLMCompiler: Parallel Function Calling

Paper: An LLM Compiler for Parallel Function Calling - Kim et al., ICML 2024

Creates a task dependency graph (DAG) and runs independent tasks in parallel.

Flow Diagram

       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚  πŸ“₯ "Compare ratings of 3 movies:             β”‚
       β”‚      Inception, Interstellar, Tenet"          β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  πŸ“‹ PLANNER: Create Task DAG   β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      β”‚           β”‚           β”‚
                      β–Ό           β–Ό           β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚ Task 1  β”‚ β”‚ Task 2  β”‚ β”‚ Task 3  β”‚
              β”‚ Search  β”‚ β”‚ Search  β”‚ β”‚ Search  β”‚
              β”‚Inceptionβ”‚ β”‚Interst. β”‚ β”‚ Tenet   β”‚
              β”‚deps: [] β”‚ β”‚deps: [] β”‚ β”‚deps: [] β”‚
              β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
                   β”‚           β”‚           β”‚
                   β”‚   ⚑ ALL RUN IN PARALLEL ⚑
                   β”‚           β”‚           β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  πŸ”— Task 4: Compare ratings    β”‚
              β”‚  deps: [Task 1, Task 2, Task 3]β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  βœ… "Inception: 8.8            β”‚
              β”‚      Interstellar: 8.7        β”‚
              β”‚      Tenet: 7.3"              β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Python Pseudocode

import asyncio

def llm_compiler(query: str, tools: dict) -> str:
    """LLMCompiler: Plan a DAG, execute independent tasks in parallel."""
    dag = planner_llm.generate(
        f"Query: {query}\nCreate tasks with: task_id, tool, args, dependencies"
    )
    tasks = parse_dag(dag)

    results = {}

    async def run(task):
        # Poll until all of this task's dependencies have results
        while not all(d in results for d in task.deps):
            await asyncio.sleep(0.1)
        args = resolve_vars(task.args, results)
        results[task.id] = tools[task.tool](args)

    async def run_all():
        # asyncio.run() needs a coroutine, so wrap the gather call
        await asyncio.gather(*[run(t) for t in tasks])

    asyncio.run(run_all())
    return solver_llm.generate(f"Query: {query}\nResults: {results}")

Key Insight: Parallelism is the key win. Independent tasks run simultaneously, dramatically reducing latency and cost compared to sequential approaches like ReAct.
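The scheduling core needs no LLM to demonstrate: each task blocks on its declared dependencies and otherwise runs concurrently. A minimal sketch with toy async "tools" (the `Task` shape and the movie ratings are illustrative, not from the paper):

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    fn: object                       # async callable: results dict -> output
    deps: list = field(default_factory=list)

async def execute_dag(tasks):
    """Run tasks concurrently; each waits only on its declared dependencies."""
    results = {}
    done = {t.id: asyncio.Event() for t in tasks}

    async def run(task):
        for d in task.deps:
            await done[d].wait()     # block until that dependency finishes
        results[task.id] = await task.fn(results)
        done[task.id].set()

    await asyncio.gather(*[run(t) for t in tasks])
    return results

# Toy tools: three independent "searches" feed one comparison step
async def search(title):
    await asyncio.sleep(0.01)        # simulate network latency
    return {"Inception": 8.8, "Interstellar": 8.7, "Tenet": 7.3}[title]

async def compare(results):
    return max(["t1", "t2", "t3"], key=lambda t: results[t])

tasks = [
    Task("t1", lambda r: search("Inception")),
    Task("t2", lambda r: search("Interstellar")),
    Task("t3", lambda r: search("Tenet")),
    Task("t4", compare, deps=["t1", "t2", "t3"]),
]
results = asyncio.run(execute_dag(tasks))
print(results["t4"])  # → t1 (Inception has the highest rating)
```

Using an `asyncio.Event` per task avoids the polling loop in the pseudocode: dependents wake up the moment their inputs are ready.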

Part 3: Self-Improvement Loops

9. Reflexion: Verbal Reinforcement Learning

Paper: Reflexion: Language Agents with Verbal Reinforcement Learning - Shinn et al., NeurIPS 2023

When the agent fails, it writes a "lessons learned" reflection and stores it in memory. On the next attempt, it reads past reflections and avoids repeating mistakes.

Flow Diagram

              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚   πŸ“₯ Task       β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      β”‚
                      β–Ό
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚  πŸ€– ACTOR                 β”‚ ◄───────────────┐
     β”‚  Attempt task             β”‚                 β”‚
     β”‚  (reads past reflections) β”‚                 β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚
                  β”‚                                β”‚
                  β–Ό                                β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”‚
     β”‚  πŸ“Š EVALUATOR             β”‚                 β”‚
     β”‚  Run tests / score        β”‚                 β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚
                  β”‚                                β”‚
             β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”                           β”‚
             β”‚  Pass?  β”‚                           β”‚
             β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜                           β”‚
        Yes β”‚          β”‚ No                        β”‚
            β–Ό          β–Ό                           β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
     β”‚ βœ… Done!    β”‚  β”‚ πŸͺž SELF-REFLECT      β”‚     β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚ "What went wrong?"   β”‚     β”‚
                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
                                β”‚                  β”‚
                                β–Ό                  β”‚
                     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
                     β”‚ πŸ’Ύ MEMORY             β”‚β”€β”€β”€β”€β”€β”˜
                     β”‚ Store lesson learned  β”‚
                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: Write is_palindrome()

| Trial | Code | Tests | Reflection |
| --- | --- | --- | --- |
| 1 | `return s == s[::-1]` | "racecar" βœ…, "Race Car" ❌ | "Need to normalize: lowercase + remove spaces" |
| 2 | `s = s.lower().replace(" ",""); ...` | "Race Car" βœ…, "A man, a plan" ❌ | "Also strip non-alphanumeric chars" |
| 3 | Uses regex to keep only alphanumeric | All tests pass βœ… | - |

Key Insight: Learning from failure without retraining. Each reflection builds persistent memory that prevents the same mistake twice.
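The trial-3 solution from the table, i.e. a sketch of the code Reflexion converges to after two rounds of reflection:

```python
import re

def is_palindrome(s: str) -> bool:
    # Trial 1 lesson: normalize case; trial 2 lesson: drop non-alphanumerics
    cleaned = re.sub(r"[^a-z0-9]", "", s.lower())
    return cleaned == cleaned[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # → True
```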

Python Pseudocode

def reflexion(task: str, evaluator, max_trials: int = 5) -> str:
    """Reflexion: Learn from failures via verbal self-reflection."""
    memory = []  # Stores reflections from past attempts

    for trial in range(1, max_trials + 1):
        # ACTOR: Attempt the task (reading past reflections)
        prompt = f"Task: {task}\n"
        if memory:
            prompt += f"Lessons from past attempts:\n" + "\n".join(memory) + "\n"
        prompt += "Generate solution:"
        solution = llm.generate(prompt)

        # EVALUATOR: Check if it's correct
        score, feedback = evaluator(solution)
        if score >= 1.0:
            return solution  # Success!

        # SELF-REFLECT: Analyze what went wrong
        reflection = llm.generate(
            f"Task: {task}\nYour solution: {solution}\n"
            f"Feedback: {feedback}\nWhat went wrong and how to fix it?"
        )
        memory.append(f"Trial {trial}: {reflection}")

    return solution  # Max trials exhausted - return the last attempt

10. Self-Refine: Iterative Self-Improvement

Paper: Self-Refine: Iterative Refinement with Self-Feedback - Madaan et al., 2023

One LLM plays three roles: generator, critic, and refiner. Generate β†’ critique β†’ refine β†’ repeat.

Flow Diagram

              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚   πŸ“₯ Task       β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      β”‚
                      β–Ό
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚  ✍️ GENERATE               β”‚
         β”‚  First draft              β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚       β–Ό                          β”‚
              β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
              β”‚  β”‚  πŸ” CRITIQUE              β”‚    β”‚
              β”‚  β”‚  "What's wrong?"          β”‚    β”‚
              β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
              β”‚               β”‚                  β”‚
              β”‚          β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”            β”‚
              β”‚          β”‚  Good    β”‚            β”‚
              β”‚          β”‚  enough? β”‚            β”‚
              β”‚          β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜            β”‚
              β”‚        Yes β”‚     β”‚ No            β”‚
              β”‚            β”‚     β–Ό               β”‚
              β”‚            β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
              β”‚            β”‚  β”‚  ✍️ REFINE      β”‚ β”‚
              β”‚            β”‚  β”‚  Improve based β”‚ β”‚
              β”‚            β”‚  β”‚  on critique   β”‚ β”‚
              β”‚            β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
              β”‚            β”‚          β”‚          β”‚
              β”‚            β”‚          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚            β–Ό
              β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              └──│  βœ… Polished Output   β”‚
                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: Professional email

| Round | Draft | Critique |
| --- | --- | --- |
| 1 | "I can't make the meeting Thursday." | "Too blunt. No greeting, no alternative." |
| 2 | "Hi Sarah, I have a conflict Thursday. Could we reschedule to Friday?" | "Better, but could acknowledge importance." |
| 3 | "Hi Sarah, I appreciate you organizing this. Unfortunately I have a conflict - would Friday at 2pm work?" | "Professional, warm. Looks good! βœ…" |

Key Insight: Self-critique catches what the initial generation misses. Humans consistently prefer Self-Refine outputs over single-pass generation.

Python Pseudocode

def self_refine(task: str, max_rounds: int = 5) -> str:
    """Self-Refine: Generate, critique, refine in a loop."""
    # GENERATE: Create initial draft
    draft = llm.generate(f"Complete this task:\n{task}")

    for _ in range(max_rounds):
        # CRITIQUE: Same LLM evaluates its own work
        critique = llm.generate(
            f"Task: {task}\nCurrent draft:\n{draft}\n\n"
            f"What's wrong with this? Be specific about improvements needed."
        )

        # Check if the critique says it's good enough
        if "looks good" in critique.lower() or "no issues" in critique.lower():
            break

        # REFINE: Improve based on the critique
        draft = llm.generate(
            f"Task: {task}\nCurrent draft:\n{draft}\n"
            f"Critique: {critique}\n\nImprove the draft based on this feedback:"
        )

    return draft
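One practical weakness of the loop above is the keyword check for "good enough". A common hardening - assuming the critic is prompted to end its critique with a one-line JSON verdict (that prompting convention is an assumption, not part of the paper) - is to parse a structured verdict and keep the keyword heuristic only as a fallback:

```python
import json
import re

def parse_verdict(critique: str):
    """Extract a {'done': bool, 'issues': [...]} verdict from a critique string.
    Assumes the critic was prompted to end with a one-line JSON verdict."""
    match = re.search(r"\{.*\}", critique, re.DOTALL)
    if match:
        try:
            verdict = json.loads(match.group())
            return bool(verdict.get("done")), verdict.get("issues", [])
        except json.JSONDecodeError:
            pass
    # Fallback: the brittle keyword heuristic
    return "looks good" in critique.lower(), [critique]

done, issues = parse_verdict('Too blunt. {"done": false, "issues": ["no greeting"]}')
```

The refine step can then be driven by the `issues` list instead of the raw critique text.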

11. RAP: Reasoning via Planning

Paper: Reasoning with Language Model is Planning with World Model - Hao et al., EMNLP 2023

The LLM plays dual roles: world model (predicts outcomes) and reasoning agent (picks actions). MCTS searches for the best reasoning path.

Flow Diagram

                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                  β”‚   πŸ“₯ Problem      β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           β”‚
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚                 β–Ό                      β”‚
         β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
         β”‚  β”‚  πŸ€– LLM as AGENT               β”‚    β”‚
         β”‚  β”‚  "Go to dairy aisle"           β”‚    β”‚
         β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
         β”‚                 β”‚                      β”‚
         β”‚                 β–Ό                      β”‚
         β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
         β”‚  β”‚  🌍 LLM as WORLD MODEL         β”‚    β”‚
         β”‚  β”‚  "Now at dairy aisle.          β”‚    β”‚
         β”‚  β”‚   Milk and eggs available."    β”‚    β”‚
         β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
         β”‚                 β”‚                      β”‚
         β”‚                 β–Ό                      β”‚
         β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
         β”‚  β”‚  πŸ“Š Reward: +2                  β”‚    β”‚
         β”‚  β”‚  (2 items collected)           β”‚    β”‚
         β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
         β”‚                 β”‚                      β”‚
         β”‚                 β–Ό                      β”‚
         β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
         β”‚  β”‚  ↩️ Backpropagate               β”‚    β”‚
         β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
         β”‚                 β”‚                      β”‚
         β”‚           More iterations? β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                 β”‚ No
         β”‚                 β–Ό
         β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         └──│  βœ… Optimal Plan:               β”‚
            β”‚  dairy β†’ bakery β†’ checkout     β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Insight: The LLM plays both agent AND world model. Smarter search compensates for smaller models.

Python Pseudocode

import math

def rap(problem: str, n_iterations: int = 100, depth: int = 5) -> str:
    """RAP: Monte Carlo Tree Search with LLM as both agent and world model."""
    root = MCTSNode(state=problem)

    for _ in range(n_iterations):
        node = root

        # SELECT: traverse tree using UCB1
        while node.children and not node.is_terminal:
            node = max(node.children, key=lambda c: ucb1_score(c))

        # EXPAND: LLM as agent proposes actions
        actions = llm.generate(f"Possible next actions for:\n{node.state}")
        for action in parse_actions(actions):
            # LLM as world model predicts next state
            next_state = llm.generate(
                f"State: {node.state}\nAction: {action}\nPredict next state:"
            )
            child = MCTSNode(state=next_state, parent=node)
            node.children.append(child)

        # EVALUATE: LLM scores one newly expanded child
        if node.children:
            node = node.children[0]
        reward = float(llm.generate(f"Rate progress (0-1):\n{node.state}"))

        # BACKPROPAGATE: update scores from the evaluated node up the tree
        while node:
            node.visits += 1
            node.total_reward += reward
            node = node.parent

    # Return best path
    return extract_best_path(root)
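The UCB1 rule used in the SELECT step balances exploiting high-reward branches against exploring rarely-visited ones. A minimal sketch on synthetic node statistics:

```python
import math

def ucb1(total_reward, visits, parent_visits, c=1.4):
    """UCB1: mean reward plus an exploration bonus for rarely-visited nodes."""
    if visits == 0:
        return float("inf")          # unvisited children are expanded first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

# A well-explored strong branch vs. a barely-explored weak one:
good = ucb1(total_reward=8.0, visits=10, parent_visits=20)  # mean reward 0.8
rare = ucb1(total_reward=0.5, visits=1, parent_visits=20)   # mean reward 0.5
print(good, rare)
```

Despite the lower mean reward, the rarely-visited branch scores higher here: the exploration bonus ensures MCTS does not prematurely lock onto one path.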

12. ADaPT: Adaptive Planning

Paper: ADaPT: As-Needed Decomposition and Planning - Prasad et al., NAACL 2024

Try first, decompose only when you fail. Simple tasks get done immediately. Complex tasks get recursively broken down.

Flow Diagram

              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  πŸ“₯ "Clean the kitchen"   β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  ⚑ Try executing directly β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                     β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
                     β”‚ Success? β”‚
                     β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
                    Yes   β”‚   No
                     β”‚    β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”˜    └──────────────────┐
              β”‚                              β”‚
              β–Ό                              β–Ό
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚  βœ… Done!     β”‚          β”‚  πŸ“‹ Decompose into     β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚     subtasks           β”‚
                               β””β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”˜
                                   β”‚      β”‚        β”‚
                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚        └────────┐
                          β–Ό               β–Ό                 β–Ό
                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                   β”‚ Wash dishesβ”‚  β”‚ Clean stoveβ”‚  β”‚Organize pantry β”‚
                   β”‚ ⚑ Try      β”‚  β”‚ ⚑ Try      β”‚  β”‚ ⚑ Try           β”‚
                   β”‚    β†’ βœ…    β”‚  β”‚    β†’ βœ…    β”‚  β”‚    β†’ ❌ Failed! β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                           β”‚
                                                           β–Ό
                                              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                              β”‚  πŸ“‹ Decompose further β”‚
                                              β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
                                                    β”‚          β”‚
                                                    β–Ό          β–Ό
                                             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                             β”‚Sort cans β”‚ β”‚Sort boxesβ”‚
                                             β”‚  β†’ βœ…    β”‚ β”‚  β†’ βœ…    β”‚
                                             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Insight: Decomposition depth naturally matches task complexity. Only break down what actually fails.

Python Pseudocode

def adapt(task: str, executor, max_depth: int = 3, depth: int = 0) -> str:
    """ADaPT: Try first, decompose only on failure."""
    # Try executing the task directly
    result = executor.attempt(task)

    if result.success:
        return result.output

    if depth >= max_depth:
        return f"Failed after max decomposition depth: {task}"

    # Failed - decompose into subtasks
    subtasks = llm.generate(
        f"Task '{task}' failed. Break it into smaller subtasks:"
    )

    results = []
    for subtask in parse_subtasks(subtasks):
        # Recursively apply ADaPT to each subtask
        sub_result = adapt(subtask, executor, max_depth, depth + 1)
        results.append(sub_result)

    return combine_results(results)
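The try-then-decompose recursion can be exercised without an LLM, using a toy executor that only handles single-step tasks and a stub decomposer standing in for the LLM (both are illustrative stand-ins):

```python
def adapt(task, attempt, decompose, max_depth=3, depth=0):
    """Try the task directly; recursively decompose only on failure."""
    result = attempt(task)
    if result is not None:
        return [result]
    if depth >= max_depth:
        return [f"FAILED: {task}"]
    out = []
    for sub in decompose(task):
        out.extend(adapt(sub, attempt, decompose, max_depth, depth + 1))
    return out

# Toy world: only single-step tasks succeed; compound tasks must be split
def attempt(task):
    return f"done:{task}" if " and " not in task else None

def decompose(task):
    return task.split(" and ")

print(adapt("wash dishes and clean stove", attempt, decompose))
# → ['done:wash dishes', 'done:clean stove']
```

Simple tasks return in one call; only the compound task pays the decomposition cost.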

Part 4: Advanced Topologies

13. Hierarchical Planning

Paper: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends - Shen et al., NeurIPS 2023

A powerful LLM acts as the "brain" - decomposing requests, selecting specialist models, and orchestrating execution. Like a CEO delegating to experts.

Flow Diagram

   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚  πŸ“₯ "Describe this image and read aloud the text"     β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           β”‚
                           β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚  🧠 Stage 1: TASK PLANNING                            β”‚
   β”‚                                                      β”‚
   β”‚  Subtask A: Image captioning                         β”‚
   β”‚  Subtask B: OCR text extraction                      β”‚
   β”‚  Subtask C: Text-to-speech                           β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           β”‚
                           β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚  πŸ” Stage 2: MODEL SELECTION                          β”‚
   β”‚                                                      β”‚
   β”‚  A β†’ BLIP-2  (best caption model)                    β”‚
   β”‚  B β†’ TrOCR   (best OCR model)                        β”‚
   β”‚  C β†’ Bark    (best TTS model)                        β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚              β”‚               β”‚
           β–Ό              β–Ό               β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚  BLIP-2      β”‚ β”‚  TrOCR    β”‚ β”‚  Bark TTS    β”‚
   β”‚  "A cat on   β”‚ β”‚  "print(  β”‚ β”‚  πŸ”Š audio.wav β”‚
   β”‚   a laptop"  β”‚ β”‚  hello    β”‚ β”‚              β”‚
   β”‚              β”‚ β”‚  world)"  β”‚ β”‚              β”‚
   β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚               β”‚              β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚  βœ… Stage 4: RESPONSE GENERATION                      β”‚
   β”‚                                                      β”‚
   β”‚  "The image shows a cat on a laptop.                 β”‚
   β”‚   The code says print('hello world'). [▢️ Audio]"     β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Insight: No single model can do everything well. Use an LLM as the orchestrator to delegate to specialist models.

Python Pseudocode

def hierarchical_planning(request: str, model_registry: dict) -> str:
    """Hierarchical Planning: LLM orchestrator delegates to specialist models."""
    # Stage 1: Task Planning - break into subtasks
    plan = orchestrator_llm.generate(
        f"Break this into subtasks with dependencies:\n{request}"
    )
    subtasks = parse_subtasks_with_deps(plan)

    # Stage 2: Model Selection - pick best model for each subtask
    assignments = {}
    for task in subtasks:
        selected = orchestrator_llm.generate(
            f"Which model best handles '{task.type}'?\n"
            f"Available: {list(model_registry.keys())}"
        )
        assignments[task.id] = model_registry[selected.strip()]

    # Stage 3: Execution - run each specialist model
    results = {}
    for task in topological_sort(subtasks):
        input_data = resolve_dependencies(task, results)
        results[task.id] = assignments[task.id].run(input_data)

    # Stage 4: Response Generation - combine all results
    return orchestrator_llm.generate(
        f"Original request: {request}\nResults: {results}\nGenerate final response:"
    )
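The dispatch pattern in Stages 2-3 reduces to running registered specialists in dependency order and feeding outputs forward. A sketch with toy callables in place of real models (registry names and the subtask schema are illustrative):

```python
# Toy specialist "models": each handles one task type
registry = {
    "caption": lambda x: f"caption({x})",
    "ocr":     lambda x: f"ocr({x})",
    "tts":     lambda x: f"tts({x})",
}

def dispatch(subtasks, registry):
    """Run subtasks in dependency order, feeding each output forward."""
    results = {}
    for task in subtasks:                       # assumed topologically sorted
        inputs = [results[d] for d in task["deps"]] or [task["input"]]
        results[task["id"]] = registry[task["model"]](inputs[0])
    return results

subtasks = [
    {"id": "A", "model": "caption", "deps": [], "input": "img.png"},
    {"id": "B", "model": "ocr",     "deps": [], "input": "img.png"},
    {"id": "C", "model": "tts",     "deps": ["B"], "input": None},
]
print(dispatch(subtasks, registry)["C"])  # → tts(ocr(img.png))
```

Task C consumes B's output, mirroring how the TTS model reads aloud the OCR result in the diagram.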

14. Least-to-Most Prompting

Paper: Least-to-Most Prompting Enables Complex Reasoning - Zhou et al., Google, 2022

Break the hard problem into easy subproblems. Solve from easiest to hardest, feeding each answer into the next.

Flow Diagram

   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚  πŸ“₯ "5 machines make 5 widgets in 5 min.              β”‚
   β”‚      How long for 100 machines to make 100?"          β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           β”‚
                           β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  πŸ“‹ DECOMPOSE             β”‚
              β”‚  (easiest β†’ hardest)     β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                             β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  Sub Q1 (easiest):       β”‚
              β”‚  "How long for 1 machine β”‚
              β”‚   to make 1 widget?"     β”‚
              β”‚                          β”‚
              β”‚  β†’ Answer: 5 minutes     β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚ answer feeds ↓
                             β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  Sub Q2 (medium):        β”‚
              β”‚  "How long for 1 machine β”‚
              β”‚   to make 100 widgets?"  β”‚
              β”‚                          β”‚
              β”‚  β†’ Answer: 500 minutes   β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚ answer feeds ↓
                             β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  Sub Q3 (hardest):       β”‚
              β”‚  "How long for 100       β”‚
              β”‚   machines to make 100?" β”‚
              β”‚                          β”‚
              β”‚  β†’ 100 machines parallel β”‚
              β”‚    each makes 1 widget   β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                             β–Ό
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                  β”‚  βœ… Answer: 5 min   β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: The classic widget problem

| Sub-question | Reasoning | Answer |
| --- | --- | --- |
| Q1 (easy) | Each machine makes 1 widget in 5 min | 5 min |
| Q2 (medium) | 1 widget = 5 min, so 100 = 500 min | 500 min |
| Q3 (hard) | 100 machines in parallel, each makes 1 | 5 min |

Key Insight: Solving easiest sub-problems first builds a foundation of knowledge that makes harder sub-problems tractable.
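The widget arithmetic can be checked directly: per-machine throughput is constant, so machine count and widget count scale together.

```python
# 5 machines make 5 widgets in 5 minutes
rate_per_machine = 5 / 5 / 5          # 0.2 widgets per machine per minute

t1 = 1 / rate_per_machine             # Q1: 1 machine, 1 widget    → 5 minutes
t2 = 100 / rate_per_machine           # Q2: 1 machine, 100 widgets → 500 minutes
t3 = (100 / 100) / rate_per_machine   # Q3: 100 machines, 1 widget each → 5 minutes

print(t1, t2, t3)  # → 5.0 500.0 5.0
```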

Python Pseudocode

def least_to_most(question: str) -> str:
    """Least-to-Most: Decompose into sub-questions, solve easiest first."""
    # Stage 1: Decompose into sub-questions (easiest to hardest)
    decomposition = llm.generate(
        f"Break this into sub-questions from easiest to hardest:\n{question}"
    )
    sub_questions = parse_ordered_questions(decomposition)

    # Stage 2: Solve sequentially, feeding answers forward
    context = ""
    for sub_q in sub_questions:
        prompt = f"Context from previous answers:\n{context}\n\nQuestion: {sub_q}"
        answer = llm.generate(prompt)
        context += f"\nQ: {sub_q}\nA: {answer}\n"

    # The last answer addresses the original (hardest) question
    return answer

15. Algorithm of Thoughts (AoT)

Paper: Algorithm of Thoughts: Enhancing Exploration of Ideas in LLMs - Sel et al., 2023

Teach the LLM to simulate tree search internally using algorithmic examples in the prompt. Tree-like exploration in a single query.

Flow Diagram

        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚  πŸ“₯ Problem + "Explore like DFS algorithm"   β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           β”‚
                           β–Ό
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚  🧠 SINGLE LLM CALL (internal simulation)    β”‚
        β”‚                                             β”‚
        β”‚  Path 1: Try approach A                     β”‚
        β”‚    β†’ hit contradiction                      β”‚
        β”‚    β†’ ❌ BACKTRACK                            β”‚
        β”‚                                             β”‚
        β”‚  Path 2: Try approach B                     β”‚
        β”‚    β†’ partial progress, keep going           β”‚
        β”‚    β”‚                                        β”‚
        β”‚    β”œβ”€ Path 2.1: Sub-approach B1             β”‚
        β”‚    β”‚  β†’ dead end                            β”‚
        β”‚    β”‚  β†’ ❌ BACKTRACK                         β”‚
        β”‚    β”‚                                        β”‚
        β”‚    └─ Path 2.2: Sub-approach B2             β”‚
        β”‚       β†’ works!                              β”‚
        β”‚       β†’ βœ… SUCCESS                           β”‚
        β”‚                                             β”‚
        β”‚  (Path 3: not needed, answer found)         β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           β”‚
                           β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  βœ… Answer from Path 2.2  β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: Solving a logic puzzle

| Path | Exploration | Result |
| --- | --- | --- |
| Path 1 | Assume A is true β†’ B is both true AND false | ❌ Contradiction, backtrack |
| Path 2.1 | Assume A is false, B is false β†’ C must be both | ❌ Dead end |
| Path 2.2 | Assume A is false, B is true β†’ C is false | βœ… Consistent! |

Key Insight: By showing the LLM how an algorithm explores, it can simulate that exploration internally in a single call. Much cheaper than ToT.

Python Pseudocode

def algorithm_of_thoughts(problem: str, algorithm: str = "dfs") -> str:
    """AoT: Teach LLM to simulate tree search in a single call."""
    # Build a prompt that shows the LLM how to explore like an algorithm
    prompt = f"""Solve this problem by exploring like a {algorithm} algorithm.

Rules:
- Explore one path at a time
- If you hit a contradiction or dead end, explicitly BACKTRACK
- Try the next unexplored branch
- Continue until you find a consistent solution
- Show your exploration trace

Problem: {problem}

Exploration trace:
Path 1: """

    # Single LLM call does the entire tree search internally
    response = llm.generate(prompt, max_tokens=2000)

    # Extract the final answer from the exploration trace
    return extract_solution(response)
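The procedure AoT asks the model to imitate is ordinary DFS with backtracking. A sketch on a toy three-variable puzzle (the constraints are illustrative, chosen so the search backtracks out of the A-is-true branch and lands on Path 2.2's assignment):

```python
def dfs(assignment, variables, consistent):
    """Depth-first search over boolean assignments with backtracking."""
    if len(assignment) == len(variables):
        return assignment
    var = variables[len(assignment)]
    for value in (True, False):
        trial = {**assignment, var: value}
        if consistent(trial):                 # prune contradictory branches
            found = dfs(trial, variables, consistent)
            if found:
                return found
    return None                               # dead end → backtrack

# Illustrative constraints: A implies both B and C; B and C conflict
def consistent(a):
    if a.get("A") and (("B" in a and not a["B"]) or ("C" in a and not a["C"])):
        return False
    if a.get("B") and a.get("C"):
        return False
    return True

print(dfs({}, ["A", "B", "C"], consistent))
# → {'A': False, 'B': True, 'C': False}
```

AoT's bet is that a capable LLM, shown a few traces like this, can run the same explore/prune/backtrack loop in its own chain of thought.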

16. Graph of Thoughts (GoT)

Paper: Graph of Thoughts: Solving Elaborate Problems with LLMs - Besta et al., AAAI 2024

CoT = chain. ToT = tree. GoT = graph. Thoughts can have multiple parents (merging ideas), feedback loops, and arbitrary connections.

Flow Diagram

              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  πŸ“₯ "Sort [7,3,9,1,5,8,2]"   β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                    Split into parts
                         β”‚
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β–Ό                         β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚  πŸ’­ Idea A        β”‚     β”‚  πŸ’­ Idea B        β”‚
  β”‚  Sort [7,3,9,1]  β”‚     β”‚  Sort [5,8,2]    β”‚
  β”‚  β†’ [1,3,7,9]     β”‚     β”‚  β†’ [2,5,8]       β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚                         β”‚
           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό  (2 parents - only graphs can do this!)
           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
           β”‚  πŸ”— AGGREGATE (merge)   β”‚ ◄──────┐
           β”‚  [1,2,3,5,7,8,9]      β”‚         β”‚
           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
                        β”‚                     β”‚
                   β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”                β”‚
                   β”‚ Correct? β”‚                β”‚
                   β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜                β”‚
              Yes β”‚          β”‚ No             β”‚
                  β–Ό          └── πŸ” REFINE β”€β”€β”€β”˜
           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
           β”‚ βœ… Sorted list! β”‚
           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The merge node has TWO parents (A and B). Trees can't express this - only graphs can combine ideas from different branches. On sorting tasks, the GoT paper reports roughly a 62% quality improvement over ToT while cutting cost by more than 31%.

Python Pseudocode

from enum import Enum

class Operation(Enum):
    GENERATE = "generate"
    AGGREGATE = "aggregate"
    REFINE = "refine"
    SCORE = "score"

def graph_of_thoughts(problem: str, operations: list) -> str:
    """GoT: Process thoughts as a graph with merging, looping, and refining."""
    graph = ThoughtGraph()
    initial = graph.add_node(thought=problem)

    for op in operations:
        if op.type == Operation.GENERATE:
            # Split: create k child thoughts from the parent (like ToT)
            children = [
                llm.generate(f"Approach for: {op.input_node.thought}")
                for _ in range(op.k)
            ]
            for child in children:
                graph.add_node(thought=child, parents=[op.input_node])

        elif op.type == Operation.AGGREGATE:
            # Merge: combine multiple thoughts into one (unique to GoT!)
            combined = llm.generate(
                f"Combine these partial solutions:\n" +
                "\n".join(n.thought for n in op.input_nodes)
            )
            graph.add_node(thought=combined, parents=op.input_nodes)

        elif op.type == Operation.REFINE:
            # Loop back: improve an existing thought
            improved = llm.generate(f"Improve this:\n{op.input_node.thought}")
            graph.add_node(thought=improved, parents=[op.input_node])

        elif op.type == Operation.SCORE:
            # Evaluate: score a thought for quality
            op.input_node.score = llm.evaluate(op.input_node.thought)

    return graph.get_best_node().thought
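For the sorting example in the diagram, the AGGREGATE operation even has a deterministic equivalent: merging already-sorted parent thoughts needs no LLM call at all. A minimal sketch (`aggregate_sorted` is a hypothetical helper, not part of the GoT framework):

```python
import heapq

def aggregate_sorted(parts: list[list[int]]) -> list[int]:
    """k-way merge of already-sorted parent thoughts (AGGREGATE without an LLM)."""
    return list(heapq.merge(*parts))

aggregate_sorted([[1, 3, 7, 9], [2, 5, 8]])
# [1, 2, 3, 5, 7, 8, 9]
```

In a real GoT deployment, operations with known deterministic semantics like this one are good candidates for replacing LLM calls, saving both tokens and errors.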

17. AFlow: Automated Workflow Generation

Paper: AFlow: Automating Agentic Workflow Generation - Zhang et al., ICLR 2025 (Oral)

The meta-algorithm. Instead of humans choosing which algorithm to use, let an LLM automatically discover the optimal workflow using MCTS.

Flow Diagram

    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  πŸ“₯ Task Dataset + Available Operators                β”‚
    β”‚  [Generate, Review, Ensemble, Test, Refine, ...]     β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚                 β–Ό                           β”‚
          β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
          β”‚  β”‚  1. SELECT                         β”‚     β”‚
          β”‚  β”‚  Pick promising workflow to modify β”‚     β”‚
          β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
          β”‚                 β”‚                           β”‚
          β”‚                 β–Ό                           β”‚
          β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
          β”‚  β”‚  2. LLM MODIFIES workflow code     β”‚     β”‚
          β”‚  β”‚  "Add ensemble step after          β”‚     β”‚
          β”‚  β”‚   generation"                      β”‚     β”‚
          β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
          β”‚                 β”‚                           β”‚
          β”‚                 β–Ό                           β”‚
          β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
          β”‚  β”‚  3. EVALUATE on validation set     β”‚     β”‚
          β”‚  β”‚  Score improves each iteration...   β”‚     β”‚
          β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
          β”‚                 β”‚                           β”‚
          β”‚                 β–Ό                           β”‚
          β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
          β”‚  β”‚  4. BACKPROPAGATE score             β”‚     β”‚
          β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
          β”‚                 β”‚                           β”‚
          β”‚            Converged? ───── No β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚                 β”‚
          β”‚             Yes β”‚
          β”‚                 β–Ό
          β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          └──│  βœ… Optimal Workflow Discovered!     β”‚
             β”‚                                    β”‚
             β”‚  "Generate β†’ Self-Critique β†’       β”‚
             β”‚   Ensemble(3) β†’ Format"            β”‚
             β”‚                                    β”‚
             β”‚  GPT-4o-mini with this workflow    β”‚
             β”‚  Small model + smart workflow       β”‚
             β”‚  BEATS big model + naive workflow! β”‚
             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example: Finding the best workflow for math problems

| Iteration | Workflow Tried | Trend |
|---|---|---|
| 1 | "Generate answer directly" | Baseline |
| 2 | "Generate + Review" | Better |
| 3 | "Generate + Review + Ensemble(5)" | Much better |
| 10 | "CoT + Self-Consistency(3) + Review + Refine" | Strong |
| 30 | "Generate(temp=0.8) + Test + Reflect + Retry + Ensemble(3)" | Best |

Key Insight: A smaller model with a smart workflow can outperform a larger model with a naive workflow. The AFlow paper demonstrates this with GPT-4o-mini, whose discovered workflows beat stronger models running hand-designed ones at a fraction of the inference cost.

Python Pseudocode

def aflow(task_dataset: list, operators: list, n_iterations: int = 30) -> dict:
    """AFlow: Use MCTS to automatically discover the best workflow."""
    # Start with a simple workflow
    root = WorkflowNode(workflow=["generate"])
    best_workflow = None
    best_score = 0

    for i in range(n_iterations):
        # 1. SELECT: pick a promising workflow to modify
        node = select_node(root, exploration_weight=1.4)

        # 2. EXPAND: LLM proposes a modification to the workflow
        modification = llm.generate(
            f"Current workflow: {node.workflow}\n"
            f"Available operators: {operators}\n"
            f"Suggest one improvement (add/remove/reorder a step):"
        )
        new_workflow = apply_modification(node.workflow, modification)
        child = WorkflowNode(workflow=new_workflow, parent=node)

        # 3. EVALUATE: run the new workflow on validation data
        score = evaluate_workflow(new_workflow, task_dataset)
        child.score = score

        if score > best_score:
            best_score = score
            best_workflow = new_workflow

        # 4. BACKPROPAGATE: update scores up the tree
        backpropagate(child, score)

    return {"workflow": best_workflow, "score": best_score}
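The `select_node` and `backpropagate` helpers above are left abstract. A minimal UCT-style sketch of what they might look like - this `WorkflowNode` with visit counts is an assumption for illustration, not the paper's implementation:

```python
import math

class WorkflowNode:
    def __init__(self, workflow, parent=None):
        self.workflow = workflow
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_score = 0.0

def uct(node, exploration_weight=1.4):
    """Upper Confidence bound for Trees: balance exploitation vs. exploration."""
    if node.visits == 0:
        return float("inf")  # always try unvisited workflows first
    exploit = node.total_score / node.visits
    explore = exploration_weight * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def select_node(root, exploration_weight=1.4):
    """Descend the tree, always following the child with the highest UCT value."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: uct(c, exploration_weight))
    return node

def backpropagate(node, score):
    """Propagate an evaluation score from a leaf back up to the root."""
    while node is not None:
        node.visits += 1
        node.total_score += score
        node = node.parent
```

The exploration term is what keeps AFlow from greedily refining one workflow forever: rarely-visited branches get a bonus, so structurally different workflows still get tried.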

Choosing the Right Algorithm

| Scenario | Recommended Algorithm |
|---|---|
| Simple Q&A with tools | ReAct |
| Math/logic problems | CoT or Self-Consistency CoT |
| Tasks requiring backtracking | Tree of Thoughts |
| Multi-step agent tasks | Plan-and-Execute or ReWOO |
| Independent parallel subtasks | LLMCompiler |
| Code generation with retries | Reflexion |
| Content quality improvement | Self-Refine |
| Complex decomposition problems | Least-to-Most |
| Multi-modal orchestration | Hierarchical Planning |
| Cost-constrained production | AFlow |

Algorithm Composition Patterns

In practice, these algorithms are rarely used in isolation. The real power comes from combining them. Here are five proven composition patterns:

Pattern 1: ReAct + Reflexion (Learn-from-Failure Agent)

Use ReAct for action execution and Reflexion for learning from failures. When the ReAct loop fails, Reflexion writes a reflection and the agent retries with that memory.

  ReAct Loop ──► Failure ──► Reflexion (reflect + store)
       β–²                              β”‚
       └──────── Retry with memory β—„β”€β”€β”˜

Use case: Code generation agents that debug their own mistakes across attempts.
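The loop above can be sketched in a few lines. Here `run_react_loop` and `reflect` are injected stand-ins for real agent and LLM calls, not a specific library API:

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    success: bool
    output: str = ""
    trace: str = ""

def react_with_reflexion(task, run_react_loop, reflect, max_attempts=3):
    memory = []  # verbal reflections carried across attempts (the Reflexion memory)
    for _ in range(max_attempts):
        result = run_react_loop(task, memory)  # ReAct handles the acting
        if result.success:
            return result.output
        memory.append(reflect(task, result.trace))  # Reflexion handles the learning
    return None

# Toy stand-ins: the agent succeeds only once it has at least one reflection.
def toy_react(task, memory):
    return Attempt(success=bool(memory), output="fixed", trace="NameError on line 3")

def toy_reflect(task, trace):
    return f"Previous attempt failed with: {trace}. Define the variable first."

react_with_reflexion("write parser", toy_react, toy_reflect)
# "fixed"
```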

Pattern 2: CoT + Self-Consistency (Robust Reasoning)

Generate multiple CoT paths with high temperature, then majority-vote the answer. This is the simplest and most commonly used composition.

  Question ──► CoT Path 1 ──┐
           ──► CoT Path 2 ──┼──► Majority Vote ──► Answer
           ──► CoT Path 3 β”€β”€β”˜

Use case: Math problems, factual QA, anywhere correctness matters more than speed.
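Because this pattern is just sampling plus a vote, it fits in a few lines. `sample_cot` below stands in for a high-temperature LLM call that returns a `(reasoning, answer)` pair:

```python
from collections import Counter

def self_consistent_answer(question, sample_cot, n_paths=5):
    # Sample n reasoning paths; keep only each path's final answer
    answers = [sample_cot(question)[1] for _ in range(n_paths)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n_paths  # majority answer plus a rough agreement score

# Toy sampler standing in for five high-temperature LLM calls
fake_paths = iter([
    ("6*7=42", "42"), ("6*7=41", "41"), ("7*6=42", "42"),
    ("6*7=42", "42"), ("6*7=40", "40"),
])
answer, agreement = self_consistent_answer("What is 6 * 7?", lambda q: next(fake_paths))
# answer == "42", agreement == 0.6
```

The agreement score is a useful free byproduct: low agreement is a signal to escalate to a stronger model or a more expensive algorithm.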

Pattern 3: Plan-and-Execute + LLMCompiler (Fast Parallel Agent)

Use Plan-and-Execute to create the plan, then hand it to LLMCompiler to parallelize independent steps.

  Task ──► Planner (GPT-4) ──► Dependency Graph ──► LLMCompiler
                                                       β”‚
                                              Parallel Execution
                                                       β”‚
                                                   ──► Result

Use case: Multi-tool agents that need to gather information from several APIs quickly.
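The dependency-graph execution can be sketched with a level-by-level scheduler. `execute_plan` and its arguments are illustrative stand-ins, not LLMCompiler's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def execute_plan(steps, deps, run_step):
    """Run plan steps level by level, parallelizing independent ones.

    steps: step name -> step spec; deps: step name -> prerequisite names;
    run_step: stand-in for a tool/LLM call taking (spec, results_so_far).
    """
    results, remaining = {}, set(steps)
    with ThreadPoolExecutor() as pool:
        while remaining:
            # Every step whose prerequisites are all done can run now, in parallel
            ready = [s for s in remaining if all(d in results for d in deps.get(s, []))]
            if not ready:
                raise ValueError("cyclic dependencies in plan")
            for s, val in zip(ready, pool.map(lambda s: run_step(steps[s], results), ready)):
                results[s] = val
            remaining -= set(ready)
    return results

steps = {"tokyo": "get Tokyo weather", "london": "get London weather",
         "compare": "compare the two"}
deps = {"compare": ["tokyo", "london"]}
out = execute_plan(steps, deps, lambda spec, ctx: f"done: {spec}")
# "tokyo" and "london" run in the same parallel batch; "compare" waits for both
```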

Pattern 4: Hierarchical Planning + Self-Refine (Quality Orchestration)

Use Hierarchical Planning to delegate to specialist models, then Self-Refine to polish the combined output.

  Request ──► Orchestrator ──► Specialist A ──┐
                           ──► Specialist B ──┼──► Combine ──► Self-Refine Loop ──► Output
                           ──► Specialist C β”€β”€β”˜

Use case: Multi-modal tasks (image + text + audio) that need polished final output.
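A minimal control-flow sketch of this pattern, with every model call stubbed out as a plain callable (none of these names come from a real framework):

```python
def orchestrate_and_refine(request, specialists, combine, critique, refine, max_rounds=3):
    # Orchestration: fan the request out to specialist models, then merge
    draft = combine(request, [run(request) for run in specialists])
    # Self-Refine: critique and revise until the critic is satisfied
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:  # critic has no complaints
            break
        draft = refine(draft, feedback)
    return draft

# Toy stand-ins for the specialist, critic, and refiner calls
final = orchestrate_and_refine(
    "caption this image",
    specialists=[lambda r: "a cat", lambda r: "on a mat"],
    combine=lambda r, parts: " ".join(parts),
    critique=lambda d: "capitalize it" if d.islower() else None,
    refine=lambda d, fb: d.capitalize(),
)
# final == "A cat on a mat"
```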

Pattern 5: ADaPT + Least-to-Most (Smart Decomposition)

Try the task directly (ADaPT). If it fails, decompose using Least-to-Most ordering (easiest subtask first).

  Task ──► Try directly (ADaPT)
              β”‚
         Success? ──Yes──► Done
              β”‚
             No
              β”‚
              β–Ό
         Decompose (Least-to-Most ordering)
         Solve easiest ──► ... ──► Solve hardest ──► Done

Use case: Complex multi-step tasks where difficulty is unknown upfront.
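A sketch of the combined control flow, assuming hypothetical `try_solve`, `decompose`, and `difficulty` helpers in place of real LLM calls:

```python
def adapt_least_to_most(task, try_solve, decompose, difficulty):
    # ADaPT: attempt the whole task directly first
    direct = try_solve(task, context=[])
    if direct is not None:
        return direct
    # On failure, decompose and solve subtasks easiest-first (Least-to-Most),
    # feeding earlier answers into later attempts as context
    context = []
    for sub in sorted(decompose(task), key=difficulty):
        context.append(try_solve(sub, context=context))
    return context[-1]  # the final (hardest) subtask answers the original task

# Toy stand-ins: the full task fails directly, subtasks succeed
subtasks = {"find the formula": 1, "apply the formula": 2}
result = adapt_least_to_most(
    "solve the equation",
    try_solve=lambda t, context: None if t == "solve the equation" else f"ok: {t}",
    decompose=lambda t: list(subtasks),
    difficulty=subtasks.get,
)
# result == "ok: apply the formula"
```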

Getting Started: Practical Integration

Ready to use these algorithms in your own projects? Here's how to get started with two popular frameworks.

Using LangChain

LangChain has built-in support for several of these algorithms. Here's a quick ReAct agent:

from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.tools import Tool

# Define your tools
tools = [
    Tool(name="Search", func=search_func, description="Search the web"),
    Tool(name="Calculator", func=calc_func, description="Do math"),
]

# Create a ReAct agent
# (react_prompt is a standard ReAct prompt template, e.g. pulled from
#  the LangChain hub with hub.pull("hwchase17/react"))
llm = ChatOpenAI(model="gpt-4")
agent = create_react_agent(llm, tools, react_prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Run it
result = executor.invoke({"input": "What is the population of France divided by 3?"})

Using LangGraph (Plan-and-Execute, Reflexion, and more)

LangGraph enables more complex patterns like Plan-and-Execute and Reflexion through its graph-based workflow:

from langgraph.graph import StateGraph, END

# Define a Plan-and-Execute workflow
# (PlanExecuteState is your own state schema, e.g. a TypedDict holding
#  the input, the current plan, and past step results)
workflow = StateGraph(PlanExecuteState)

# Add nodes for planning and execution
workflow.add_node("planner", plan_step)      # GPT-4 creates the plan
workflow.add_node("executor", execute_step)  # GPT-3.5 executes each step
workflow.add_node("replan", replan_step)     # Re-plan if needed

# Define edges
workflow.set_entry_point("planner")
workflow.add_edge("planner", "executor")
workflow.add_conditional_edges(
    "executor",
    should_replan,                            # Check if we need to adjust
    {"replan": "replan", "end": END}
)
workflow.add_edge("replan", "executor")

app = workflow.compile()
result = app.invoke({"input": "Compare weather in Tokyo and London"})

Using LlamaIndex (RAG + ReAct)

LlamaIndex combines retrieval with agentic reasoning:

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import QueryEngineTool

# Create tools from your data indices
tools = [
    QueryEngineTool.from_defaults(
        query_engine=docs_index.as_query_engine(),
        name="documentation",
        description="Search internal documentation"
    ),
]

# Create a ReAct agent with your tools
agent = ReActAgent.from_tools(
    tools,
    llm=OpenAI(model="gpt-4"),
    verbose=True
)

response = agent.chat("What does our API rate limiting policy say?")

Quick Reference: Which Framework for Which Algorithm?

| Algorithm | LangChain | LangGraph | LlamaIndex |
|---|---|---|---|
| ReAct | create_react_agent | Custom graph | ReActAgent |
| Plan-and-Execute | - | PlanExecute template | - |
| Reflexion | - | Custom with memory | - |
| Self-Consistency | Custom chain | Parallel branches | - |
| Tree of Thoughts | - | Custom BFS graph | - |

The Evolution: From Chains to Graphs to Automation

  2022                2023                2024              2025
   β”‚                   β”‚                   β”‚                 β”‚
   β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”œβ”€β”€β–Ίβ”‚ πŸ”— CoT   β”‚    β”œβ”€β”€β–Ίβ”‚ 🌳 ToT    β”‚   β”œβ”€β–Ίβ”‚ πŸ•ΈοΈ GoT   β”‚  β”œβ”€β–Ίβ”‚ πŸ€– AFlow  β”‚
   β”‚   β”‚ (Linear  β”‚    β”‚   β”‚ (Tree +  β”‚   β”‚  β”‚ (Graph   β”‚  β”‚  β”‚ (Auto-   β”‚
   β”‚   β”‚  Chain)  β”‚    β”‚   β”‚ Backtrk) β”‚   β”‚  β”‚ Topology)β”‚  β”‚  β”‚ mated)   β”‚
   β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   β”‚                   β”‚                   β”‚                 β”‚
   β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚                 β”‚
   β”œβ”€β”€β–Ίβ”‚ πŸ—³οΈ Self- β”‚    β”œβ”€β”€β–Ίβ”‚ πŸ€– ReAct  β”‚   β”‚                 β”‚
   β”‚   β”‚ Consist. β”‚    β”‚   β”‚ ReWOO    β”‚   β”‚                 β”‚
   β”‚   β”‚(Ensemble)β”‚    β”‚   β”‚ Plan&Exe β”‚   β”‚                 β”‚
   β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚                 β”‚
   β”‚                   β”‚                   β”‚                 β”‚
   β”‚                   β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚                 β”‚
   β”‚                   β”œβ”€β”€β–Ίβ”‚ πŸͺž Reflex β”‚   β”‚                 β”‚
   β”‚                   β”‚   β”‚ Self-    β”‚   β”‚                 β”‚
   β”‚                   β”‚   β”‚ Refine   β”‚   β”‚                 β”‚
   β”‚                   β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚                 β”‚
   β”‚                   β”‚                   β”‚                 β”‚

The trend is unmistakable: we're moving from hand-designed chains to automatically discovered, arbitrarily structured reasoning workflows. The future belongs to systems that dynamically choose and combine these algorithms based on the task at hand.

References

1. Yao et al. (2022). ReAct: Synergizing Reasoning and Acting. ICLR 2023. [GitHub]
2. Wei et al. (2022). Chain-of-Thought Prompting Elicits Reasoning. NeurIPS 2022.
3. Wang et al. (2022). Self-Consistency Improves Chain of Thought Reasoning.
4. Yao et al. (2023). Tree of Thoughts: Deliberate Problem Solving. NeurIPS 2023. [GitHub]
5. Zhou et al. (2023). Language Agent Tree Search (LATS). ICML 2024. [GitHub]
6. Wang et al. (2023). Plan-and-Solve Prompting. ACL 2023. [GitHub]
7. Xu et al. (2023). ReWOO: Decoupling Reasoning from Observations. [GitHub]
8. Kim et al. (2023). An LLM Compiler for Parallel Function Calling. ICML 2024. [GitHub]
9. Shinn et al. (2023). Reflexion: Verbal Reinforcement Learning. NeurIPS 2023. [GitHub]
10. Madaan et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback. [GitHub]
11. Hao et al. (2023). Reasoning with Language Model is Planning. EMNLP 2023. [GitHub]
12. Prasad et al. (2023). ADaPT: As-Needed Decomposition and Planning. NAACL 2024. [GitHub]
13. Shen et al. (2023). HuggingGPT: Solving AI Tasks with ChatGPT. NeurIPS 2023. [GitHub]
14. Zhou et al. (2022). Least-to-Most Prompting Enables Complex Reasoning.
15. Sel et al. (2023). Algorithm of Thoughts: Enhancing Exploration. [GitHub]
16. Besta et al. (2023). Graph of Thoughts: Solving Elaborate Problems. AAAI 2024. [GitHub]
17. Zhang et al. (2024). AFlow: Automating Agentic Workflow Generation. ICLR 2025. [GitHub]
