Posts

Featured

Two Ways to Stabilize LLM Agents on Complex Tasks: Hierarchical Planning and CAP-CoT

Two recent papers examine a similar weakness in LLM-based agents from different angles. “From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents” focuses on planning in dynamic, multi-step tasks and argues that many current agents rely on plans with a fixed level of detail, which can be too coarse for hard tasks and too detailed for simple ones. “CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning” looks at reasoning instead, starting from the observation that Chain-of-Thought prompting can become unstable on long, multi-step problems and may produce inconsistent answers even when the task does not change. Together, the papers ask how LLM agents can become more stable when both planning and reasoning stretch across many steps. [S9][S11] …

Latest Posts

When Does LLM Self-Correction Actually Help? Papers on Iterative Refinement, Evaluation, and Reliability

AI Agents in Practice: Workflow Integration and Real-World Use Cases

How LLM Agents Combine Decision-Making and Skill Use in Long-Horizon Tasks

Tool Choice and Interpretability in LLM Agents: Key Ideas from Three Recent Papers

Why LLM Agents Still Struggle With Scientific Reasoning: Limits and Responses From Recent Papers

Is LLM Reasoning Really a Chain of Thought? What a New Paper Questions

Rethinking LLM Reasoning as Internal State Change, Not Visible Chain-of-Thought

Why LLM Agents Stay Unstable: Three Recent arXiv Papers on Reliability, Web Skill Learning, and Reasoning Limits

Why Do Long-Horizon Agents Break? Diagnosing Failure with HORIZON and Related Papers

Why Do Long-Horizon Agents Break? HORIZON and the Case for Diagnostic Evaluation