Showing posts with the label LLM agents

LLM Agents and Scientific Discovery: What Four New arXiv Papers Suggest About the Next Wave of Automation

DreamProver and AGEL-Comp: What LLM Agents Need to Reason Better and Generalize Further

Three Recent Papers on Making LLM Agents More Stable in Planning and Reasoning

Two Ways to Stabilize LLM Agents on Complex Tasks: Hierarchical Planning and CAP-CoT

How LLM Agents Combine Decision-Making and Skill Use in Long-Horizon Tasks

Tool Choice and Interpretability in LLM Agents: Key Ideas from Three Recent Papers

Why LLM Agents Still Struggle With Scientific Reasoning: Limits and Responses From Recent Papers

Why LLM Agents Stay Unstable: Three Recent arXiv Papers on Reliability, Web Skill Learning, and Reasoning Limits

Why Do Long-Horizon Agents Break? Diagnosing Failure with HORIZON and Related Papers

Why Do Long-Horizon Agents Break? HORIZON and the Case for Diagnostic Evaluation

How LLM Agents Handle Real Work and Exploration Problems: Four Recent Papers in Brief