Posts

When Does LLM Self-Correction Actually Help? Papers on Iterative Refinement, Evaluation, and Reliability

AI Agents in Practice: Workflow Integration and Real-World Use Cases

How LLM Agents Combine Decision-Making and Skill Use in Long-Horizon Tasks

Tool Choice and Interpretability in LLM Agents: Key Ideas from Three Recent Papers

Why LLM Agents Still Struggle With Scientific Reasoning: Limits and Responses From Recent Papers

Is LLM Reasoning Really a Chain of Thought? What a New Paper Questions

Rethinking LLM Reasoning as Internal State Change, Not Visible Chain-of-Thought

Why LLM Agents Stay Unstable: Three Recent arXiv Papers on Reliability, Web Skill Learning, and Reasoning Limits

Why Do Long-Horizon Agents Break? Diagnosing Failure with HORIZON and Related Papers

Why Do Long-Horizon Agents Break? HORIZON and the Case for Diagnostic Evaluation

How LLM Agents Handle Real Work and Exploration Problems: Four Recent Papers in Brief