Posts
When Does LLM Self-Correction Actually Help? Papers on Iterative Refinement, Evaluation, and Reliability
AI Agents in Practice: Workflow Integration and Real-World Use Cases
How LLM Agents Combine Decision-Making and Skill Use in Long-Horizon Tasks
Tool Choice and Interpretability in LLM Agents: Key Ideas from Three Recent Papers
Why LLM Agents Still Struggle With Scientific Reasoning: Limits and Responses From Recent Papers
Is LLM Reasoning Really a Chain of Thought? What a New Paper Questions
Rethinking LLM Reasoning as Internal State Change, Not Visible Chain-of-Thought
Why LLM Agents Stay Unstable: Three Recent arXiv Papers on Reliability, Web Skill Learning, and Reasoning Limits
Why Do Long-Horizon Agents Break? Diagnosing Failure with HORIZON and Related Papers
Why Do Long-Horizon Agents Break? HORIZON and the Case for Diagnostic Evaluation
How LLM Agents Handle Real Work and Exploration Problems: Four Recent Papers in Brief