Why LLM Agents Stay Unstable: Three Recent arXiv Papers on Reliability, Web Skill Learning, and Reasoning Limits
Three recent arXiv papers look at a similar problem from different angles: LLM agents can appear capable, yet still behave unpredictably, struggle on long web workflows, or degrade during multi-step reasoning. “Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models” examines instability at the numerical level, “WebXSkill: Skill Learning for Autonomous Web Agents” focuses on how web agents learn reusable skills for long-horizon tasks, and “The cognitive companion: a lightweight parallel monitoring architecture for detecting and recovering from reasoning degradation in LLM agents” studies how to detect and recover from agent failures while they are running. Taken together, these papers shift attention from raw capability to reliability in actual agent use. [S1][S2][S7]

Introduction: the papers and their arXiv context

All th...