Posts

Featured

Three Recent Papers on Making LLM Agent Execution More Reliable: SDOF, SkillSmith, and STAR

Three recent papers approach a similar problem from different angles: how to make LLM-based agent execution more reliable when tasks unfold across multiple steps, tools, or agents. SDOF presents multi-agent orchestration as a constrained state machine for business-like process control, SkillSmith reframes agent skills as compiled runtime interfaces to reduce waste and improve execution discipline, and STAR focuses on repairing failures in stage-based root cause analysis agents for microservices. Taken together, they reflect a broader research shift from letting agents improvise freely toward giving execution flows clearer control boundaries and recovery paths. [S1][S3][S11]

What these papers are about

Each paper starts from a concrete reliability problem in agent execution. SDOF, titled "Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch...

Latest Posts

Two Axes for Reading LLM Agent Design: What the Agent Does and How It Runs

Designing Safer LLM Agents: Key Issues from Recent Papers

Why LLMs Lose Context in Multi-Turn Interaction: What Three New Papers Suggest About Causes and Responses

Three AI News Updates on Safer Agents, Multi-Turn Tool Use, and Infrastructure Scale

How Conversational LLM Agents Choose the Next Question: BALAR and PRISM

Can LLMs Reuse Tools Creatively? What CreativityBench Tries to Measure

Why Safety in LLM Agents May Depend More on Interaction Topology Than on the Model

When Do Tools Help LLM Agents, and When Do They Backfire?

Why Does LLM Diversity Shrink? Reconsidering Generative Diversity After Supervised Fine-Tuning

AWS and NVIDIA Show Two AI Trends: Better LLM Evaluation and Wider Agent Adoption