
Showing posts with the label "tool use"

Designing Safer LLM Agents: Key Issues from Recent Papers

Three AI News Updates on Safer Agents, Multi-Turn Tool Use, and Infrastructure Scale

Can LLMs Reuse Tools Creatively? What CreativityBench Tries to Measure

Why Do Long-Horizon Agents Break? Diagnosing Failure with HORIZON and Related Papers

Why Do Long-Horizon Agents Break? HORIZON and the Case for Diagnostic Evaluation