Posts

Showing posts with the label "tool use"

How Can We Make LLM Agents More Reliable in Memory and Tool Use?

Designing Safer LLM Agents: Key Issues from Recent Papers

Three AI News Updates on Safer Agents, Multi-Turn Tool Use, and Infrastructure Scale

Can LLMs Reuse Tools Creatively? What CreativityBench Tries to Measure

Why Do Long-Horizon Agents Break? Diagnosing Failure with HORIZON and Related Papers

Why Do Long-Horizon Agents Break? HORIZON and the Case for Diagnostic Evaluation