Posts

Showing posts from May 2026

Three Recent AI Agent News Items: OpenAI, AWS, and Virgin Atlantic

Rethinking LLM Agent Evaluation: The New Criteria Proposed by AgentAtlas

What Data Shapes LLM Performance? Why This Paper Proposes Data Probes

Three Recent AI Papers on Agents, Documents, and Data: What Has Changed for Real-World LLM Systems?

Recent Papers on LLM Agents: Memory, Negotiation, and Structural Failure

Three Recent Papers on Making LLM Agent Execution More Reliable: SDOF, SkillSmith, and STAR

Two Axes for Reading LLM Agent Design: What the Agent Does and How It Runs

Designing Safer LLM Agents: Key Issues from Recent Papers

Why LLMs Lose Context in Multi-Turn Interaction: What Three New Papers Suggest About Causes and Responses

Three AI News Updates on Safer Agents, Multi-Turn Tool Use, and Infrastructure Scale

How Conversational LLM Agents Choose the Next Question: BALAR and PRISM

Can LLMs Reuse Tools Creatively? What CreativityBench Tries to Measure