Posts

Featured

Why Do LLM Agent Memories Keep Failing? Three Recent Papers on the Core Problems

Three recent papers look at the same broad problem from different angles: long-term memory in AI agents. "Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory" argues that persistent agent memory is often treated too narrowly as storage, even though long-running agents need memory for learning across sessions, reducing repeated context injection, and auditing past decisions. "MemFail: Stress-Testing Failure Modes of LLM Memory Systems" focuses on how current evaluations often hide where memory systems actually break. "Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions" connects memory directly to personalized assistance, especially when a user’s intent is only implicit in prior interactions. Taken together, all three papers are directly about long-term memory or long-term interaction, and they sugges...

Latest Posts

What Determines the Performance of LLM Agent Workflows? Balancing Latency, Reliability, and Cost

Why LLM Agent Evaluation Is Hard: Recent Papers on the Gap Between Benchmarks and Real Deployment

Three Recent AI Agent News Items: OpenAI, AWS, and Virgin Atlantic

Rethinking LLM Agent Evaluation: The New Criteria Proposed by AgentAtlas

What Data Shapes LLM Performance? Why This Paper Proposes Data Probes

Three Recent AI Papers on Agents, Documents, and Data: What Has Changed for Real-World LLM Systems?

Recent Papers on LLM Agents: Memory, Negotiation, and Structural Failure

Three Recent Papers on Making LLM Agent Execution More Reliable: SDOF, SkillSmith, and STAR

Two Axes for Reading LLM Agent Design: What the Agent Does and How It Runs

Designing Safer LLM Agents: Key Issues from Recent Papers