
Posts

Featured

Three Recent Papers on LLM Agents: Memory, Workflow Verification, and Skill Creation

Three recent arXiv papers point to a shared question in LLM agent research: how to make long, multi-step work more reliable. Lean4Agent, AdMem, and Workflow-to-Skill were all posted on arXiv in June 2026. Each focuses on a different bottleneck: formally specifying and verifying workflows and execution trajectories, building memory that supports long-horizon task solving, and constructing reusable skills from heterogeneous interaction traces. Taken together, they offer a useful way to think about agent reliability as three separate but related layers: workflow, memory, and skill. [S3][S4][S6]

Introduction: what these papers are and when they appeared

The three papers are Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory, AdMem: Advanced Memory for Task-solving Agents, and Workflow-to-Skill: Skill Creation via Routing-Workflow-Semantics-Attachments ...

Latest Posts

Safety, Efficiency, and Real-World Use of LLM Agents: Reading Four Recent arXiv Papers

Pre-Deployment Checks and Runtime Safety for AI Agents: Three Recent arXiv Papers

Agent Safety and Reliability: Three Recent arXiv Papers on Pre-Deployment Verification, Intervention Timing, and Long-Horizon Error Tracking

Three New Papers on LLM Memory and Reasoning: ChatHealthAI, Traj-Evolve, and DELTAMEM

Why Don’t LLM Agents Act as They Explain? The Faithfulness Gap in 3 Recent Papers

What Changed in Physics-Aware Diagram Generation and Physical Reasoning Benchmarks?

LLM Serving Observability and Tuning Points: SageMaker AI and NVIDIA DynoSim

4 AWS and NVIDIA AI Operations and Deployment Updates for Practitioners

Three Recent arXiv Papers on LLM Agent Safety and Reliability: Guardrails, Hallucination Mitigation, and Self-Improvement Evaluation

Four Recent Papers on Reliable LLM Agents: Verification, Runtime Policy, Memory, and Privacy