Showing posts with the label arXiv

Agent Safety and Reliability: Three Recent arXiv Papers on Pre-Deployment Verification, Intervention Timing, and Long-Horizon Error Tracking

Three Recent arXiv Papers on LLM Agent Safety and Reliability: Guardrails, Hallucination Mitigation, and Self-Improvement Evaluation

Why LLM Agent Evaluation Is Hard: Recent Papers on the Gap Between Benchmarks and Real Deployment

Two Axes for Reading LLM Agent Design: What the Agent Does and How It Runs

LLM Agents and Scientific Discovery: What Four New arXiv Papers Suggest About the Next Wave of Automation

Three Recent Papers on Making LLM Agents More Stable in Planning and Reasoning