Why Don’t LLM Agents Act as They Explain? The Faithfulness Gap in 3 Recent Papers
Three recent arXiv papers look at a related reliability problem from different angles: whether LLM agents act on the reasoning they state, how safety signals can be tracked across long action trajectories, and whether hidden reasoning traces can still be exposed. “Doing What They Say, Not What They Reason” introduces the faithfulness gap in a controlled Texas Poker simulator with a verifiable reference action for every decision. “TRACE” reframes long-horizon agent safety as trajectory-level evidence compression. “Hidden Thoughts Are Not Secret” examines reasoning trace exposure in systems that show users only summaries and final answers. Taken together, these papers suggest that agent reliability is not just about output quality, but also about the relationship between internal reasoning, visible explanations, and actual behavior over time. [S5][S8][S9]

What these papers are about

All th...