
Posts

Featured

Can LLMs Reuse Tools Creatively? What CreativityBench Tries to Measure

CreativityBench is an arXiv paper that introduces a benchmark for evaluating creative reasoning in large language model agents. The paper frames creative problem-solving in a specific way: not as open-ended originality in general, but as the ability to repurpose available tools or objects by reasoning about their affordances and attributes rather than their usual, canonical use. [S1]

What is CreativityBench?

The paper is titled "CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing" and was released on arXiv. In the authors' framing, the benchmark is a first step toward evaluating whether an LLM-based agent can solve problems creatively by using tools in non-standard ways. Rather than asking only whether a model reaches the right answer, the benchmark is designed to examine a narrower question: can the model look at an available object, infer what pro...

Latest Posts

Why Safety in LLM Agents May Depend More on Interaction Topology Than on the Model

When Do Tools Help LLM Agents, and When Do They Backfire?

Why Does LLM Diversity Shrink? Reconsidering Generative Diversity After Supervised Fine-Tuning

AWS and NVIDIA Show Two AI Trends: Better LLM Evaluation and Wider Agent Adoption

LLM Agents and Scientific Discovery: What Four New arXiv Papers Suggest About the Next Wave of Automation

DreamProver and AGEL-Comp: What LLM Agents Need to Reason Better and Generalize Further

Three Recent Papers on Making LLM Agents More Stable in Planning and Reasoning

Two Ways to Stabilize LLM Agents on Complex Tasks: Hierarchical Planning and CAP-CoT

When Does LLM Self-Correction Actually Help? Papers on Iterative Refinement, Evaluation, and Reliability

AI Agents in Practice: Workflow Integration and Real-World Use Cases