Can LLMs Reuse Tools Creatively? What CreativityBench Tries to Measure

CreativityBench is an arXiv paper that introduces a benchmark for evaluating creative reasoning in large language model agents. The paper frames creative problem-solving in a specific way: not as open-ended originality in general, but as the ability to repurpose available tools or objects by reasoning about their affordances and attributes rather than their usual, canonical use. [S1]

What is CreativityBench?

The paper is titled "CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing" and was released on arXiv. In the authors' framing, the benchmark is a first step toward evaluating whether an LLM-based agent can solve problems creatively by using tools in non-standard ways. Rather than asking only whether a model reaches the right answer, the benchmark is designed to examine a narrower question: can the model look at an available object, infer what pro...