More than half a century ago, researchers were already trying to pin down what it means for a human to reason: Wason published early work on the topic in 1968, followed by Wason and Johnson-Laird in 1972. Today, that same question sits at the center of one of the most debated topics in artificial intelligence: can large language models actually reason, or do they just pattern-match really well?
Defining Reasoning for LLMs
Before asking whether an LLM can reason, you have to agree on what reasoning is. A DigitalOcean Community tutorial by Adrien Payong and Shaoni Mukherjee, updated in May 2025, overviews the survey paper 'Towards Reasoning in Large Language Models: A Survey' by Jie Huang and Kevin Chen-Chuan Chang. The tutorial pulls together definitions from multiple academic sources spanning 1968 to 2018, including work by Galotti (1989), Fagin et al. (2004), and McHugh and Way (2018).
The core definition they land on is straightforward. Reasoning is 'the act of thinking about something logically and systematically to draw a conclusion or make a decision.' That definition breaks down into identifiable components: inference, evaluation of arguments, and logical conclusion-drawing.
Each of those components maps to a different cognitive task. And each one poses a different challenge for a language model.
The Reasoning Taxonomy: Four Key Types
The survey overview organizes reasoning into four distinct categories: deductive, inductive, abductive, and analogical. Understanding the differences matters because each type demands something different from an AI system.
Deductive reasoning is the most rigid. If your premises are true, your conclusion must also be true. There is no wiggle room. Think of a mathematical proof or a syllogism. An LLM either gets this right or it does not.
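That rigidity can be made concrete in code. The sketch below (a toy illustration, not from the tutorial; the function and variable names are mine) checks a propositional argument for deductive validity by brute force: the argument is valid exactly when no assignment of truth values makes all premises true and the conclusion false.

```python
from itertools import product

# Deductive validity (a minimal sketch): an argument is valid if and
# only if the conclusion holds in every world where all premises hold.
def is_valid(premises, conclusion, variables):
    for values in product([True, False], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(p(world) for p in premises) and not conclusion(world):
            return False  # counterexample world found
    return True

# Modus ponens: from "p implies q" and "p", conclude "q".
premises = [lambda w: (not w["p"]) or w["q"], lambda w: w["p"]]
conclusion = lambda w: w["q"]
print(is_valid(premises, conclusion, ["p", "q"]))  # True: no wiggle room
```

Swapping the second premise for `q` and the conclusion for `p` (affirming the consequent) makes the check return `False`, which is exactly the "either right or not" character of deduction.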
Inductive reasoning is softer. The conclusion is probable based on evidence but not guaranteed. This is the kind of reasoning you use when you notice a pattern and generalize from it. Most of what we call 'learning from experience' falls here.
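The probabilistic character of induction shows up even in a toy sketch (my own illustration, with a hypothetical `predict_next` helper): the rule induced from the evidence extrapolates well, but nothing guarantees the next observation will obey it.

```python
# Inductive generalization (a toy sketch): induce a constant-difference
# rule from observed terms and extrapolate. The prediction is probable,
# never guaranteed -- the next observation could break the pattern.
def predict_next(seq):
    diffs = {b - a for a, b in zip(seq, seq[1:])}
    if len(diffs) == 1:            # one rule fits all the evidence so far
        return seq[-1] + diffs.pop()
    return None                    # evidence supports no single rule

print(predict_next([2, 4, 6, 8]))  # 10, by generalizing "add 2"
print(predict_next([1, 2, 4]))     # None: the pattern is ambiguous
```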
Abductive and Analogical Reasoning
Abductive reasoning takes a different angle. Instead of moving from premises to conclusions, you start with observations and work backward to find the most plausible explanation. Doctors use this constantly when diagnosing symptoms. It is inherently uncertain, which makes it particularly tricky to evaluate in AI systems.
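One common way to operationalize abduction is explanation scoring: rank candidate hypotheses by how much of the observed evidence each one accounts for. The sketch below uses hypothetical, hand-made data (the symptom sets and names are mine, not from the tutorial) to mimic a diagnostic shortlist.

```python
# Abduction as explanation scoring (a toy sketch): pick the hypothesis
# that accounts for the most observed symptoms. Real diagnosis weighs
# priors and costs too; this only captures the backward-inference shape.
CAUSES = {
    "flu":     {"fever", "cough", "aches"},
    "allergy": {"cough", "sneezing"},
    "cold":    {"cough", "congestion"},
}

def best_explanation(observed, causes):
    # Score each hypothesis by how many observations it explains.
    return max(causes, key=lambda h: len(causes[h] & observed))

print(best_explanation({"fever", "cough"}, CAUSES))  # flu explains both
```

Note that the winning hypothesis is only the most plausible one given the evidence, which is why abductive conclusions stay uncertain and are hard to score in AI evaluations.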
Analogical reasoning draws comparisons between two or more things to make inferences or reach conclusions. You see a new problem, recognize it shares structure with a problem you already understand, and transfer that knowledge over. This is where LLMs show some of their most interesting behavior, since their entire training is built on recognizing structural patterns across vast datasets.
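At its smallest, analogical transfer means recovering the relation that links a known source pair and applying it to a new target. The sketch below (my own illustration; it assumes the relation family is "add a constant difference") shows that two-step shape.

```python
# Analogical transfer (a minimal sketch): learn the structure of the
# source pair, then map that same structure onto the target case.
def solve_analogy(a, b, c):
    difference = b - a     # relation recovered from the source pair
    return c + difference  # same relation transferred to the target

print(solve_analogy(2, 4, 7))  # 9: "2 is to 4 as 7 is to 9"
```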
What This Framework Actually Tells Us
Here is the honest truth. The DigitalOcean tutorial provides a useful taxonomy, but it does not tell us how well current models perform on any of these categories. The source material covers definitions and categorization, not benchmark results or chain-of-thought evaluations.
What we can say is this: the research community takes the categorization seriously. You cannot measure reasoning in LLMs without first defining what you are measuring. And lumping all reasoning into one bucket hides more than it reveals. A model might handle deductive tasks cleanly while failing completely on abductive ones.
The deeper technical questions about transformer architectures, multi-step inference mechanisms, and reasoning benchmarks remain outside the scope of what these sources cover. The tutorial does not specify exact figures for model performance on any reasoning type.
So the next time someone tells you an LLM 'can reason,' ask them which type they mean. The answer matters more than the claim. What type of reasoning do you think current language models handle best, and where do you see them falling short?