Reasoning vs instruct models: when AI should think

This is one of the least obvious and most expensive lessons of working with modern models. “Reasoning” sounds better than “instruct,” but chosen without a purpose it can be slower, more expensive, and sometimes return nothing at all.

What is a model’s “reasoning”

A reasoning model generates an internal chain of thought before responding: it breaks the problem down and evaluates options. This chain is hidden (never shown to the user) and consumes tokens and time. In return, you typically get a more accurate answer to difficult questions.

An instruct model responds directly: no hidden reasoning, faster, and cheaper. For conversation, translations, code, and summaries, this is the right choice.

The trap: empty response

The key warning from our measurements: a reasoning model forced into a regular chat can burn the entire token budget on reasoning and return empty content. That’s why we enable reasoning mode (the think parameter) only for tasks that truly need it—and keep it off for everything else.
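A minimal sketch of guarding against this failure mode. It assumes a hypothetical client function `generate(prompt, think, max_tokens)` that returns a dict with `content` and `reasoning_tokens` fields. Those names are illustrative and not a real API; only the `think` parameter comes from the text above. `fake_generate` is a stand-in model used purely for demonstration.

```python
from typing import Callable, TypedDict


class Reply(TypedDict):
    content: str           # visible answer text (may be empty)
    reasoning_tokens: int  # hidden tokens spent on the chain of thought


# Hypothetical signature: generate(prompt, think, max_tokens) -> Reply
Generate = Callable[[str, bool, int], Reply]


def answer_with_fallback(generate: Generate, prompt: str,
                         think: bool, max_tokens: int = 1024) -> Reply:
    """Call the model; if thinking left no visible content,
    retry once with reasoning mode switched off."""
    reply = generate(prompt, think, max_tokens)
    if think and not reply["content"].strip():
        reply = generate(prompt, False, max_tokens)
    return reply


def fake_generate(prompt: str, think: bool, max_tokens: int) -> Reply:
    """Stand-in model: in thinking mode it burns the whole budget."""
    if think:
        return {"content": "", "reasoning_tokens": max_tokens}
    return {"content": "Hello!", "reasoning_tokens": 0}
```

Called as `answer_with_fallback(fake_generate, "hi", think=True)`, the first attempt comes back empty and the retry without thinking returns the content. A retry doubles the cost of that request, so it is a safety net rather than a substitute for keeping reasoning off where it is not needed.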

When to use which

Criterion	Reasoning (thinking)	Instruct (non-reasoning)
Response speed	slow	fast
Cost	high	low
Accuracy for tough decisions	high	medium
Risk of empty response in chat	high	none
Best for	analysis, planning, agents	conversation, code, translations, summaries
When to enable	only when task requires reasoning	default

The scale of the difference is approximate, but worth knowing. Reasoning mode can add anywhere from a few hundred to a few thousand hidden tokens per hard question—which translates into higher cost and a longer time to first token (TTFT). The exact values depend on the model and the task, so treat these numbers as an order of magnitude, not a benchmark: every deployment needs its own measurements. That’s all the more reason not to keep reasoning on by default for conversation, code, or summaries.
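To make the order of magnitude concrete, here is a back-of-the-envelope sketch. It assumes that hidden tokens are billed like output tokens and are generated at normal decoding speed before any visible text appears. The price and throughput below are made-up placeholders, not figures from any provider or from our measurements.

```python
def reasoning_overhead(hidden_tokens: int,
                       price_per_million_output: float,
                       tokens_per_second: float) -> tuple[float, float]:
    """Return (extra cost, extra seconds before the first visible token).

    Assumes hidden reasoning tokens are billed as output tokens and
    decoded sequentially before the answer starts.
    """
    extra_cost = hidden_tokens / 1_000_000 * price_per_million_output
    extra_delay = hidden_tokens / tokens_per_second
    return extra_cost, extra_delay


# Illustrative inputs only: 2,000 hidden tokens, a placeholder price of
# 10.0 currency units per million output tokens, 50 tokens per second.
cost, delay = reasoning_overhead(2_000, 10.0, 50.0)
print(f"extra cost per question: {cost:.4f}, extra delay: {delay:.0f} s")
```

With these placeholder inputs the overhead is about 0.02 currency units and 40 seconds per question. Substitute your own measured token counts, prices, and throughput before drawing conclusions.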

In practice, you don’t set this manually. The OpenClaw router enables reasoning automatically only for tasks labeled as “reasoning” (e.g., complex analysis, planning steps for an agent), while keeping it off for conversation, code, and summaries—faster, cheaper, and with content guaranteed.
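The routing rule can be sketched in a few lines. This is an illustration of the idea, not the OpenClaw router's actual code: the function name, the label set, and the `override` argument are all hypothetical.

```python
from typing import Optional

# Hypothetical label set: only tasks labeled "reasoning" turn thinking on.
REASONING_LABELS = {"reasoning"}


def should_think(task_label: str, override: Optional[bool] = None) -> bool:
    """Decide whether to enable reasoning mode for a task.

    An explicit override wins; otherwise thinking is on only for
    tasks labeled as reasoning and off for everything else.
    """
    if override is not None:
        return override
    return task_label in REASONING_LABELS


for label in ("reasoning", "conversation", "code", "summary"):
    print(label, should_think(label))
```

Defaulting to off means an unknown or mislabeled task falls back to the fast, cheap path and cannot hit the empty-response trap.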

Which model for reasoning

Our default reasoning engine is DeepSeek-V4—reliable, with a context window up to 1M tokens. For regular conversation, we use Mistral Large 3 (instruct), and for summaries, Gemma 3. The full logic for matching a model to a specific task is described in our article on choosing the right LLM for the task.

Try it live

Run a model in reasoning mode in our secure sandbox (playground: PII masked, zero retention) and ask a question that requires analysis.

FAQ

Is a reasoning model better than instruct?

Not “better overall,” but better for different tasks. Reasoning wins for tough decisions and analysis. Instruct wins for conversation, code, and summaries: faster, cheaper, and no risk of empty responses. Match the model to the task instead of picking one “forever.”

Why does a model sometimes return an empty response?

Because it’s a reasoning model running in thinking mode on a task that doesn’t need it: the entire token budget went to hidden reasoning, leaving nothing for the actual content. Solution: disable reasoning mode for simple tasks (the router does this for you).

How does the router know when to enable reasoning?

By task type. For tasks labeled as “reasoning” (complex analysis, planning), it enables thinking mode; for conversation, translations, code, and summaries, it keeps it off. You can also override this explicitly.