Large Language Models often fail at complex reasoning tasks (like intricate math, deep code refactoring, or multi-step logic puzzles) because they try to generate the final answer immediately. LLM Router solves this with Advanced Planning Routing. By chaining two models together, you can force a “Planner Model” to generate a strict, step-by-step strategy before an “Execution Model” writes the final response.
How Planning Works
When a request arrives, LLM Router analyzes the prompt and generates a Complexity Score (from 0.0 to 1.0). If this score exceeds your configured planningTriggerScore, LLM Router intercepts the request and performs a two-step process:
- The Planning Phase: It sends the prompt and conversation history to your designated Planner Model. This model is instructed via a strict System Prompt to only generate a logical execution plan, not the final answer.
- The Execution Phase: The resulting plan is injected directly into the system prompt of your designated Execution Model, which then generates the final output exactly as the user requested, guided by the planner's strategy.
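The two phases above can be sketched as a small routing function. This is an illustrative approximation, not LLM Router's actual internals — the function and prompt wording are assumptions:

```typescript
// A call to any chat model: (model, systemPrompt, userPrompt) => completion text.
type ChatCall = (model: string, systemPrompt: string, userPrompt: string) => string;

// Hypothetical sketch of the planning chain. Below the threshold the executor
// is called directly; above it, the planner runs first and its plan is
// injected into the executor's system prompt.
function routeWithPlanning(
  callLLM: ChatCall,
  complexityScore: number,
  planningTriggerScore: number,
  plannerModel: string,
  executorModel: string,
  userPrompt: string,
): string {
  if (complexityScore <= planningTriggerScore) {
    // Simple request: skip planning to save latency and cost.
    return callLLM(executorModel, "You are a helpful assistant.", userPrompt);
  }
  // Planning Phase: the planner is instructed to produce only a strategy.
  const plan = callLLM(
    plannerModel,
    "Generate a step-by-step execution plan only. Do not answer the request.",
    userPrompt,
  );
  // Execution Phase: the plan guides the executor via its system prompt.
  return callLLM(executorModel, `Follow this plan exactly:\n${plan}`, userPrompt);
}
```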
Defining the Model Chain
You define this Planner/Executor relationship using a specific syntax in your model request or within your gateway.tags configuration.
Syntax: executorModel:planning:plannerModel
For example, if you want claude-3-5-sonnet to execute the code, but you want o1-mini (a specialized reasoning model) to plan it:
anthropic/claude-3-5-sonnet:planning:openai/o1-mini
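The chain string decomposes around the :planning: separator. A small parser sketch makes the structure explicit (the function name and return shape are illustrative, not part of LLM Router's API):

```typescript
// Splits "executorModel:planning:plannerModel" into its two parts.
// A spec without the separator is treated as a plain executor model.
function parseModelChain(spec: string): { executor: string; planner?: string } {
  const parts = spec.split(":planning:");
  return parts.length === 2
    ? { executor: parts[0], planner: parts[1] }
    : { executor: spec };
}
```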
Configuration
You set the complexity threshold that triggers this behavior inside the gateway object.
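A minimal configuration sketch, assuming a plain object config. Only planningTriggerScore is documented here; the surrounding shape is an assumption:

```typescript
// Example config: requests scoring above 0.7 complexity trigger the
// Planning -> Execution chain; everything else goes straight to the executor.
const config = {
  gateway: {
    planningTriggerScore: 0.7, // default is 0.6
  },
};
```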
Using Planning with Tags
You can also use this syntax directly inside your gateway.tags arrays to create intent-based routing rules.
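For instance, a tag could route coding requests through a planning chain. This sketch assumes a tag-name-to-models mapping; the exact shape of a tags entry is not specified above and is an assumption:

```typescript
// Hypothetical gateway.tags entry: requests matched to the "coding" intent
// use Claude as the executor with o1-mini as the planner.
const config = {
  gateway: {
    planningTriggerScore: 0.7,
    tags: [
      {
        name: "coding",
        models: ["anthropic/claude-3-5-sonnet:planning:openai/o1-mini"],
      },
    ],
  },
};
```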
Configuration Properties
The gateway Object
| Property | Type | Default | Description |
|---|---|---|---|
| planningTriggerScore | number | 0.6 | The complexity threshold (0.0 to 1.0). If the internal request score exceeds this number, the two-step planning chain is executed. If it is lower, the executorModel is called normally, bypassing the planner to save latency and cost. |
Cost & Latency Considerations: Triggering a planning phase means you are making two LLM API calls instead of one. Set your planningTriggerScore high enough (e.g., 0.7 or 0.8) so that simple requests don't waste time and money generating unnecessary plans.