One of the biggest drivers of LLM API costs is Input Tokens. When users have long conversations—especially those containing high-resolution images or large files—sending the entire 50-message history on every request wastes money and increases time to first token (TTFT). LLM Router’s Context Optimization solves this by analyzing the user’s latest message against the conversation history. If the user changes topics, or if previous heavy media is no longer relevant to the current question, we automatically prune the context before sending it to the model provider.
1. How Chat Optimization Works
When a request arrives, our internal Gateway AI generates a chat_score (from 0.0 to 1.0). This score represents how heavily the user’s current message relies on the past messages.
- Score 1.0 (High Dependency): “Fix the error in the second file you sent.” (Needs full history).
- Score 0.1 (Low Dependency): “Completely new topic: Write a Haiku.” (Needs zero history).
You control the cutoff with the chatHistoryOptimization.score setting.
If the internal chat_score is less than your configured threshold, LLM Router activates the optimization engine to strip out old, irrelevant messages and compress long text blocks.
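The decision above can be sketched as a simple comparison. This is an illustrative sketch only — the function and the message shape are assumptions, not the router’s actual internals; only chat_score and the configured threshold come from the documentation:

```typescript
// Hypothetical sketch of the pruning decision. Names are illustrative.
interface Message {
  role: "user" | "assistant";
  content: string;
}

function maybePruneHistory(
  history: Message[],
  chatScore: number, // 0.0–1.0: how heavily the new message relies on history
  threshold: number  // your configured chatHistoryOptimization.score
): Message[] {
  if (chatScore >= threshold) {
    return history; // high dependency: keep the full history
  }
  // Low dependency: keep only the latest user message.
  return history.slice(-1);
}
```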
2. How Media Optimization Works
Multimodal inputs (like sending a UI mockup to Claude 3.5 Sonnet) are incredibly expensive. Often, a user will upload an image early in a chat, but later ask a purely text-based question. If you enable mediaOptimization, the router detects if the current prompt actually requires the past images to answer. If it doesn’t, LLM Router strips the heavy image binaries out of the history array entirely, saving you massive amounts of “Vision” tokens.
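A minimal sketch of the stripping step. The message shape loosely follows common multimodal chat APIs and is an assumption, not the router’s actual wire format:

```typescript
// Hypothetical message shape; illustrative only.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string }; // base64 image binary

interface MultimodalMessage {
  role: "user" | "assistant";
  content: ContentPart[];
}

// Remove image parts from historical messages once the router decides
// the current prompt no longer depends on them.
function stripStaleImages(history: MultimodalMessage[]): MultimodalMessage[] {
  return history.map((msg) => ({
    ...msg,
    content: msg.content.filter((part) => part.type !== "image"),
  }));
}
```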
Configuration
You configure these behaviors inside the gateway object.
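A minimal configuration sketch, assuming the option names described above (chatHistoryOptimization.score and mediaOptimization) live under a gateway key. The exact shape is an assumption, not a confirmed API:

```typescript
// Hypothetical configuration sketch. Field names follow the options
// described in this document; the overall shape is illustrative.
const routerConfig = {
  gateway: {
    chatHistoryOptimization: {
      // Prune history when the internal chat_score falls below this value.
      score: 0.4,
    },
    // Strip stale image binaries from history when no longer needed.
    mediaOptimization: true,
  },
};
```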
Visualizing Chat & Media Optimization
To understand how much money this saves, let’s look at real-world examples of how LLM Router transforms your messages array before sending it to the expensive upstream model.
Scenario A: The Multi-Modal Topic Shift (Media Stripping)
- The Setup: A user uploads a high-res UI mockup (~2,000 tokens).
- The Shift: After 2 turns, the user asks, “Build a landing page for a car rental business called…”
- The Analysis: LLM Router calculates a chat_score of 0.1 (Total Topic Change) and determines the image is no longer needed.
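The transformation for this scenario can be sketched as a before/after on the messages array. All values and the message shape here are hypothetical stand-ins; the image placeholder represents the ~2,000-token mockup:

```typescript
// Illustrative Scenario A transformation; shapes and contents are hypothetical.
const before = [
  {
    role: "user",
    content: [
      { type: "image", data: "<~2,000-token mockup binary>" },
      { type: "text", text: "Turn this mockup into a component" },
    ],
  },
  { role: "assistant", content: [{ type: "text", text: "Here is the component…" }] },
  {
    role: "user",
    content: [{ type: "text", text: "Build a landing page for a car rental business called…" }],
  },
];

// After: chat_score (0.1) is below the threshold, so image parts are
// stripped from history while the text turns are preserved.
const after = before.map((m) => ({
  ...m,
  content: m.content.filter((p) => p.type !== "image"),
}));
```

Note that the conversation still has the same number of turns; only the heavy vision payload is gone.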
Scenario B: The Text Topic Shift
- The Setup: A user pastes a massive 5,000-line error log to debug a Python script.
- The Shift: After five turns, the user says, “Build a landing page for a car rental business called…”
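A sketch of the same idea for a text-only topic shift; the messages and the pruning step are hypothetical stand-ins for the router’s behavior:

```typescript
// Illustrative Scenario B transformation; contents are hypothetical.
const errorLog = "Traceback (most recent call last): …"; // imagine 5,000 lines
const before = [
  { role: "user", content: `Debug this:\n${errorLog}` },
  { role: "assistant", content: "The bug is in your exception handler…" },
  { role: "user", content: "Build a landing page for a car rental business called…" },
];

// After: the topic changed entirely, so the log-bearing turns are dropped
// and only the new request is forwarded upstream.
const after = before.slice(-1);
```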
Scenario C: The Long Conversation
If the chat_score is borderline (e.g., 0.5), indicating the user is still on the same topic but the old messages are getting too long, LLM Router performs Middle-Out Compression on the older text blocks.
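Middle-out compression keeps the head and tail of a long text block and elides the middle, on the assumption that the start (context) and end (latest state) matter most. A minimal sketch — the function name, character budget, and truncation marker are all assumptions:

```typescript
// Hypothetical sketch of middle-out compression; parameters are illustrative.
function middleOutCompress(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text; // already within budget
  const marker = "\n…[truncated]…\n";
  // Split the remaining budget evenly between the head and the tail.
  const keep = Math.floor((maxChars - marker.length) / 2);
  return text.slice(0, keep) + marker + text.slice(text.length - keep);
}
```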
Before: