For a long time, choosing an AI model for development work felt almost too easy. You picked the best one available, gave it the task, and let it run. If a model was smarter, why bother thinking too much about it? The answer was obvious enough: use the best tool, get the best result, move on with your day.
That habit made sense while the cost difference was small enough to ignore, or at least small enough to feel abstract. But as frontier models become more expensive, model choice stops being a simple quality ranking. It becomes a routing decision. You are no longer asking only which model is best. You are asking which model is appropriate for this task, at this moment, with this level of risk.
I wrote about the pricing side of this shift in GPT-5.5: When Frontier Models Become Expensive. This article is the follow-up: what changes in the way developers actually use models once the pricing and capability gaps become visible.
This is not only my own experience. It is also what I hear more and more from developers I talk to every day. People are no longer comparing models as if there were one universal winner. They are starting to know which model is good at planning, which one is good at fast edits, which one is annoying with long context, which one is reliable for tests, which one is too expensive to leave running unattended, and which one is surprisingly good for boring execution work.
That is the developer-router: the developer who does not send every task to the same model, but routes work depending on cost, difficulty, uncertainty, and consequence.
The old default: use the best model
The old default was comfortable. A new model came out, everyone said it was better, and the natural move was to use it for everything: planning, coding, testing, debugging, refactoring, documentation, review, and whatever else happened inside the agent loop. It was not always efficient, but it was simple, and simple defaults are hard to beat when they mostly work.
There was also a psychological reason for it. When you are working with AI, you do not want to wonder whether the failure came from the model being too weak. If the best model makes a mistake, at least you know you tried the strongest option. If a cheaper model makes a mistake, you immediately wonder whether you saved a few cents and wasted an hour. So the instinct to pick the best model is not stupid. It is understandable.
But it also treats all development tasks as if they require the same kind of intelligence, and they do not. Understanding a legacy architecture is not the same as renaming a variable. Planning a risky migration is not the same as writing a few obvious unit tests. Debugging a strange production issue is not the same as updating a README after the decision has already been made.
The new question
The useful question now is simple: does this task really need the most expensive model?
Sometimes the answer is yes. If I am asking an AI agent to understand a complex codebase, identify hidden coupling, evaluate two competing technical approaches, or plan a refactor that could break production, I want the strongest model I can reasonably use. Those are thinking-heavy tasks. They require judgment, context, caution, and the ability to notice consequences that are not written explicitly in the prompt.
But many development tasks are not like that. Once the plan is clear, a lot of the work becomes execution: edit these files, apply this pattern, write tests for these cases, update these docs, rename this concept everywhere, fix the formatting, follow the checklist. Those tasks still need competence, but they do not always need the most expensive reasoning engine available.
This is where the routing habit begins. The point is not to be cheap for the sake of being cheap. Price matters, but it is not the only variable. Developers are also starting to weigh model personality, reliability, speed, context handling, coding style, and failure modes. The point is to stop pretending every step of the work has the same value, the same risk, and the same model requirements.
The developer becomes a router
The developer-router keeps frontier models for thinking-heavy work and uses cheaper near-frontier models for execution-heavy work. That sounds more formal than it really is. In practice, it can be as simple as using the best model to understand the problem and design the plan, then switching to a cheaper model to apply the plan once the shape of the solution is obvious.
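To make the habit concrete, here is a minimal sketch of routing as code. Everything in it is hypothetical: the model names, the `TaskProfile` fields, and the decision rules are stand-ins for whatever signals you actually trust. The point is only that the decision is small enough to fit in one function.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    """Hypothetical descriptors a developer-router weighs."""
    ambiguous: bool  # is the shape of the solution still open-ended?
    risky: bool      # could a mistake break production or cost real time?
    planned: bool    # does an already-approved plan exist?

# Hypothetical model identifiers; substitute the ones you actually use.
FRONTIER = "frontier-model"
NEAR_FRONTIER = "near-frontier-model"

def route(task: TaskProfile) -> str:
    """Send judgment-heavy work to the frontier model and
    execution-heavy work to the cheaper near-frontier one."""
    if task.ambiguous or task.risky:
        return FRONTIER       # pay for judgment
    if task.planned:
        return NEAR_FRONTIER  # reliable execution is enough
    return FRONTIER           # when unsure, default to judgment

print(route(TaskProfile(ambiguous=False, risky=False, planned=True)))
# -> near-frontier-model
```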
This is not a downgrade in how we work with AI. It is closer to how we already work with people and tools. You do not ask the senior architect to manually apply the same import change across twenty files. You do not need a principal engineer to write the tenth variation of an already-defined test. You use the highest judgment where judgment matters, and you use reliable execution where execution is enough.
AI models are becoming similar. The strongest model is not always the best choice, because “best” depends on the job. Sometimes best means smartest. Sometimes best means cheapest that will reliably do the task. Sometimes best means fastest. Sometimes best means available in the tool you already use.
Frontier and near-frontier models
I use “frontier model” to mean the best models available at a given moment. They are usually the most capable models, especially when the task is ambiguous, complex, or judgment-heavy. They are also usually the most expensive, and that price is part of the decision.
Near-frontier models are not the absolute best models in the world, but they are close enough for a large number of practical tasks. They may be slightly weaker on hard reasoning, long-horizon planning, or subtle architectural judgment, but they are still perfectly capable of writing tests, applying a refactor, summarizing files, generating boilerplate, or handling a small local fix.
For many programming tasks, you do not need the best model in the world. You need a model that is good enough, reliable enough, fast enough, and cheap enough. That last part used to feel less important, but agent workflows make it very real because small costs compound quickly when the model is reading files, editing code, validating output, and looping for a long time.
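To see how quickly the loop adds up, here is a back-of-envelope sketch. Every number in it is invented for illustration; plug in the real prices and context sizes from your own setup.

```python
# Hypothetical prices in dollars per million tokens.
PRICE_IN = {"frontier": 15.00, "near_frontier": 1.50}
PRICE_OUT = {"frontier": 60.00, "near_frontier": 6.00}

def loop_cost(model: str, in_tokens: int, out_tokens: int, iterations: int) -> float:
    """Cost of an agent that re-reads its context and emits edits every iteration."""
    per_iteration = (in_tokens * PRICE_IN[model] + out_tokens * PRICE_OUT[model]) / 1_000_000
    return per_iteration * iterations

# An agent re-reading 80k tokens of context and writing 4k tokens, 50 times:
for model in ("frontier", "near_frontier"):
    print(model, f"${loop_cost(model, 80_000, 4_000, 50):.2f}")
# frontier $72.00 vs near_frontier $7.20 with these made-up prices
```

With these invented numbers, the same fifty-iteration loop costs roughly ten times more on the frontier model, which is exactly the kind of gap that used to feel abstract and no longer does.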
A practical workflow
The pattern I am starting to like is simple: plan with a frontier model, execute with a near-frontier model, and bring the frontier model back when the work becomes hard again.
Planning is where the expensive model earns its keep. This is the moment where you want the model to read the architecture, understand the problem, identify risks, propose a plan, and help choose the right technical approach. If the task is ambiguous, this is not where I want to save money. A bad plan saves a few tokens up front and costs hours of execution time later, which is a terrible trade.
Execution is different. Once the plan is clear, the work often becomes concrete enough to hand to a cheaper model: edit these files, write these tests, apply this naming convention, update this page, implement this already-decided change. The near-frontier model still has to be supervised, but the risk is lower because the direction is no longer open-ended.
Then, when the task becomes uncertain again, the frontier model comes back. Maybe the tests fail in a strange way. Maybe the refactor exposes a design problem. Maybe there are two possible fixes and both have trade-offs. Maybe the agent changed more than expected and the diff needs a careful review. Those are good moments to pay for judgment again.
The important part is that the expensive model stays in the loop, but not in every loop.
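Here is the same plan-execute-escalate pattern as a sketch. The `ask` callable stands in for whatever client your tools expose, the model names are hypothetical, and the escalation check is deliberately naive; in practice the signal would be failing tests or a suspicious diff, not string matching.

```python
from typing import Callable

# (model, prompt) -> response; a stand-in for your actual client.
Ask = Callable[[str, str], str]

def plan_execute_escalate(ask: Ask, task: str) -> str:
    # 1. Plan with the expensive model, where judgment matters most.
    plan = ask("frontier-model", f"Read the code, list the risks, and plan: {task}")

    # 2. Execute the concrete steps with the cheaper model.
    result = ask("near-frontier-model", f"Apply this plan step by step:\n{plan}")

    # 3. Escalate back to the frontier model when the work gets hard again.
    #    This check is deliberately naive; real signals would be test output,
    #    diff size, or the agent reporting uncertainty.
    if "FAILED" in result or "unsure" in result.lower():
        result = ask("frontier-model", f"The plan hit trouble. Review and fix:\n{result}")
    return result

# Stub client so the sketch runs standalone.
def fake_ask(model: str, prompt: str) -> str:
    return f"[{model}] ok"

print(plan_execute_escalate(fake_ask, "rename UserId to AccountId everywhere"))
```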
What this changes in practice
This is not only about saving money. Cost is the trigger, but the deeper effect is that it forces better task decomposition. If every step goes to the same model, it is easy to blur planning, execution, debugging, and review into one long agent session. That can work, but it also burns tokens and hides where the real difficulty is.
Once you start routing tasks, you naturally split the work more clearly. You ask what needs judgment and what needs execution. You think before launching a long agent run. You become more aware of context size, retries, and validation loops. You start testing other models on low-risk work instead of assuming the most expensive model is the only serious option.
This seems to be emerging as a practical developer habit right now. Not as a manifesto, not as a formal methodology, but as something people discover after spending enough time with different models. The more you use them, the less they feel interchangeable. They have strengths, weaknesses, habits, costs, and rough edges, and good developers are starting to take those differences seriously.
This is probably something we should have been doing anyway. The cost simply makes it visible.
Models worth testing
This is also why alternatives become more interesting. DeepSeek, MiniMax M2.7, cheaper code-focused models, smaller hosted models, local models through tools like Ollama, and routing platforms like OpenRouter all become worth testing, not because they replace frontier models everywhere, but because they may be good enough for many parts of the workflow.
The right way to test them is not with abstract leaderboard worship. It is to try them on real developer tasks: write these tests, apply this refactor, summarize this module, fix this simple bug, update this documentation, review this small diff. Some will fail. Some will be annoying. Some will surprise you.
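One low-friction way to run those tests is through an OpenAI-compatible endpoint such as OpenRouter's, so the same client code can hit several models. The sketch below assumes the `openai` Python package and an `OPENROUTER_API_KEY` environment variable; the model IDs are examples and should be checked against the current catalog before use.

```python
import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so one client covers many models.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Example model IDs; verify against the catalog before relying on them.
CANDIDATES = [
    "deepseek/deepseek-chat",
    "qwen/qwen-2.5-coder-32b-instruct",
]

task = "Write pytest cases for a function slugify(title: str) -> str."
for model in CANDIDATES:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": task}],
    )
    print(f"--- {model} ---")
    print((resp.choices[0].message.content or "")[:400])
```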
The goal is not to build a theoretical ranking of models. The goal is to learn which model you trust for which kind of work.
The new reflex
The developer-router is not a grand theory. It is a small habit that becomes more useful as model prices spread out.
Use the expensive model when the task is hard, uncertain, or risky. Use cheaper near-frontier models when the task is clear, repetitive, or already planned. Bring the expensive model back when the work requires judgment again.
I am not going to stop using frontier models. I like them too much, and they are too useful when the problem is genuinely hard. But I am going to stop using them for everything.
The new reflex is simple: the right model, at the right time, for the right task.
Valérian de Thézan de Gaussan