What if the tokens get too expensive? Open AI models as a strategic choice
Written by Chris Lukassen
AI Insights
This article was originally written in Dutch.
As long as you're small, the AI bill feels negligible. A few tens, a few hundred euros a month: noise on the budget. But as a product manager, your job isn't to calculate the pilot. Your job is to calculate the scenario where the product succeeds. And that's exactly where it gets interesting: what happens to your product the moment it really takes off and the tokens get too expensive?
Three forces acting on your product once AI is in it
For most products, AI starts the same way: you call an API, you pay per token, your data goes to the provider and comes back with an answer. Comfortable, fast to build, nothing to look after. You're renting intelligence. But once that intelligence becomes part of your product rather than an experiment beside it, three forces start acting on you that weren't there before.
The tokens get more expensive. Not necessarily per token, but in total. Cost per token feels negligible in a prototype and becomes your biggest variable cost the moment you succeed. The sting is in the scale: the better your product works, the more it gets used, the faster the bill grows. And you have zero control over pricing. You pay what the vendor asks, when they ask it.
Lock-in. With a closed API, you're tied to the prices, the roadmap and the availability of a single party. A model that forms the core of your user experience today can be deprecated tomorrow, doubled in price, or quietly updated so that your carefully tested output suddenly behaves differently. The deeper AI sits in your product, the bigger the dependency and the more expensive it becomes to ever get out.
Sovereignty. For many Dutch and European organizations (banks, healthcare, government), this isn't a detail but the main question. Where is your data allowed to go? Who trains on it? Which jurisdiction does it fall under? GDPR, sector rules and data sovereignty often determine which option is even permitted, before you've looked at cost or quality.
These three forces aren't separate. They all point in the same direction: the more serious AI becomes inside your product, the more grip you want on something you're currently renting.
Running an open-weight model yourself: what does control really mean?
There is an alternative, and it has matured. Instead of calling an API, you download an open-weight model (Google's Gemma is a well-known example) and run it on your own hardware or on infrastructure you rent directly. You never call a third-party API. The marginal cost per token drops to the price of power and hardware: at scale, that can be ten to a hundred times cheaper. You fine-tune on your own data, you decide when to upgrade, and if the vendor disappears tomorrow, your product keeps running.
On paper, that solves all three forces at once. Tokens? No more token bill. Lock-in? You own the model file. Sovereignty? Your data never leaves your environment. Sounds too good. And that's exactly when you, as a product manager, should dig deeper: what does "control" actually mean here?
Start with an uncomfortable question: why does Google give away a model that cost hundreds of millions to build, completely free? Not out of charity. Google doesn't make money on the model; Google makes money on the infrastructure underneath it. The model is free, but the moment you want to do anything serious with it (fine-tune on your own data, serve it to thousands of users, build an agent on top), you need infrastructure. And Gemma is neatly embedded in Google's own cloud, on Google's own chips, with Google's own deployment tooling. The free model is the funnel; the cloud contracts are the business.
That's where the catch is. "Free" and "open" don't automatically mean "in control". If you run Gemma on the cloud and the tooling of the party that released the model, you may have traded your token bill for an infrastructure bill with that same party, and your lock-in has changed shape rather than disappeared. Real control means owning the rails too, or at least being free to choose where they run.
And control has its own price. Running a model yourself isn't a checkbox but a competence. You need GPUs, or you rent them. You need people who serve, monitor, update and maintain the model. You trade a predictable monthly bill for operational complexity and a team that can handle it. For a company spending 2 million euros a month on API calls, that equation flips easily. For a product burning a few hundred euros a month, it almost never does. There, convenience is worth more than control.
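To make that equation concrete, here is a minimal sketch in Python. Every number in it is a hypothetical placeholder, not a real vendor price or hardware quote, and the function names are illustrative only. It assumes the API bill scales linearly with tokens, while running the model yourself has a fixed monthly cost (GPUs plus the team) and a much smaller marginal cost per token.

```python
def api_cost_per_month(tokens: float, price_per_million: float) -> float:
    """Rented model: the bill scales linearly with the tokens you use."""
    return tokens / 1_000_000 * price_per_million


def self_host_cost_per_month(tokens: float, fixed_cost: float,
                             marginal_per_million: float) -> float:
    """Own model: fixed cost (hardware, team) plus a small marginal cost."""
    return fixed_cost + tokens / 1_000_000 * marginal_per_million


def break_even_tokens(price_per_million: float, fixed_cost: float,
                      marginal_per_million: float) -> float:
    """Monthly token volume at which both options cost the same."""
    saving_per_million = price_per_million - marginal_per_million
    if saving_per_million <= 0:
        return float("inf")  # self-hosting never pays off
    return fixed_cost / saving_per_million * 1_000_000


# Hypothetical assumptions, all amounts in euros.
API_PRICE = 5.0               # API price per million tokens
FIXED_COST = 45_000.0         # GPUs plus team, per month
MARGINAL = 0.5                # power and hardware per million tokens
TOKENS_PER_USER = 500_000     # tokens per user per month

for users in (10_000, 100_000):
    tokens = users * TOKENS_PER_USER
    rent = api_cost_per_month(tokens, API_PRICE)
    own = self_host_cost_per_month(tokens, FIXED_COST, MARGINAL)
    print(f"{users:>7} users: API {rent:>10,.0f}  self-hosted {own:>10,.0f}")

print(f"break-even: {break_even_tokens(API_PRICE, FIXED_COST, MARGINAL):,.0f} tokens/month")
```

With these placeholder numbers, renting is cheaper at ten thousand users and running the model yourself is cheaper at a hundred thousand. The useful part is not the outcome but the inputs: swap in your own volumes, prices and team costs and see where the lines cross.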
The open model landscape: from DeepSeek to Mistral
"Open" isn't a switch but a spectrum, and the differences between options are exactly the differences that matter for a product.
The Chinese labs (DeepSeek, Alibaba's Qwen and others) have released open models over the past year that approach the closed frontier on many tasks, often under very permissive terms (DeepSeek's R1 is even MIT-licensed) and dirt cheap to run. On price and performance alone, that's tempting. But here your sovereignty question comes back with full force. Even if you run the model yourself and your data never leaves your environment, the question remains whether a Chinese model is acceptable politically, contractually and in terms of supply chain for your customers and your regulator. For a consumer app, maybe not an issue (Airbnb has said it relies heavily on Qwen); for a bank, hospital or government service, often a conversation with compliance before a single line of code is written.
Mistral, the European alternative, flips that trade-off. For an organization that cares about data sovereignty, a European model has a head start. But mind the license: Mistral's smaller models are permissive and free to use commercially, while the heavier models require a commercial license for production use. "Open weights" doesn't automatically mean "free to use in your product". You may be allowed to download and study the model, but shipping it inside your product requires a separate, paid agreement. Mistral is coming to speak about this at our next AI Summit, which we're very excited about!
This isn't a new concern; history is repeating itself. The license isn't a legal afterthought you have someone check at the end. It's a product decision. It determines whether you can use the model commercially, whether you can modify it, whether you can run it on your own infrastructure, and what bill lands on your desk in two years when your product is big. Treat the license as a feature with costs and risks, not as fine print.
Small specialized models: why bigger isn't better
Gemma and the compact variants of the Chinese models are smaller than the absolute frontier: the closed giants from OpenAI and Anthropic, and Google's own Gemini. That sounds like a shortcoming. It isn't.
Because what are you actually paying for with such a massive, generalist model? Unimaginable breadth. It knows Shakespeare, Python, the Roman Empire, knitting patterns and just about every Lego set ever released. Impressive, and for most products completely unnecessary. You're paying compute, money and latency for knowledge your product will never touch.
Say your product runs in an industrial environment and needs to detect defects in weld seams. What use is a model that also knows knitting patterns and every Lego set by heart? None. You don't need breadth; you need depth in exactly one thing. A smaller model, specialized and fine-tuned on your domain, effortlessly beats the lumbering generalist on that single task. It's faster, it's a fraction of the cost, it fits on modest hardware, and it can run at the edge of your network, right next to the welding robot, instead of in a data center on the other side of the world.
And that, I believe, is where the future lies. Not with ever-bigger generalists, but with specialist models that do one task exceptionally well. The closed frontier remains valuable for exploring, prototyping and tasks that genuinely require broad reasoning. But in production, where cost, speed, control and sovereignty count, the model that knows exactly enough and not a byte more wins. And a small specialized model you own yourself defuses all three forces from the start: it cuts the token bill, it frees you from lock-in, and it keeps your data home.
Seven questions for the right AI model decision
This is where the real work is. Use these to force the right discussion with your team and your stakeholders before AI is baked into your product and the choice has already been made for you implicitly.
1. At what scale will we be running, and what will that cost? Don't calculate the pilot; calculate the success scenario. How many requests per day at ten thousand users? At a hundred thousand? Put that scenario's token bill next to the cost of running it yourself: hardware, infrastructure and the people who manage it. The point where those two lines cross isn't a technical detail; it's your break-even, and it determines whether "renting" is a temporary phase or your final destination.
2. How much dependency do we accept, and did we choose it deliberately? Lock-in isn't a problem when it's a choice. It becomes a problem when it's a surprise after the fact. Ask the question out loud: what happens to our product if this vendor doubles its price tomorrow, deprecates a model, or goes under? And don't forget the hidden variant: even a free open model can lead you back into dependency through the infrastructure or the license underneath it.
3. Where is our data allowed to go and what does our sector demand? Start here if you're in a regulated environment, because this question can overrule the others. Sometimes the cheapest or best option simply isn't permitted, or a Chinese model won't get past your compliance department no matter how strong it is. Map out which options you're even allowed to use before optimizing for price or quality. Sovereignty isn't a feature you add later; it's a constraint that shapes your architecture.
4. Are we allowed to use this model in production, and at what price? Read the license as a product decision. Can you use it commercially? Can you modify it and host it yourself? Is it permissive (like Apache 2.0 or MIT), or do you owe a commercial license for production use, as with Mistral's heavier models? "Open" and "free to use" are not the same thing, and that difference can land on your budget two years from now.
5. Do we have the competence to carry control? Ownership without capability isn't control; it's a new risk. Can we serve, monitor, fine-tune and update a model? And who does that when that person leaves tomorrow? Be honest about what your team can handle. It's perfectly legitimate to deliberately choose the convenience of an API because you want to spend your energy elsewhere. What isn't legitimate is believing you're in control while underestimating the operational burden.
6. What's the smallest model that does our task exceptionally well? Flip the habit. Instead of starting with the most powerful model and wondering whether it's affordable, start with your task and find the smallest, most specialized model that can handle it. A fine-tuned compact model that does one thing exceptionally well is almost always cheaper, faster and easier to manage in production than a generalist that can also do a thousand things you'll never use.
7. Is this model our differentiation or a utility? The question that colors all the others. If AI is a feature (a smart search, a summary, a suggestion), you're buying a commodity and your advantage lies elsewhere; you want to spend as little attention on it as possible. But if the model is the core of your proposition, you're buying your own competitive edge from a party that will sell it to your competitor tomorrow at the same price. Differentiation is something you want to own and control. A utility is something you want to rent cheaply and without hassle.
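Questions 3, 4 and 6 above can be read as a filter followed by a minimization: first drop every model you are not allowed to use, then take the smallest one that still clears your quality bar. The Python sketch below shows that shape. The candidate names, sizes and scores are made up for illustration, and `task_score` is assumed to come from your own evaluation set for your own task, not from a public benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    params_billions: float    # model size
    task_score: float         # score on your own evaluation set, 0..1
    commercial_use_ok: bool   # outcome of the license review (question 4)
    compliance_ok: bool       # outcome of the compliance review (question 3)


def smallest_adequate(candidates, min_score):
    """Return the smallest permitted model that meets the quality bar, or None."""
    permitted = [c for c in candidates
                 if c.commercial_use_ok and c.compliance_ok]
    adequate = [c for c in permitted if c.task_score >= min_score]
    return min(adequate, key=lambda c: c.params_billions, default=None)


# Hypothetical candidates; names and numbers are invented for this example.
candidates = [
    Candidate("generalist-xl", 400.0, 0.95, True, False),
    Candidate("open-large", 70.0, 0.93, False, True),
    Candidate("open-medium", 12.0, 0.91, True, True),
    Candidate("open-small-finetuned", 3.0, 0.92, True, True),
    Candidate("open-tiny", 1.0, 0.80, True, True),
]

choice = smallest_adequate(candidates, min_score=0.90)
print(choice.name if choice else "no candidate qualifies")
```

The order matters: the hard constraints run before any comparison on quality or size, so a strong model that fails the license or compliance check never enters the discussion. If nothing qualifies, the function returns `None`, which is itself a useful finding to take back to your stakeholders.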
The bottom line
The question "what if the tokens get too expensive" is essential, because it forces you to look ahead to the moment of success rather than at today's comfortable pilot. There's no universally right answer: renting is the sensible choice for one product and a ticking cost bomb for another. But the answer never starts with which model is the biggest or most powerful.
It starts with what AI actually does in your product, which problem it solves, and how narrow that task really is. Do you treat it as a utility you rent, or as the core you own and control? Do you need the breadth of a generalist, or the depth of a specialist?
The winning products of the coming years won't run on the biggest model, but on the smallest model that does their one task exceptionally well: owned, controlled, and exactly enough.



