Big Tech Is Quietly Admitting That If It Wants to Sell People on AI, It Better Be Cheap



The AI mania that’s spread across Silicon Valley like a fever over the past few years is running up against some hard economic realities.

In recent weeks, big tech companies have been forced to admit that spending on tokens, the basic unit of measurement for AI usage, has gotten out of control. Amazon had to shut down an in-house competition to use as many tokens as possible at work, telling employees, “Please don’t use AI just for the sake of using AI,” according to Business Insider; Uber has reportedly capped employee spending on tokens at $1,500 per month after the company exhausted its annual AI budget earlier this year. And most tellingly, the companies building the big AI models have also woken up to this sobering reality. At a recent event hosted by OpenAI, company chief executive Sam Altman admitted that token usage had become “a huge issue” for companies that were promised big productivity gains if they incorporated AI across their organizations.

That’s a hard pivot from just a few months ago, when the general vibe across the industry was that the more employees used AI, the better off they, and the companies they work for, would be. So-called “tokenmaxxing” became a meme, and more or less synonymous with “future-proofing”: in a day and age when everyone and their neighbors are using AI, those who know how to use AI will have a sharp edge. Not every job will necessarily be replaced by AI (so the thinking goes), but employees who don’t use AI will definitely be replaced by those who do.

But AI has always been expensive, and training and inference costs for new models are only getting higher. Meanwhile, the industry’s dedicated push into agents—AI systems that can work with little to no human oversight for extended periods of time—has led to a token usage explosion. One preprint study posted in April found that agents use 1,000 times as many tokens as other AI systems.

It’s the companies and individual users who have overwhelmingly had to eat those costs. No wonder some developers have resorted to pirating free online chatbots like Chipotle’s customer service bot, Pepper, to bypass the big companies’ token-hungry models. GitHub announced earlier this week that it was rolling out a new payment model in which users would be charged by the number of tokens they burn. Judging from some of the early user feedback, it hasn’t been going well.

Big tech desperately needs to find a new way to sell people on the future of AI without the exorbitant token costs. If they don’t, companies and users will just switch to some open model they can use for free.

Close to the edge

Some big tech companies have literally been forced to the edge by the rising costs of AI usage.

Google and Microsoft recently announced new AI products (Gemma 4 12B and the RTX Spark laptop, respectively) that are based on so-called “edge” computing. That’s when a model runs on the computing power of a specific device, rather than in the cloud (i.e., energy-guzzling data centers). Obviously, a model of the magnitude of a Claude Opus 4.8 or a GPT-5 can’t be run directly from your laptop; that’s like trying to provide enough energy for a Falcon 9 rocket launch by plugging a stationary bike into a generator. But the logic behind Microsoft and Google’s new products is that not everyone needs the latest, greatest, most token-hungry models directly in the devices they’re using daily. For most people most of the time, a smaller, leaner model will work just fine. And crucially, it will save everyone some money on tokens.

Make no mistake, Microsoft’s and Google’s investments in edge computing are minuscule compared to what they’re spending on data centers; cloud computing is still very much the backbone of both of their business models. But in their embrace of edge computing, we’re seeing at least a tacit acknowledgement that massive AI models just aren’t worth the financial squeeze they’re placing on most consumers.

Water promises

While they’re pushing new edge computing products—promising powerful AI capabilities at a lower cost—Microsoft and Google are also trying to pacify a public that’s become increasingly concerned over data centers’ water demands. (Data centers usually use water to keep GPU clusters from overheating.) On Tuesday, during the opening keynote of Microsoft Build, the company’s annual developer conference, CEO Satya Nadella claimed Microsoft’s new data centers’ annual water usage “is roughly equivalent to what a single restaurant would use.”

The following day, Google announced plans to “replenish more water than we consume” from data center cooling by 2030, along with other “water stewardship commitments.” For a little extra sprinkle of intended comfort, the press release noted that “U.S. data centers use less than 1% of the water that Americans use on their lawns annually”—though that’s probably more of a damning picture of Americans’ lawn-watering habits than it is an absolution of the water-guzzling sins of the AI industry.


