Many companies want to understand power utilization so they can feel confident about the AI they will come to rely on. To help them, the time has come for the data center industry to agree on a new metric, one that is both intuitive and informative and that bridges the worlds of power use and AI compute work output.
This is why I think tokens per watt should come to the forefront.
Usefulness of tokens per watt
You may ask, “What are tokens?” Simply put, tokens are the language that AI models speak. Text, images, audio clips and videos are broken into logical and descriptive pieces that AI models can process.
Tokens are essential to understand for one important reason: They are how people pay for working AI models, also known as reasoning or inference models. In these models, tokens measure both the input queries and the output intelligence of prediction, content generation and reasoning, and users pay based on token use. For OpenAI’s GPT-5, text input queries cost $1.25 per 1 million tokens and output responses cost $10 per 1 million tokens; image and audio tokens are priced higher.
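To see how token pricing translates into cost, here is a back-of-the-envelope sketch using the GPT-5 list prices above (the token volumes are hypothetical):

```python
# Rough cost estimate at the GPT-5 list prices cited above.
# The token counts are hypothetical, for illustration only.
INPUT_PRICE_PER_M = 1.25    # USD per 1 million input tokens
OUTPUT_PRICE_PER_M = 10.00  # USD per 1 million output tokens

input_tokens = 2_000_000    # e.g., a month of prompts (assumed)
output_tokens = 500_000     # e.g., generated responses (assumed)

cost = (input_tokens / 1e6) * INPUT_PRICE_PER_M \
     + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M
print(f"Estimated bill: ${cost:.2f}")  # -> $7.50
```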
Tokens per watt is, therefore, an extremely useful metric showing how much “work” an IT system can produce for every watt of power consumed.
At the micro level, companies can use tokens per watt to grade IT performance and as an ROI measure to justify purchases. As GPUs evolve from one generation to the next, performance often jumps by an order of magnitude, but power usage rises as well. This is where tokens per watt comes into play: the increase in compute work output is typically much larger than the increase in electric power use.
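A simple sketch of that generation-over-generation comparison (all figures are hypothetical, purely to illustrate the arithmetic):

```python
# Hypothetical comparison of two GPU generations.
# These numbers are illustrative assumptions, not vendor specs.
old = {"tokens_per_hour": 1.0e6, "avg_watts": 700}
new = {"tokens_per_hour": 8.0e6, "avg_watts": 1_200}

def tokens_per_watt(system):
    # Tokens produced in an hour divided by average power draw.
    return system["tokens_per_hour"] / system["avg_watts"]

print(f"Old: {tokens_per_watt(old):,.0f} tokens/W")  # ~1,429
print(f"New: {tokens_per_watt(new):,.0f} tokens/W")  # ~6,667
# Power rose ~1.7x, but work output rose 8x, so efficiency improved ~4.7x.
```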
At the macro level, the tendency is to focus on the gross increase in power drawn by the data centers that will run AI. Tokens per watt can put a different lens on the topic by showing that, while power use is increasing, work output is advancing at a much greater pace.
For years, IT performance advanced at a relatively slow pace compared to today’s advances. Now it’s “accelerated compute” versus power use, and tokens per watt can quantify compute efficiency going forward.
The need for metrics measuring computing output
While society’s demand for increased automation through accelerated compute and AI is driving electricity demand, what’s missing from the discussion is the “computing bang for the power used.” You may argue that we already have metrics. That’s true, but they fall short in measuring computing output.
For decades, Moore’s Law was touted as a gauge of computing progress. Published in 1965 by Gordon Moore, who later cofounded Intel, the law essentially predicted that the number of transistors in a dense integrated circuit would double every two years, increasing processing power by 1.5 to two times for the same power used. However, the law focused on central processing units (CPUs), not the GPUs used for AI today. It is also becoming less relevant because, among other reasons, physical limits such as the speed of light constrain how quickly signals can travel across a chip.
Another commonly used metric, power usage effectiveness (PUE), is a ratio that divides the total power coming into a data center by the power used directly by the IT equipment, with a perfect rating of 1.0. But people are used to seeing efficiency expressed as a percentage, so the ratio often confuses. PUE also doesn’t require real-time, continuous monitoring; it is usually based on one data point per week or month, which lets an operator choose the most favorable reporting time or raise IT power use at the moment of measurement.
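For reference, here is a minimal PUE calculation (the readings are hypothetical):

```python
# PUE = total facility power / IT power. A perfect score is 1.0.
# Readings below are hypothetical single-point measurements.
facility_kw = 1_500.0  # power entering the data center
it_kw = 1_000.0        # power consumed by IT equipment

pue = facility_kw / it_kw
print(f"PUE: {pue:.2f}")  # 1.50 -- 0.5 W of overhead per watt of IT
# Note: PUE says nothing about how much computing work those watts produce.
```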
Floating-point operations per second (FLOPS) per watt is a metric used in high-performance computing (HPC), but it can be misleading because vendors typically quote peak FLOPS. Workloads dominated by floating-point math, like certain machine learning models, may approach that peak and show high FLOPS per watt. But applications with lighter floating-point demands, or tasks bottlenecked elsewhere (low memory bandwidth, high latency), achieve far less than peak, which diminishes the metric’s significance as a measure of real efficiency.
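A quick sketch of why a peak-rated figure can mislead (all figures are hypothetical):

```python
# Peak vs. achieved FLOPS per watt for a hypothetical accelerator.
peak_tflops = 100.0     # datasheet peak (assumed)
achieved_tflops = 30.0  # real workload, memory-bandwidth-bound (assumed)
avg_watts = 500.0

print(f"Peak:     {peak_tflops * 1e12 / avg_watts:.2e} FLOPS/W")
print(f"Achieved: {achieved_tflops * 1e12 / avg_watts:.2e} FLOPS/W")
# The headline number overstates real efficiency by ~3.3x here.
```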
Targeting improvements in facilities
Data center operators have been measuring energy use in kilowatt-hours (kWh) for a couple of decades for PUE reporting, carbon emission calculations and targeting efficiency improvements. For kWh and newer metrics like tokens per watt, operators need accurate power measurements. How and where power is measured depends on the type of facility and the IT workload involved:
- For stand-alone facilities, take the power usage from the main central breakers or the utility meter.
- In mixed-use facilities—where the data center occupies only part of the facility—power usage can typically be measured at the uninterruptible power supply (UPS) connected to the main power feed or breaker serving that room. However, because cooling systems like chillers are often shared across the entire building, you’ll need to allocate a portion of the cooling energy specifically to the data center’s IT equipment.
- For mixed accelerated-compute AI, as well as general-purpose IT or non-accelerated cloud IT, power data needs to be taken at the individual server outlets on the power distribution units (PDUs).
- For a rack of accelerated compute AI, branch feed meters in the busway provide the power data.
- For an AI cluster, power is often delivered through large feeder breakers connected to a central PDU, switchgear or a dedicated UPS. These components can monitor and report the power consumption.
In all cases, total power usage should include both the electricity consumed by the IT hardware and the additional power required for cooling. To calculate tokens per watt, divide the number of tokens generated in one hour by the average power, in watts, drawn during that same hour (over a one-hour window, watt-hours and average watts are numerically equal).
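Putting the measurement guidance above together, here is a minimal sketch of the calculation (all readings are hypothetical):

```python
# Tokens per watt over a one-hour window, per the definition above.
# All readings are hypothetical; cooling is allocated per the mixed-use note.
tokens_generated = 50_000_000  # tokens produced in the hour (assumed)
it_energy_wh = 900_000.0       # IT hardware energy, watt-hours (assumed)
cooling_energy_wh = 250_000.0  # cooling share allocated to this IT (assumed)

total_energy_wh = it_energy_wh + cooling_energy_wh
avg_watts = total_energy_wh / 1.0  # over one hour, Wh equals average W
print(f"{tokens_generated / avg_watts:,.1f} tokens per watt")  # ~43.5
```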
It’s time for tokens per watt
As AI becomes increasingly central to business operations, leaders should begin evaluating infrastructure not just by power cost or capacity, but by how much useful AI output—tokens—is being produced per watt. That’s the future of responsible, performance-driven AI deployment.
By providing this insight, the data center industry can demonstrate that AI is implemented responsibly.
This article was previously published in Forbes.