Compute as a Strategy: How the OpenAI–Cerebras Deal Could Change Tokenomics
#58 Memory Matters
Tech giants are now battling over AI compute strategy as massive investments redraw the map of the industry. Data centers worldwide will require $6.7 trillion by 2030 to keep up with compute demand. This enormous capital requirement is forcing companies to transform their approach to artificial intelligence development.
OpenAI's recent $10 billion deal with chipmaker Cerebras provides the clearest evidence of this transformation. This strategic collaboration goes beyond a typical vendor relationship and reshapes AI compute economics. Instead of relying on traditional GPU infrastructure, organizations now build sophisticated AI infrastructure strategies around specialized hardware for large-scale AI workloads. OpenAI's commitment extends to developing 30 gigawatts of computing resources - a $1.4 trillion endeavor.
Let's grab some coffee and break down why OpenAI's Cerebras partnership marks a strategic turning point for the AI industry. Wafer-scale chips can potentially deliver responses up to 20 times faster than GPU-based systems, which creates new possibilities for companies developing AI initiatives. These developments compel organizations to view compute not just as a technical requirement but as a core strategic advantage.
The shift from experimentation to AI at scale
The AI industry has reached a turning point as companies move from testing to implementing artificial intelligence at scale. This change completely transforms compute requirements, cost structures, and infrastructure strategies in every part of the ecosystem.
AI workloads differ from traditional compute
AI workloads demand a completely different approach to computing infrastructure compared to conventional applications. Traditional data centers mainly use CPU-based processing for sequential tasks, while AI technologies need the massive parallel processing capabilities of specialized hardware like GPUs and TPUs [1]. These differences show up in several important ways:
Power consumption: AI workloads use extraordinary amounts of power. Machine learning deployments are expected to need more than 500 kW per IT rack within the next five years, compared with just 5-30 kW for traditional racks [1].
Cooling requirements: AI processors generate intense heat that demands advanced cooling solutions. About 40% of data center energy goes toward cooling.
Networking infrastructure: Data-hungry AI models need networking capabilities far beyond the 10-20 gigabits per second common in traditional environments.
Hardware specialization: AI data centers use purpose-built accelerators. These optimize processing of large unstructured data volumes and support complex tasks like natural language processing [2].
The rise of inference as a dominant cost driver
Training makes the headlines, but inference—running trained models in real-world applications—has become the decisive economic factor in AI compute strategy. Training happens periodically, while inference runs continuously whenever users interact with an AI system [3].
Inference represents the most resource-intensive and expensive part of AI applications, accounting for up to 90% of a model's total lifetime cost in some cases [4]. This change transforms infrastructure planning:
Inference will surpass training by 2030. It will become the main workload in AI data centers and represent more than half of all AI compute and roughly 30-40% of total data center demand [1].
Each query adds compute costs over time. Every prompt creates tokens that cost specific computational expenses [4].
Hardware costs have dropped by about 30% yearly, and energy efficiency has improved by 40% each year. Yet usage is growing even faster, often offsetting these gains.
How OpenAI's growth reflects this shift
OpenAI's story shows how compute availability directly enables AI scaling. The company's ability to serve customers—measured by revenue—has matched almost perfectly with available compute resources:
Compute grew 3X year over year: from 0.2 GW in 2023 to 0.6 GW in 2024, then to approximately 1.9 GW in 2025 [5].
Revenue followed the same pattern, growing 3X year over year: from $2 billion ARR in 2023 to $6 billion in 2024, then exceeding $20 billion in 2025 [5].
Users have flocked to the platform. ChatGPT now reaches over 500 million users and processes 2.5 billion messages daily, including 330 million per day in the US [5].
OpenAI's CEO stated, "This is never-before-seen growth at such scale. And we firmly believe that more compute in these periods would have led to faster customer adoption and monetization" [5]. Go get your money, my Memory Friends! This reality pushed the organization to diversify beyond a single compute provider to multiple sources, creating a strategic compute portfolio that balances capability and efficiency [5].
One thing becomes clear: compute strategy determines who succeeds in the AI economy. This happens as AI moves from experimental projects to production systems handling millions of inference requests daily.
The economics of AI compute: A new reality
The financial reality of AI compute has changed as systems move from test projects to real-world use. Companies now face complex financial decisions that go beyond traditional infrastructure planning.
Understanding inference economics
The cost landscape for AI is changing dramatically. The price of inference—running a trained model to answer queries—has plummeted by approximately 280-fold between November 2022 and October 2024 [6]. Even so, AI spending keeps rising sharply as usage explodes. This apparent contradiction comes down to the basic economics of AI consumption: total costs increase when usage grows faster than efficiency gains, even with falling per-unit prices.
Tokens are the currency of AI economics—these units of compute are needed to process user queries and generate responses. Each query adds up costs with every token it processes [6]. Companies face a challenging reality today: while the cost per million tokens dropped from $20 in late 2022 to around $0.40 in August 2025 [7], token usage per query shot up from 220 tokens in 2021 to about 22,000 tokens in a single exchange by 2025 [7].
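The arithmetic behind this squeeze is simple enough to sketch. The following back-of-the-envelope calculation uses only the figures cited above ($20 vs. $0.40 per million tokens; 220 vs. roughly 22,000 tokens per exchange); it is an illustration of the trend, not an exact billing model.

```python
# Sketch of the "cheaper tokens, bigger bills" effect, using the
# figures cited in the text (late 2022 vs. 2025).

def cost_per_query(tokens_per_query: int, usd_per_million_tokens: float) -> float:
    """Inference cost of a single query/exchange, in USD."""
    return tokens_per_query * usd_per_million_tokens / 1_000_000

# Late 2022: short exchanges at $20 per million tokens
early = cost_per_query(tokens_per_query=220, usd_per_million_tokens=20.00)

# 2025: long, multi-step exchanges at $0.40 per million tokens
late = cost_per_query(tokens_per_query=22_000, usd_per_million_tokens=0.40)

print(f"2022 cost per exchange: ${early:.4f}")  # $0.0044
print(f"2025 cost per exchange: ${late:.4f}")   # $0.0088
print(f"Change: {late / early:.1f}x")           # 2.0x despite 50x cheaper tokens
```

Even with tokens 50 times cheaper, the per-exchange bill roughly doubles because exchanges grew 100-fold in length - which is exactly why total AI spending keeps climbing.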
Are API-based models cost-prohibitive at scale?
API-based models are a great way to get started with AI experiments, but they don't make financial sense for large-scale enterprise use. Some organizations' monthly AI bills now reach tens of millions of dollars [5]. The problem gets worse with agentic AI because it needs continuous inference that sends token costs through the roof [5].
The subscription economics make the problem clear. Heavy users, called "Inference Whales," might consume up to $35,000 in monthly tokens while paying just $200 per month for subscription access [1]. AI providers had to add token limits to avoid losing money [1].
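To see how lopsided this is, here is a minimal sketch of the whale math. The $200 subscription and $35,000 consumption figures come from the text above; the $20 average light-user consumption is an illustrative assumption to show the cross-subsidy.

```python
import math

# Sketch of "Inference Whale" economics. The $200 plan price and the
# $35,000 monthly token consumption are from the text; the $20
# light-user consumption figure is an illustrative assumption.

SUBSCRIPTION_USD = 200.0

def margin(tokens_consumed_usd: float) -> float:
    """Provider profit (negative = loss) on one subscriber per month."""
    return SUBSCRIPTION_USD - tokens_consumed_usd

whale_loss = margin(35_000.0)   # a $34,800 monthly loss per whale
light_profit = margin(20.0)     # $180 profit per light user (assumed)

# Light users needed to subsidize one whale, just to break even:
users_per_whale = math.ceil(-whale_loss / light_profit)
print(users_per_whale)  # 194
```

Under these assumptions, every whale needs nearly two hundred profitable light users just to zero out - which is why providers moved to token limits.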
The economics get worse as models advance:
Newer models cost way more to train
Users want better models without paying more
Better model capabilities usually mean higher per-query costs
API providers often lose money on inference to attract users, which isn't sustainable [1]
The tipping point for cloud vs. on-premises
Companies reach a financial turning point that forces them to rethink their infrastructure choices. This happens when cloud costs hit 60-70% of what an equivalent on-premises system would cost [2]. At that point, buying your own hardware makes more sense than paying ongoing operational costs for predictable AI workloads.
Companies running AI operations around the clock usually hit the breakeven point between cloud and on-premises infrastructure within 12-18 months [8]. IT teams can spot this transition by tracking GPU use, inference costs, idle time, and data-egress fees [2].
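The breakeven logic is easy to model. In the sketch below, every dollar figure is a hypothetical assumption chosen only so the crossover lands in the 12-18 month window reported above; substitute your own cloud bill, capex, and opex.

```python
# Illustrative cloud-vs-on-premises breakeven model. All dollar figures
# are hypothetical assumptions, not vendor pricing.

CLOUD_MONTHLY_USD = 100_000        # assumed steady monthly cloud bill
ONPREM_CAPEX_USD = 1_200_000       # assumed up-front hardware purchase
ONPREM_OPEX_MONTHLY_USD = 20_000   # assumed power, cooling, staffing

def breakeven_month(cloud_monthly, capex, opex_monthly, horizon=60):
    """First month when cumulative on-prem cost drops below cumulative cloud cost."""
    for month in range(1, horizon + 1):
        cloud_total = cloud_monthly * month
        onprem_total = capex + opex_monthly * month
        if onprem_total < cloud_total:
            return month
    return None  # cloud stays cheaper over the whole horizon

print(breakeven_month(CLOUD_MONTHLY_USD, ONPREM_CAPEX_USD,
                      ONPREM_OPEX_MONTHLY_USD))  # 16 (months)
```

With these assumed numbers the crossover arrives at month 16 - squarely inside the 12-18 month range for round-the-clock workloads. Tracking the inputs (GPU utilization, inference costs, idle time, egress fees) is what lets IT teams see this point coming.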
Beyond cost, other factors speed up the move toward owned infrastructure. Real-time applications that need responses in under 10 milliseconds can't tolerate cloud latency [5]. Rules about data sovereignty are pushing many companies—especially outside the United States—to bring computing services back in-house [5]. Critical systems that must run without interruption often need on-premises deployment as their primary setup or as backup [5].
Companies must create smart strategies that balance cloud flexibility with owned infrastructure economics to build lasting AI capabilities as the cost landscape evolves.
Why the OpenAI–Cerebras deal is a strategic inflection point
January 2026 brought a major change in AI compute strategy when OpenAI announced a groundbreaking deal with Cerebras Systems worth over $10 billion [9]. This goes beyond a typical vendor relationship and shows how AI infrastructure will be built differently from now on.
What makes Cerebras different from Nvidia and AMD
Cerebras has developed a completely different approach to AI chip architecture through its Wafer Scale Engine. Unlike conventional postage-stamp-sized chips, the company builds massive processors on entire 300mm silicon wafers [10]. The latest WSE-3 packs 4 trillion transistors and 900,000 AI-optimized cores on a single wafer [4]. By comparison, Nvidia's H100 has 80 billion transistors [3].
The architecture attempts to solve the memory bottleneck that affects traditional GPU setups. Cerebras puts 44GB of SRAM directly on-chip [4], potentially removing the need for external memory and addressing the main limitation of GPU architectures, where data must constantly move between separate compute and memory chips.
How wafer-scale chips change inference performance
Cerebras systems can potentially deliver responses up to 20 times faster than comparable Nvidia solutions [11]. The system achieves 1,800 tokens per second on specific models like Llama3.1 8B, setting new speed benchmarks [4].
This speed comes from removing chip-to-chip communication delays. GPU clusters need extensive data movement between separate chips, which creates bottlenecks. Research at the University of Edinburgh shows these architectural differences let wafer-scale chips achieve ten times lower latency than clusters of 16 GPUs [12].
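What those throughput figures mean for a user is easy to work out. The 1,800 tokens/s figure for Llama3.1 8B comes from the text above; the 90 tokens/s GPU baseline and the 900-token response length are illustrative assumptions (roughly the "20x slower" comparison).

```python
# Rough feel for what token throughput means in wall-clock time.
# 1,800 tok/s (Cerebras, Llama3.1 8B) is from the text; the 90 tok/s
# GPU baseline and 900-token response are illustrative assumptions.

def seconds_for_response(tokens: int, tokens_per_second: float) -> float:
    """Time to stream out a full response at a given generation rate."""
    return tokens / tokens_per_second

RESPONSE_TOKENS = 900  # assumed length of a long answer

print(f"wafer-scale: {seconds_for_response(RESPONSE_TOKENS, 1_800):.1f}s")  # 0.5s
print(f"GPU baseline: {seconds_for_response(RESPONSE_TOKENS, 90):.1f}s")    # 10.0s
```

Half a second feels instant; ten seconds feels like waiting. That perceptual gap is why faster inference unlocks interactive, multi-step workflows that slower systems make impractical.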
What the $10 billion deal covers
The agreement lets OpenAI buy up to 750 megawatts of computing power over three years [9]. The capacity will roll out in multiple phases through 2028 [13]. Cerebras will build or lease data centers with its chips, while OpenAI pays to use these services for inference workloads [9].
This deal reveals new trends in AI infrastructure planning. Specialized AI hardware optimized for specific workloads becomes vital at scale. OpenAI's Sachin Katti explained the approach: "our compute strategy is to build a resilient portfolio that matches the right systems to the right workloads" [13]. The company is now pursuing a multi-vendor compute strategy.
The deal highlights a basic truth about AI economics - speed drives productivity. User engagement grows dramatically as model responses become immediate [14]. This opens up new uses and applications that wouldn't work well with slower systems.
The rise of hybrid and AI-optimized infrastructure
The economics of AI compute point to a bigger problem: building physical systems that can support non-stop AI operations. Companies adopting AI at scale show a clear trend - 98% of these organizations now use hybrid infrastructure models to balance different compute approaches [15].
Three-tier compute strategy: cloud, on-prem, edge
Leading companies no longer think in terms of cloud-versus-on-premises. They now use a strategic three-tier architecture that matches specific workload needs [5]:
Cloud for elasticity: This handles variable training workloads, experimentation phases, and burst capacity needs. Public cloud gives you access to advanced AI services and makes it easier to manage fast-changing model architectures [5].
On-premises for consistency: You can run production inference at predictable costs for high-volume, continuous workloads. This way you retain control over performance, security, and cost management.
Edge for immediacy: Time-critical decisions need minimal latency. This becomes most important in manufacturing and autonomous systems where milliseconds can determine success.
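The tier decision above can be sketched as a simple routing rule. This is a toy illustration, not a real scheduler: the 10 ms threshold is drawn from the latency figure earlier in this article, and the workload names and "predictable" flag are assumptions for the example.

```python
# Toy router matching workloads to the three tiers described above.
# Thresholds and workload attributes are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    max_latency_ms: float   # tightest acceptable response time
    predictable: bool       # steady, continuous demand?

def route(w: Workload) -> str:
    if w.max_latency_ms < 10:
        return "edge"       # time-critical: milliseconds decide success
    if w.predictable:
        return "on-prem"    # high-volume inference at predictable cost
    return "cloud"          # bursty training and experimentation

jobs = [
    Workload("robot-control", 5, True),
    Workload("production-chatbot", 500, True),
    Workload("model-experiment", 60_000, False),
]
for j in jobs:
    print(j.name, "->", route(j))
# robot-control -> edge
# production-chatbot -> on-prem
# model-experiment -> cloud
```

Real placement decisions weigh far more inputs (data sovereignty, egress fees, utilization), but the core pattern is the same: classify the workload first, then pick the tier.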
Companies find this hybrid approach attractive when cloud costs reach 60-70% of on-premises alternatives [16].
The role of AI factories and custom-built data centers
AI factories are changing how we design data centers. These specialized compute plants built for AI processing blend several key components [5]:
AI-specific processors with high-bandwidth memory
Advanced data pipelines optimized for AI model consumption
High-performance networking with minimal data-transfer latency
Pre-optimized algorithm libraries that align with business goals
AI-optimized facilities will represent 28% of the global data center market by 2027 [17]. There's another reason to pay attention - 47% of industry experts believe AI-focused data centers will handle more than half of all workloads within two years [17].
Managing complexity across heterogeneous platforms
AI workloads now rely on heterogeneous computing - spreading AI tasks across different processor types such as CPUs, GPUs, NPUs, and specialized accelerators [18]. This mix lets compute systems match each workload with the best processor to optimize performance, power efficiency, and cost.
This complexity creates substantial challenges. Organizations must balance managing thousands of different services across multiple platforms while keeping operations reliable [5].
The importance of orchestration and unified management
Organizations need unified management approaches to handle this complexity and abstract away platform differences [5]. AI orchestration coordinates shared AI models, systems, and integrations to streamline the end-to-end AI lifecycle [19].
Good orchestration platforms automate AI workflows, track progress, manage resources, monitor data flow, and handle failures. This all-encompassing approach helps organizations scale faster, collaborate better, and govern more reliably.
The result? A more resilient infrastructure that adapts to fast-changing AI demands without major reengineering as workloads evolve [18].
How organizations can prepare for the next wave of AI scale
The share of companies implementing AI at scale has grown from 24% to 39% in the past year [21]. Organizations should plan strategically for the next wave by:
Creating "AI factories" that combine technology platforms, methods, data, and algorithms to speed up AI development [21]
Building unified management platforms that control identity, permissions, and security across different systems [22]
Making cybersecurity a priority - 80% of leaders see it as their biggest obstacle to AI strategy [22]
Organizations must find the right balance between sustainability, performance, and scalability as AI becomes central to enterprise strategy.
Closure Report
The reported OpenAI–Cerebras tie-up is a wake-up call for AI infrastructure: compute isn’t a commodity anymore—it’s strategy. Even as per-token inference gets cheaper, total spend keeps climbing because usage scales faster than efficiency gains. That’s why many teams are rethinking “all-cloud” setups: for steady, predictable workloads, owning capacity can pencil out better than paying a cloud premium.
Cerebras’ wafer-scale approach changes the performance math by reducing the memory bottlenecks common in GPU clusters, enabling much faster responses—and that speed can unlock workflows that used to feel too slow to be practical. In reality, most organizations are landing on hybrid, three-layer designs: cloud for flexibility, on-prem for consistent throughput, and edge for low-latency needs—especially as more AI-optimized data centers come online (some forecasts put this as a large and growing slice of the market by the late 2020s).
Looking ahead, energy and sustainability become part of the architecture conversation, not an afterthought, as AI pushes data-center power demand higher. The next winners treat compute like a core advantage—balancing cost, speed, and efficiency with infrastructure built to adapt.
Key Takeaways
The OpenAI-Cerebras deal represents a fundamental shift in AI economics, signaling that specialized compute infrastructure has become the determining factor for scaling AI profitably.
• Inference costs now dominate AI economics: While per-token costs dropped 280-fold since 2022, inference represents up to 90% of total AI lifetime costs due to explosive usage growth.
• Cloud becomes cost-prohibitive at scale: Organizations hit a financial tipping point when cloud costs reach 60-70% of on-premises alternatives, typically within 12-18 months of continuous AI operations.
• Specialized hardware delivers transformational performance: Cerebras wafer-scale chips achieve up to 20x faster responses than traditional GPUs by eliminating the memory bottlenecks that plague conventional architectures.
• Three-tier hybrid strategy is essential: Leading organizations balance cloud elasticity, on-premises consistency, and edge immediacy rather than relying on single infrastructure approaches.
• Speed directly drives AI productivity: As model responses approach real-time, user engagement increases dramatically, enabling entirely new applications previously impossible with higher latency.
The future belongs to organizations that view compute not as a technical requirement but as core strategic advantage, building resilient infrastructure capable of adapting to rapidly evolving AI demands while managing sustainability concerns.
References
[1] - https://research.contrary.com/report/the-economics-of-ai-build-out
[2] - https://www.itprotoday.com/cloud-computing/ai-infrastructure-inflection-point-60-cloud-costs-signal-time-to-go-private
[3] - https://www.jarsy.com/blog/cerebras-vs-nvidia
[4] - https://www.nasdaq.com/articles/new-ai-chip-beats-nvidia-amd-and-intel-mile-20x-faster-speeds-and-over-4-trillion
[5] - https://www.deloitte.com/us/en/insights/topics/technology-management/tech-trends/2026/ai-infrastructure-compute-strategy.html
[6] - https://www.quali.com/blog/agentic-ai-infrastructure-management/
[7] - https://www.ineteconomics.org/perspectives/blog/the-u-s-is-betting-the-economy-on-scaling-ai-where-is-the-intelligence-when-one-needs-it
[8] - https://www.infracloud.io/blogs/on-premise-ai-vs-cloud-ai/
[9] - https://www.reuters.com/technology/openai-buy-compute-capacity-startup-cerebras-around-10-billion-wsj-reports-2026-01-14/
[10] - https://news.ucr.edu/articles/2025/06/16/wafer-scale-accelerators-could-redefine-ai
[11] - https://www.agriwaypartners.com/news/story/29249701/new-ai-chip-leaves-nvidia-amd-and-intel-in-the-dust-with-20x-faster-speeds-and-over-4-trillion-transistors
[12] - https://www.ed.ac.uk/news/chip-and-software-breakthrough-makes-ai-ten-times-faster-0
[13] - https://openai.com/index/cerebras-partnership/
[14] - https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream
[15] - https://www.coresite.com/blog/new-state-of-the-data-center-report-highlights-hybrid-infrastructure-in-the-ai-era
[16] - https://www.deloitte.com/us/en/insights/topics/emerging-technologies/ai-infrastructure-hybrid-cloud-cost-optimization.html
[17] - https://www.datacenterknowledge.com/ai-data-centers/how-ai-data-centers-redefined-the-industry-in-2025
[18] - https://newsroom.arm.com/blog/unlock-the-future-of-ai-with-heterogeneous-computing
[19] - https://www.ibm.com/think/topics/ai-orchestration
[20] - https://news.mit.edu/2025/explained-generative-ai-environmental-impact-0117
[21] - https://sloanreview.mit.edu/article/five-trends-in-ai-and-data-science-for-2026/
[22] - https://kpmg.com/us/en/media/news/q4-ai-pulse.html
Linked to ObjectiveMind.ai
