When you ask Claude a question, request an image from Gemini, or hold a conversation with ChatGPT, the response arrives in seconds. Behind that speed is a chain of infrastructure that stretches from a semiconductor fabrication plant in Taiwan to a liquid-cooled rack in a data centre that consumes more electricity than a small town. The chip at the centre of that chain — the GPU — is the most consequential piece of hardware in the global economy right now. And most people have never heard of it.
This post breaks it down in plain English: what these chips are, how they're made, what they cost, and why delivering the buildings that house them has become one of the most complex programme management challenges on the planet.
What Is a GPU and Why Does AI Need Thousands of Them?
A GPU — graphics processing unit — was originally designed to render images in video games. What made it useful for gaming also makes it indispensable for AI: the ability to perform thousands of mathematical calculations simultaneously.
Training an AI model like ChatGPT or Claude requires processing trillions of data points. A conventional CPU handles tasks sequentially — one after another. A GPU handles them in parallel — thousands at once. Think of a CPU as a single brilliant mathematician solving equations one at a time. A GPU is a stadium full of average mathematicians, each solving a different equation simultaneously. For AI workloads, the stadium wins every time.
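To make the difference concrete, here is a toy sketch in Python. NumPy's vectorised dot product stands in for GPU-style parallelism, while the explicit loop plays the role of the sequential CPU; the timings are illustrative, not a benchmark.

```python
import time
import numpy as np

# One million multiply-accumulate operations: the kind of
# arithmetic that dominates neural-network workloads.
a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# "CPU style": one element at a time, in sequence.
start = time.perf_counter()
total = 0.0
for i in range(len(a)):
    total += a[i] * b[i]
print(f"sequential: {time.perf_counter() - start:.4f}s")

# "GPU style": the same work expressed as a single
# parallel operation over the whole array at once.
start = time.perf_counter()
total = np.dot(a, b)
print(f"parallel:   {time.perf_counter() - start:.4f}s")
```

On an ordinary laptop the vectorised version is typically hundreds of times faster, and a real GPU widens that gap by further orders of magnitude.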
The current state of the art is NVIDIA's Blackwell Ultra architecture. A single B300 GPU delivers 15 petaflops of AI compute, contains 288 GB of high-bandwidth memory, and draws 1,400 watts of power — roughly the same as running fourteen 100-watt lightbulbs continuously. That power figure has doubled in just two generations: the H100 (2023) drew 700 watts, the B200 (2024) drew 1,000 watts, and the B300 (2025) pushes to 1,400 watts.
These chips do not work alone. NVIDIA packages 72 GPUs together with 36 Grace CPUs into a single liquid-cooled rack called the GB300 NVL72. That rack delivers 1.1 exaflops of compute — equivalent to a supercomputer that would have ranked among the world's most powerful just five years ago. It operates as a single accelerator, with every GPU able to access a shared pool of over 20 terabytes of memory.
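The rack-level figures follow directly from the per-GPU numbers above. Here is a quick back-of-envelope check in Python, using the values quoted in this post rather than official datasheet entries:

```python
# Back-of-envelope rack arithmetic from the per-GPU figures above.
GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 288       # high-bandwidth memory per B300, GB
WATTS_PER_GPU = 1_400      # B300 power draw

shared_hbm_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1_000
gpu_power_kw = GPUS_PER_RACK * WATTS_PER_GPU / 1_000

print(f"shared memory pool: {shared_hbm_tb:.1f} TB")  # ~20.7 TB: 'over 20 TB'
print(f"GPU power alone:    {gpu_power_kw:.1f} kW")   # ~100.8 kW per rack
```

That 100-plus kilowatts from the GPUs alone is why these racks ship liquid-cooled, a point the facility numbers below return to.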
How Are These Chips Made and What Do They Cost?
NVIDIA designs the chips. It does not manufacture them. Fabrication is performed by TSMC (Taiwan Semiconductor Manufacturing Company), the world's dominant contract chipmaker. The process is extraordinary in its precision and complexity.
A silicon wafer — a disc roughly 30 centimetres in diameter — goes through hundreds of steps: layering, patterning, etching, and dicing. The circuitry is printed using extreme ultraviolet lithography at the 4-nanometre scale. For context, a human hair is approximately 80,000 nanometres wide. The computational lithography step alone can consume 30 million CPU hours per mask set. NVIDIA's own cuLitho platform uses AI to accelerate this process, replacing 40,000 CPU systems with 350 GPU systems.
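Those lithography numbers are easier to grasp as wall-clock time. A rough calculation, assuming each core contributes one CPU hour per hour of run time (a crude simplification):

```python
# How long 30 million CPU hours takes at different cluster sizes.
cpu_hours_per_mask_set = 30_000_000

for cores in (10_000, 40_000):
    weeks = cpu_hours_per_mask_set / cores / (24 * 7)
    print(f"{cores:>6,} cores -> {weeks:.1f} weeks per mask set")

# cuLitho's claimed consolidation: 40,000 CPU systems -> 350 GPU systems
print(f"consolidation: {40_000 / 350:.0f}x fewer systems")
```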
Blackwell chips contain two separate compute dies and eight stacks of high-bandwidth memory, all connected using TSMC's advanced CoWoS (Chip-on-Wafer-on-Substrate) packaging. After fabrication in Taiwan (and increasingly in TSMC's new Arizona facility), chips are packaged and assembled before being integrated into complete server systems by partners like Dell, HPE, and Supermicro.
The cost reflects the complexity. A single B200 GPU runs between $45,000 and $50,000. A complete DGX B300 system with eight GPUs exceeds $500,000. A full GB300 NVL72 rack with 72 GPUs is estimated between $3.7 million and $4 million. An entire DGX SuperPOD, 576 GPUs across eight racks, represents an investment in the tens of millions of dollars before you account for networking, cooling, or the building around it.
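Those prices compound quickly. A simple roll-up using the upper-end estimates above (these are this post's estimates, not vendor quotes) shows why a SuperPOD lands in the tens of millions:

```python
# Cost roll-up from the estimates quoted above.
GPU_PRICE = 50_000         # upper-end B200 estimate, USD
RACK_PRICE = 4_000_000     # GB300 NVL72, upper-end estimate, USD
GPUS_PER_RACK = 72
SUPERPOD_RACKS = 8

superpod_gpus = SUPERPOD_RACKS * GPUS_PER_RACK          # 576 GPUs
print(f"SuperPOD GPU count:  {superpod_gpus}")
print(f"GPU silicon alone:   ${superpod_gpus * GPU_PRICE:,}")    # $28,800,000
print(f"at full rack prices: ${SUPERPOD_RACKS * RACK_PRICE:,}")  # $32,000,000
```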
How Many Chips Go Into a Hyperscale Data Centre?
This is where the numbers become staggering — and where the infrastructure challenge crystallises.
Microsoft's AI data centre in Mount Pleasant, Wisconsin — described as the world's most powerful — runs hundreds of thousands of GPUs in a single interconnected cluster. Elon Musk's xAI built Colossus in Memphis, Tennessee, with 100,000 NVIDIA H100 GPUs, assembled in just 122 days, with plans to double to 200,000. Oracle's Stargate I campus hosts over 450,000 GB200 GPUs at 1.2 GW of power capacity. At the extreme end, xAI has outlined a roadmap toward 1 million GPUs.
Each of those GPUs needs power, cooling, and networking. A single GB300 NVL72 rack draws well over 100 kW: the 72 GPUs alone account for roughly 100 kW at 1,400 watts each, before the Grace CPUs, network switches, and coolant pumps are counted. Liquid cooling is now mandatory — air cooling simply cannot dissipate the thermal output. Every GPU requires 800 Gbps networking. The networking fabric, the power distribution, the cooling infrastructure, the structural engineering to support these loads — it all compounds.
For context: a 100,000-GPU facility requires roughly 1,400 racks of GB300 NVL72 systems (100,000 ÷ 72 ≈ 1,389). At roughly 100 kW of GPU draw per rack, that is about 140 MW of IT load from the GPUs alone, before CPUs, networking, cooling, and facility overhead push total site demand substantially higher. A facility of that scale is no longer a technology deployment — it is an energy megaproject.
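Here is that sizing arithmetic in one place, using this post's per-GPU figures. The 30% uplift for CPUs, switches, and pumps is an assumption for illustration, not a measured value:

```python
import math

# Facility sizing for a 100,000-GPU deployment.
TOTAL_GPUS = 100_000
GPUS_PER_RACK = 72
WATTS_PER_GPU = 1_400
RACK_OVERHEAD = 1.3   # assumed uplift for CPUs, switches, pumps

racks = math.ceil(TOTAL_GPUS / GPUS_PER_RACK)               # 1,389 racks
gpu_load_mw = racks * GPUS_PER_RACK * WATTS_PER_GPU / 1e6   # ~140 MW
it_load_mw = gpu_load_mw * RACK_OVERHEAD                    # ~182 MW

print(f"racks:    {racks:,}")
print(f"GPU load: {gpu_load_mw:.0f} MW")
print(f"IT load:  {it_load_mw:.0f} MW, before cooling and facility overhead")
```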
From Chip to Chatbot: How AI Training Actually Works
When you interact with Claude, ChatGPT, or Gemini, you are accessing a trained model. Training is the computationally intensive process that happened before you ever typed a prompt.
A large language model is trained by feeding it trillions of tokens — fragments of text from books, websites, and other sources. The model processes these tokens through billions of parameters (adjustable numerical weights) across layers of artificial neural networks. At each step, the model predicts the next token, compares its prediction to the actual data, calculates the error, and adjusts its parameters slightly. This cycle repeats trillions of times across thousands of GPUs working in parallel.
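In miniature, the loop looks like this. The sketch below uses PyTorch with a deliberately tiny model and random stand-in tokens; a frontier model runs the same predict-compare-adjust cycle with billions of parameters across thousands of GPUs.

```python
import torch
import torch.nn.functional as F

# A minimal caricature of the training cycle: predict the next
# token, measure the error, nudge the weights.
vocab_size, dim = 1000, 64
embed = torch.nn.Embedding(vocab_size, dim)
head = torch.nn.Linear(dim, vocab_size)
optimiser = torch.optim.Adam(
    list(embed.parameters()) + list(head.parameters())
)

tokens = torch.randint(0, vocab_size, (512,))   # stand-in training data

for step in range(100):
    inputs, targets = tokens[:-1], tokens[1:]   # predict the next token
    logits = head(embed(inputs))
    loss = F.cross_entropy(logits, targets)     # how wrong was the prediction?
    optimiser.zero_grad()
    loss.backward()                             # compute the error gradient
    optimiser.step()                            # adjust parameters slightly
```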
OpenAI used over 10,000 GPUs to train early versions of ChatGPT. Current frontier models require significantly more. The entire cluster must function as a single coordinated system — if one GPU fails or one network link underperforms, training efficiency drops across the entire operation.
Once trained, the model shifts to inference — the phase where it responds to your questions. Inference requires fewer GPUs per query but must serve millions of users simultaneously, demanding vast clusters optimised for throughput and latency rather than raw training compute.
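A crude serving calculation shows why scale still matters after training. Every figure below is an illustrative assumption, not a measurement of any real system:

```python
# Why inference is a throughput problem.
requests_per_second = 20_000       # assumed peak user load
tokens_per_response = 500          # assumed average response length
tokens_per_gpu_per_second = 5_000  # assumed per-GPU generation rate

token_demand = requests_per_second * tokens_per_response
gpus_needed = token_demand / tokens_per_gpu_per_second

print(f"token demand: {token_demand:,}/s")
print(f"GPUs needed:  ~{gpus_needed:,.0f} just to keep up with demand")
```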
This is the chain: silicon fabricated in Taiwan → chips packaged and assembled → integrated into server racks → installed in a purpose-built data centre → connected, powered, cooled, and commissioned → model trained over weeks or months → deployed to serve your 3-second query. Every link in that chain must work. Delay at any stage cascades through the entire programme.
What This Means for Programme Delivery
The facilities housing these chips are not conventional construction projects. They are industrial installations with power densities, cooling requirements, and commissioning complexity that mirror offshore platforms and LNG terminals more closely than traditional IT deployments.
"Every GPU that sits idle because the building around it isn't ready represents $50,000 of stranded capital — multiplied by tens of thousands." — /pmo
The gas turbines powering these facilities have three-to-five-year lead times. The chips themselves face supply constraints from a single fabrication source. The liquid cooling systems require specialist commissioning that most data centre contractors have never performed. And the operators — from Microsoft to Oracle to Qatar's emerging hyperscale market — are all competing for the same constrained pool of skilled labour, materials, and equipment.
This is where independent programme governance earns its value. Not in optimising IT configurations, but in ensuring that power, cooling, structure, and technology converge on schedule — because in a hyperscale facility, every week of delay costs millions.
The Hive Platform
PMO Hive's programme management discipline — built on energy megaproject delivery — is purpose-designed for this convergence challenge. From cost governance and schedule risk analysis to commissioning readiness and EPC oversight, the Hive platform ensures that the building is ready when the chips arrive.