The Chef Who Had to Wait
Imagine a world-class chef who can prepare 100 dishes per hour. The kitchen is state-of-the-art, the chef is lightning fast — and completely idle most of the day.
The problem? The pantry is slow. Ingredients arrive at a rate that supports only 30 dishes per hour. The chef spends most of the day standing around, waiting.
This, in essence, is the Memory Wall — one of the most important and least talked-about limits in modern computing.
What Is the Memory Wall?
Every computer has two key components: the Processor (CPU/GPU) — the brain that does calculations — and Memory (RAM) — the pantry that stores data the processor needs.
Over the past few decades, processor speeds have grown dramatically, roughly doubling every two years. But memory speed has grown much more slowly. The gap between the two has widened so far that processors now spend a large portion of their time simply waiting for data to arrive from memory.
One-sentence version: The processor is so fast that it finishes its work and must sit idle, waiting for the next batch of data from memory — like our chef waiting for ingredients.
Why Can’t We Just Make Memory Faster?
This is where physics steps in. When engineers design computer chips, they face a fundamental geometric constraint: computing power grows with chip area, but memory connections can only grow with chip perimeter.
Think of it like a city. If you double the city’s area (more buildings = more computing), you don’t double the number of gates in the city wall (perimeter = memory connections). The wall grows much more slowly than the city.
| Chip Area | Computing Power | Memory Speed | Gap |
|---|---|---|---|
| 1× | 1× | 1× | — |
| 4× | 4× | 2× | 2× gap |
| 9× | 9× | 3× | 3× gap |
| 100× | 100× | 10× | 10× gap |
Every time computing power quadruples, memory speed only doubles. The gap keeps widening — by the laws of geometry, not by any engineering failure.
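The scaling in the table falls straight out of square-chip geometry. Here is a minimal sketch, assuming a square chip where compute scales with area and memory connections scale with the side length (perimeter):

```python
# Sketch of the area-vs-perimeter scaling described above.
# For a square chip of side s: compute ~ area (s^2), memory links ~ perimeter (4s).

def scaling(area_factor: float) -> tuple[float, float]:
    """Return (compute growth, memory growth) when chip area grows by area_factor."""
    side_factor = area_factor ** 0.5   # side length scales with sqrt(area)
    return area_factor, side_factor    # compute ~ area, memory links ~ perimeter ~ side

for a in (1, 4, 9, 100):
    compute, memory = scaling(a)
    print(f"area x{a:>3}: compute x{compute:g}, memory x{memory:g}, gap x{compute / memory:g}")
```

Running this reproduces the table: quadruple the area and compute quadruples, but memory connections only double, so the gap is the square root of the growth factor.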
Why AI Makes This Far Worse
Modern AI systems don’t just encounter the memory wall — they crash into it at full speed. Consider what a large AI model actually requires:
A large model might need 10 TB/s of data movement while the hardware can deliver only 1 TB/s. That's a 10× shortfall — one full "order of magnitude." Some training workloads require 100× more data movement than available bandwidth.
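A quick back-of-the-envelope sketch makes the shortfall concrete. The bandwidth figures are the illustrative ones from the text; the compute figures are hypothetical, chosen so that compute alone would take one second:

```python
# When required data movement outstrips bandwidth, runtime is set by the
# slower of the two terms (a simple roofline-style bound).
# Numbers are illustrative, not measured figures for any real chip.

def step_time(data_bytes: float, flops: float,
              bandwidth: float, peak_flops: float) -> float:
    """Time for one step: whichever of compute or data movement dominates."""
    return max(flops / peak_flops, data_bytes / bandwidth)

TB = 1e12
# A step needing 10 TB moved over a 1 TB/s link, with 1 s of pure compute:
t = step_time(data_bytes=10 * TB, flops=1e15, bandwidth=1 * TB, peak_flops=1e15)
print(f"step takes {t:.0f} s; compute alone would take 1 s")
```

The processor finishes its share in one second, then waits nine more for data: the 10× bandwidth shortfall becomes a 10× slowdown.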
This is why AI systems are expensive to run, consume enormous electricity, and require specialized hardware — it’s not just computation that’s costly, it’s the constant movement of data.
Amdahl’s Law: Why More GPUs Don’t Always Help
You might think: “Fine — let’s just add more GPUs!” But a classic principle called Amdahl’s Law explains exactly why this doesn’t fix the problem.
Amdahl’s Law (Simply Put)
The speed of a system is always limited by the part that cannot be sped up.
If 10% of a task must happen sequentially — one step at a time — then even with infinite processors handling the other 90%, you can never exceed a 10× speedup. You hit a hard ceiling.
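The ceiling follows from a one-line formula: with serial fraction s and n processors, speedup = 1 / (s + (1 − s) / n). A minimal sketch:

```python
# Amdahl's Law: speedup = 1 / (s + (1 - s) / n) for serial fraction s
# and n processors. With s = 0.10, the ceiling is 1 / 0.10 = 10x.

def amdahl_speedup(serial_fraction: float, n_processors: float) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

for n in (10, 100, 1000, float("inf")):
    print(f"{n:>6} processors -> {amdahl_speedup(0.10, n):.2f}x speedup")
```

Even a thousand processors only get you to about 9.9×; infinitely many never pass 10×. Throwing hardware at the parallel part cannot touch the serial part.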
This is exactly what happens in many real AI workloads today. The memory wall turns data movement into the new “serial bottleneck” that Amdahl’s Law describes — except this time it’s a hardware limit, not a software one.
What Is Being Done About It?
Engineers and researchers are attacking the memory wall from multiple directions simultaneously. Here are the most promising approaches:
High Bandwidth Memory (HBM)
Memory dies stacked vertically and placed right next to the processor on a silicon interposer, using 3D packaging. This eliminates the long data journey across a circuit board. NVIDIA's H100 achieves over 3 TB/s of memory bandwidth, roughly 60× that of conventional desktop RAM.
Processing-in-Memory (PIM)
Instead of moving data to the processor, small processing units are built directly into the memory chip, so the data barely has to travel at all. Early commercial products are beginning to appear.
Smarter Caching
Small ultra-fast memory banks (caches) built onto the processor chip store frequently used data. Modern AI chips use much larger caches and smarter prediction algorithms to reduce idle waiting time.
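The payoff from caching is easy to see in the averages. A sketch with illustrative latencies (roughly 1 ns for on-chip cache, 100 ns for main memory; not vendor specs):

```python
# Why hit rate matters: average access time is a weighted blend of the
# fast cache latency and the slow main-memory latency.
# Latencies are illustrative round numbers, not measured values.

def avg_access_ns(hit_rate: float, cache_ns: float = 1.0, dram_ns: float = 100.0) -> float:
    """Average memory access time for a given cache hit rate."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns

for hr in (0.0, 0.90, 0.99):
    print(f"hit rate {hr:4.0%}: average access {avg_access_ns(hr):6.2f} ns")
```

Moving the hit rate from 90% to 99% cuts the average access time by another factor of five or so, which is why designers spend so much silicon on caches and prediction.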
Sparsity & Compression
If a model doesn’t need all its parameters for every task, why load them all? Pruning removes unnecessary parameters; quantisation uses lower-precision numbers — dramatically reducing data movement.
Optical Interconnects
Transmitting data using light instead of electrical signals. Optical links can carry more data over longer distances while dissipating far less heat, potentially easing both the memory wall and the power wall at once.
Near-Memory Architectures
Chips from companies like Cerebras and Graphcore put enormous amounts of memory on the processor die itself, so data rarely has to leave the chip. This trades conventional processor flexibility for raw memory bandwidth.
Why Everyone Should Care
The memory wall has consequences that reach far beyond chip design labs:
Energy & climate: A significant portion of data centre electricity is spent moving data, not computing. Solving the memory wall could dramatically reduce AI’s carbon footprint — a problem that grows with every new model.
AI accessibility: The expense of running large AI models is partly a memory problem. Better memory architecture could make AI faster, cheaper, and accessible to smaller organisations and developing economies.
The end of “just build faster chips”: For decades, computers doubled in power every two years (Moore’s Law). The memory wall — alongside the power wall — is one of the key reasons that era has ended. Simply building faster processors no longer automatically produces faster systems.
The Big Takeaway
We have built processors of extraordinary speed — but we haven’t built equally fast highways to deliver data to them. The processor sits at the end of a slow road, waiting. This is the memory wall. And it explains, more than almost any other concept, why adding more GPUs doesn’t always make AI faster, why data centres consume enormous power, and why the next frontier in computing is not just about making processors faster — but about rethinking how data moves.
The chef is brilliant. The kitchen is magnificent. We just need a better pantry — and a much faster delivery system.
