The Next Wave of AI Hardware and Infrastructure (2026 to 2030)
AI runs in two very different places today. At one end, there are warehouse-sized data centres training huge models, gulping power and pushing heat into the sky. At the other, there are everyday devices doing quick jobs, like tidying photos, transcribing voice notes, or spotting spam.
The next wave of AI hardware and infrastructure matters now because we’re hitting hard limits. Electricity is costly, grid capacity is tight, and the old trick of “add more GPUs” doesn’t scale forever. At the same time, new uses need speed on demand: real-time assistants, factory robots, and cars that can’t wait for a round trip to the cloud.
This post breaks down what’s changing in chips, how data centres are being rebuilt around power and cooling, and what signals to watch from 2026 to 2030.
The chip shift: GPUs still rule, but new AI chips are rising fast
GPUs are still the main workhorse for training big models, mainly because they’re strong at the same kind of maths AI uses, and the software stack is mature. That won’t change overnight.
What will change is where the spend goes. Training chases scale, memory bandwidth, and fast links between thousands of accelerators. Inference (serving the model to users) chases low cost, low power, and predictable speed. Those needs pull hardware in different directions.
Here’s the plain-English split between the names people keep searching for:
- NVIDIA: dominant GPU platform, strong networking and packaging, the “safe default” for many buyers.
- AMD: serious challenger on performance and price, often attractive where supply and cost control matter.
- Google TPU: custom chips designed for Google’s own workloads and large pods, strong when your stack fits their style.
- Cerebras: wafer-scale systems built to keep more work on-chip, aiming to reduce memory bottlenecks in certain training jobs.
- Groq: built for fast, steady inference, often judged by latency and tokens-per-second rather than peak specs.
The practical takeaway is simple: from 2026 onwards, teams that match the chip to the job will get far more out of their budget than teams that buy “one thing for everything”.
Superchips, chiplets, and tighter CPU to GPU combos
A useful mental model: chipmakers are moving from “one big slab of silicon” to “a set of smaller tiles that act like one”. These are chiplets, and they matter because smaller pieces are easier to make, easier to source, and easier to mix and match.
Packaging also gets smarter. CPUs sit closer to GPUs, and memory stacks closer again. That reduces time wasted waiting for data.
Feeding data to a model is like keeping a busy kitchen stocked. If ingredients arrive late, the chefs (your GPUs) stand around doing nothing. Tighter CPU to GPU links and more on-package memory keep the kitchen moving.
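To make that concrete, here is a minimal sketch of the idea in Python. The load_batch and train_step functions are hypothetical stand-ins for real data loading and accelerator work; the point is simply that a small buffer filled in the background keeps the expensive part busy.

```python
import queue
import threading
import time

def load_batch(i):
    time.sleep(0.01)             # stand-in for disk, network, and CPU preprocessing
    return f"batch-{i}"

def train_step(batch):
    time.sleep(0.03)             # stand-in for the accelerator doing useful work

def prefetcher(n_batches, buffer):
    # Producer: keep a small buffer of ready batches ahead of the consumer.
    for i in range(n_batches):
        buffer.put(load_batch(i))
    buffer.put(None)             # sentinel: no more data

buffer = queue.Queue(maxsize=4)  # the bounded buffer, i.e. the "stocked pantry"
threading.Thread(target=prefetcher, args=(20, buffer), daemon=True).start()

while (batch := buffer.get()) is not None:
    train_step(batch)            # the "chef" rarely waits because batches arrive early
```

Real training stacks do the same thing at much larger scale, with pinned memory, multiple workers, and on-package bandwidth doing the heavy lifting.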
Low-precision maths (FP4, INT4) and why it cuts cost
A lot of AI doesn’t need perfect detail in every number. Low-precision formats (like INT4 or FP4) store each number in fewer bits. That means:
- less memory used per model,
- less data moved per response,
- less power burned moving it.
The benefit shows up as faster replies and more users served per chip. The trade-off is accuracy. Some tasks still need higher precision, especially parts of training, scientific workloads, or safety-critical checks. Where low-precision shines is inference, and parts of training where the model can tolerate “rougher” maths.
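As a rough illustration of the trade-off, here is a minimal NumPy sketch of symmetric 4-bit quantisation on a toy weight tensor. It is not any vendor’s FP4 or INT4 kernel, just the basic idea: squeeze each value into 4 bits, then measure how much accuracy you gave up.

```python
import numpy as np

def quantize_int4(weights):
    # Symmetric per-tensor quantisation to 4-bit codes in [-8, 7]
    # (stored in an int8 container here, since NumPy has no 4-bit dtype).
    scale = np.max(np.abs(weights)) / 7.0
    codes = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    # Map the 4-bit codes back to approximate floating-point values.
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)   # toy weight tensor

codes, scale = quantize_int4(w)
w_approx = dequantize(codes, scale)

print("bits per value: 32 -> 4 (plus one shared scale per tensor)")
print("mean absolute error:", float(np.mean(np.abs(w - w_approx))))
```

The error is small but not zero, which is exactly why low precision is an engineering choice per workload rather than a free win.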
Beyond GPUs: the new architectures that could change the rules
Think of this phase as new toolkits, not a sudden replacement. Many of these options start in narrow roles, then spread if software, reliability, and support catch up.
Custom ASICs and in-house AI chips, built for one job
An ASIC is a chip built to do one job very well. The push for in-house chips is easy to understand: if you run AI at huge scale, unit cost and power use become business problems, not technical ones. Owning the silicon roadmap also reduces the pain of GPU shortages and long queues.
A concrete example: a video platform might build silicon tuned for video ranking and moderation, because those jobs happen billions of times a day. Even small efficiency gains add up.
Another tailwind is AI-assisted chip design, which can shorten design cycles by speeding up layout and verification work. It doesn’t remove risk, but it can reduce time-to-first-silicon.
Neuromorphic, analog, and in-memory compute for ultra-low power
Some chips try to copy how brains handle signals. Neuromorphic systems often process events, not constant streams, which can suit always-on sensing. If nothing changes, the chip stays quiet and sips power.
Analog and in-memory compute attack a different waste: data movement. In many AI systems, shifting data between memory and compute burns a lot of energy. If you can do part of the maths where the data already sits, you save power and time.
Early uses are likely to be practical and local: sensors, small robots, and industrial edge devices where watts matter more than raw throughput. This won’t replace GPU clusters for frontier model training soon, but it can shrink the cost of “AI everywhere”.
Optical and photonic computing, using light to move and multiply data
With electronics, heat rises as you push more data through wires. Photonics uses light, which can carry a lot of information quickly. The promise is simple: more bandwidth, and in some designs, less heat for certain matrix-heavy operations.
The near-term path is likely to be hybrid systems where optics does the heavy matrix maths and electronics handles control and general compute. The hurdles are also clear: software tools, reliability at scale, cost, and manufacturing consistency. For a grounded look at real products in this space, see Q.ANT’s photonic computing overview.
AI infrastructure is hitting a wall: power, cooling, networking, and where data centres go next
Chips get headlines, but infrastructure sets the speed limit. Rack density keeps rising, and air cooling often can’t keep up. Power delivery, heat removal, and networking now decide what can be built, and where.
Industry coverage has been blunt about it: the next generation of AI data centres has to prioritise power, cooling, and outage prevention, not just more servers. One useful starting point is this overview from Data Center Knowledge: The AI Infrastructure Revolution: Predictions for 2026.
Power becomes the bottleneck, so sites follow the grid
AI growth is colliding with grid limits. In practice, that pushes new sites towards places with stable capacity, predictable pricing, and faster connection timelines. It also changes how operators plan: longer contracts, more on-site backup, and tighter coordination with utilities.
A quieter trend is carbon-aware scheduling: in plain terms, running flexible jobs when the grid is cleaner or cheaper. Training can often move in time, even if inference cannot.
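A minimal sketch of the idea, assuming you have an hourly carbon-intensity forecast for your region (the numbers below are made up): pick the cleanest contiguous window that is long enough for the job.

```python
# Hypothetical hourly carbon-intensity forecast (gCO2/kWh) for the next 12 hours.
forecast = [420, 390, 310, 250, 180, 160, 170, 240, 330, 400, 450, 470]

def best_window(forecast, hours_needed):
    # Pick the contiguous block of hours with the lowest average intensity.
    best_start, best_avg = 0, float("inf")
    for start in range(len(forecast) - hours_needed + 1):
        avg = sum(forecast[start:start + hours_needed]) / hours_needed
        if avg < best_avg:
            best_start, best_avg = start, avg
    return best_start, best_avg

start, avg = best_window(forecast, hours_needed=4)
overall = sum(forecast) / len(forecast)
print(f"Start the 4-hour job at hour {start}: ~{avg:.0f} gCO2/kWh vs ~{overall:.0f} on average.")
```

The same logic works with price signals instead of carbon intensity, which is why flexible training jobs are the obvious first candidates.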
Liquid cooling goes mainstream, from cold plates to immersion
Direct-to-chip liquid cooling uses cold plates on hot components, with coolant carrying heat away far better than air. It supports denser racks and reduces hotspots, which helps performance stay steady under load.
Immersion cooling goes further. Hardware sits in a non-conductive fluid that pulls heat away from the whole board. It can simplify some airflow design, but it brings new needs: plumbing, fluid handling, maintenance training, and careful safety checks.
The shift is not cosmetic. It changes building design, service routines, and even what skills data centre teams hire for.
Networking and storage for AI superclusters, 400G to 800G and beyond
When you tie thousands of accelerators together, the network stops being “the thing that connects servers” and becomes part of the computer. Slow links waste expensive compute time.
Expect faster fabrics (400G moving towards 800G and beyond) and more attention on network cooling too. Cisco has discussed this direction, including thermal design, in its note on liquid-cooled switches.
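A quick back-of-envelope sketch shows why the fabric matters. The numbers below are illustrative assumptions, not measurements of any real cluster: FP16 gradients, a ring-style all-reduce that moves roughly twice the model’s size per worker, and no overlap with compute.

```python
# Back-of-envelope: how long one full gradient synchronisation takes over the fabric.

params = 70e9                  # model parameters (assumed)
bytes_per_value = 2            # FP16 gradients
ring_factor = 2                # approximate data moved per worker in a ring all-reduce

def allreduce_seconds(link_gbps):
    volume_bits = params * bytes_per_value * ring_factor * 8
    return volume_bits / (link_gbps * 1e9)

for gbps in (400, 800):
    print(f"{gbps}G link: ~{allreduce_seconds(gbps):.1f} s per synchronisation")
```

Real systems overlap communication with compute and spread traffic across many links, so these figures are only an upper bound on wasted time, but the ratio explains why the jump from 400G to 800G is worth the trouble.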
A related idea is disaggregation, treating compute, memory, and storage as shared pools. The goal is better utilisation, scaling without buying idle capacity, and upgrading parts without replacing everything.
From 2026 to 2030: what to watch, and how to choose what matters
The next wave won’t be “one chip wins”. It will be a constant sorting process: right workload, right hardware, right site design.
The new split: frontier training in mega-clusters, efficient inference everywhere
We’re heading into a two-track world:
- Frontier training: mega-clusters with dense accelerators, high-bandwidth memory, and tight networking.
- Efficient inference: tuned models on laptops, phones, cars, cameras, edge servers, and smaller cloud fleets.
This is where hardware-aware models grow fast: quantised, pruned, distilled, and built to fit the power budget. Electricity forces the issue, even when budgets don’t.
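One way to see why quantised and distilled models win at the edge is a simple memory budget check. The figures below are illustrative assumptions, not a rule: an 8-billion-parameter model against a device with 8 GB to spare for weights.

```python
# Rough memory needed for a model's weights at different precisions,
# checked against a device budget. Ignores activations, KV cache, and runtime overhead.

def weight_memory_gb(params, bits_per_weight):
    return params * bits_per_weight / 8 / 1e9

params = 8e9                   # an 8-billion-parameter model (assumed)
budget_gb = 8                  # memory the device can spare for weights (assumed)

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = weight_memory_gb(params, bits)
    verdict = "fits" if gb <= budget_gb else "does not fit"
    print(f"{name}: ~{gb:.0f} GB of weights, {verdict} in a {budget_gb} GB budget")
```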
A simple scorecard: performance per watt, total cost, and time to deploy
Use a scorecard that a busy team can actually remember:
| What to check | Why it matters |
|---|---|
| Performance per watt | Sets running cost and how dense you can build |
| Latency and throughput | Defines user experience and server count |
| Memory bandwidth and capacity | Often limits real workloads more than compute |
| Network needs | Slow fabrics waste expensive accelerators |
| Cooling requirements | Decides rack density and building cost |
| Software maturity | Determines how much engineering time you’ll burn |
| Supply availability | Delays can erase any theoretical savings |
The main habit shift is to think in total cost, including power, cooling, space, and staff time, not just chip price.
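Here is a minimal total-cost sketch for a single accelerator over three years. Every figure is a placeholder assumption to be swapped for your own quotes; the point is the habit of adding power and facility overhead (the PUE line) to the sticker price before comparing options.

```python
# Total-cost sketch for one accelerator over three years.
# Every figure is a placeholder assumption; swap in your own quotes.

chip_price = 30_000            # purchase price (USD)
power_kw = 1.0                 # average draw under load, including host share
pue = 1.3                      # facility overhead: cooling and power distribution
electricity_per_kwh = 0.12     # USD per kWh
hours_per_year = 8_760
years = 3

energy_cost = power_kw * pue * electricity_per_kwh * hours_per_year * years
total = chip_price + energy_cost

print(f"Energy over {years} years: ${energy_cost:,.0f}")
print(f"Chip + energy total:      ${total:,.0f}")
print(f"Energy share of total:    {energy_cost / total:.0%}")
```

Extend the same sketch with space, networking, and staff time and the ranking between two options can flip, even when one looks cheaper on paper.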
Risks and friction: supply chains, geopolitics, and the refresh cycle problem
Hardware plans now sit inside politics and supply constraints. Export controls, local chip pushes, and scarce components can shape who gets what, and when.
Another pressure point is the refresh cycle. Chips feel old sooner, which creates knock-on effects: resale markets, e-waste, and planning for re-use rather than constant replacement. The smartest operators will design fleets so older hardware still earns its keep on lighter inference, testing, or batch jobs.
Conclusion
The next wave of AI hardware isn’t just “bigger clusters”. Chips will diversify, data centres will be redesigned around power and heat, and inference will spread into everyday devices. The clearest takeaway is that efficiency becomes the headline metric, not raw peak speed.
If you want a follow-up, choose your next read: a comparison table of NVIDIA vs AMD vs TPU vs Cerebras vs Groq, a deeper dive into liquid cooling, or a practical guide to photonic compute and where it fits.


