Jun 7, 2026

AWS ditches fat tree routing with new resilient network graph

AWS says its new Resilient Network Graphs architecture delivers one-third more throughput from 69% fewer routers.

Amazon has begun deploying a new data center routing architecture called Resilient Network Graphs across most new AWS facilities, replacing the fat tree topology that has been the standard approach in large-scale data center networking for roughly two decades.

The company says the architecture delivers 33% more throughput from 69% fewer routers, and projects a 40% reduction in network infrastructure electricity consumption. AWS made it the default for most new data centers in April, and says three deployments are already carrying production traffic.
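Taken together, the two headline figures imply a much larger gain per router than either suggests alone. The sketch below is simple arithmetic on AWS's claimed numbers; it assumes both percentages are measured against the same fat tree baseline, which the announcement as reported here does not spell out. The function name is illustrative only.

```python
def per_router_gain(throughput_gain: float, router_reduction: float) -> float:
    """Throughput per router in the new design, relative to the baseline.

    Assumes both figures are fractions measured against the same baseline.
    """
    return (1.0 + throughput_gain) / (1.0 - router_reduction)


# AWS's claimed figures: 33% more throughput, 69% fewer routers.
ratio = per_router_gain(0.33, 0.69)
print(f"{ratio:.2f}x throughput per router")  # prints "4.29x throughput per router"
```

Under that assumption, each remaining router would carry a little over four times the throughput of a router in the comparison design.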

To see why this matters, it helps to know what a fat tree is and why it has persisted so long. A fat tree is a hierarchical topology, originally proposed for supercomputing in the 1980s, in which packets travel up and down layers of switches.

It scaled well enough in the early cloud era, but its fundamental problem is that maintaining high throughput as networks grow requires adding switch infrastructure in proportion to that growth. At today's data center scales, particularly those built around AI training workloads that generate enormous east-west traffic between servers, the hardware requirements become expensive and the opportunity for congestion increases.
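The proportional growth can be made concrete with the textbook three-tier k-ary fat tree, in which every switch has k ports. This is a sketch of that standard construction, not of AWS's previous network, whose exact layout the article does not describe. In the textbook design, k pods each hold k/2 edge and k/2 aggregation switches, (k/2)² core switches sit above them, and the fabric supports k³/4 hosts.

```python
def fat_tree_size(k: int) -> tuple[int, int]:
    """Return (hosts, switches) for a textbook three-tier k-ary fat tree."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even port count")
    edge = k * (k // 2)          # k pods, k/2 edge switches per pod
    aggregation = k * (k // 2)   # k pods, k/2 aggregation switches per pod
    core = (k // 2) ** 2
    hosts = k ** 3 // 4
    return hosts, edge + aggregation + core


for k in (16, 32, 64):
    hosts, switches = fat_tree_size(k)
    print(f"k={k}: {hosts} hosts, {switches} switches, "
          f"{switches / hosts:.3f} switches per host")
```

For a given port count the ratio is fixed at 5/k switches per host, so growing beyond k³/4 hosts means either higher-radix switches or an additional tier.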

The theoretical alternative has been known for years. Random graph topologies, in which switches connect to each other in a flat mesh rather than a hierarchy, are more efficient and more fault tolerant.

An early academic proposal called Jellyfish outlined this approach in 2012. The problem was always practical: implementing a random graph in a physical data center requires impossibly complex cabling between switches at varying distances, and demands that each switch hold a routing table large enough to describe every possible path in the network. Neither constraint was seen as manageable at scale.
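A toy version of the random graph idea shows where the cabling problem comes from. The sketch below links switches at random until no two unlinked switches both have a free port, which is the basic step in Jellyfish-style construction. It is a simplification: the function names are invented for illustration, the cleanup step that uses up leftover ports is omitted, and connectivity is not guaranteed on small inputs.

```python
import random
from collections import deque


def random_switch_graph(n: int, degree: int, seed: int = 0) -> dict[int, set[int]]:
    """Randomly link n switches, each with at most `degree` inter-switch ports."""
    rng = random.Random(seed)
    adj: dict[int, set[int]] = {s: set() for s in range(n)}
    while True:
        open_ports = [s for s in adj if len(adj[s]) < degree]
        # Every pair of distinct, not-yet-linked switches that both have a free port.
        candidates = [(a, b) for i, a in enumerate(open_ports)
                      for b in open_ports[i + 1:] if b not in adj[a]]
        if not candidates:
            return adj
        a, b = rng.choice(candidates)
        adj[a].add(b)
        adj[b].add(a)


def hop_counts(adj: dict[int, set[int]], src: int) -> dict[int, int]:
    """Breadth-first search: minimum hop count from src to each reachable switch."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


graph = random_switch_graph(20, 4, seed=1)
dist = hop_counts(graph, 0)
print(f"switch 0 links to {sorted(graph[0])}")
print(f"reaches {len(dist)} of {len(graph)} switches, "
      f"at most {max(dist.values())} hops away")
```

Because any switch may end up linked to any other, the neighbor list for a given switch has no relationship to where those switches physically sit, which is the cabling difficulty the paragraph above describes.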

Amazon's contribution is a build decision that makes the random graph approach operational rather than theoretical. Its researchers developed a routing algorithm called Spraypoint, which sprays traffic randomly across neighboring switches to take advantage of multiple available paths, then routes packets via designated waypoint switches using a conventional shortest-path algorithm as they approach their destination.
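The two phases described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spraypoint itself: the article does not say how many hops are sprayed, how waypoints are chosen, or what state each switch keeps. Here a packet takes one random first hop, follows a breadth-first shortest path to a waypoint that is simply supplied by the caller, then follows a shortest path to the destination. All names are hypothetical.

```python
import random
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first search; returns one minimum-hop path from src to dst."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        raise ValueError(f"{dst} is unreachable from {src}")
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def spray_then_waypoint(adj, src, dst, waypoint, rng):
    """Toy two-phase route: random first hop, then shortest paths via a waypoint."""
    first_hop = rng.choice(sorted(adj[src]))                # spray phase (assumed: one hop)
    to_waypoint = shortest_path(adj, first_hop, waypoint)   # head for the waypoint
    to_dst = shortest_path(adj, waypoint, dst)              # final approach
    return [src] + to_waypoint + to_dst[1:]


# A small 3-regular graph of six switches, chosen by hand for the example.
adj = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
       3: [0, 4, 5], 4: [1, 3, 5], 5: [2, 3, 4]}
path = spray_then_waypoint(adj, src=0, dst=5, waypoint=4, rng=random.Random(7))
print(path)
```

In this toy version a sprayed packet can double back through a switch it has already visited; a production scheme would presumably constrain that, but the article gives no detail on how.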

The more important innovation is in hardware: a device called ShuffleBox, which concentrates the complex inter-switch wiring that random graph topologies require into a single physical unit, eliminating the long cable runs that made previous implementations impractical.

The efficiency gains AWS claims are significant in the context of the industry's current constraints. Cloud providers are facing growing opposition to data center expansion on the grounds of power and water demand, and the cost of grid capacity has become a real limiting factor in siting new facilities.

A 40% reduction in network infrastructure electricity consumption, if it holds at scale, meaningfully changes the operating economics of a new data center and reduces the headline power figure that attracts regulatory and community scrutiny. AWS has not had these figures independently verified, but the fact that the architecture is already running production workloads in multiple facilities gives them more credibility than a lab claim alone.

The architecture is proprietary, which limits its near-term industry influence. AWS designs most of its own networking hardware, and the capital required to redesign and re-equip an existing data center with RNG is prohibitive. Amazon is applying it only to new builds for that reason, and outside observers have noted that most hyperscale customers and competing cloud providers are unlikely to absorb comparable redesign costs.

The result is that RNG validates the core idea that random graph networking is achievable at scale, but the approach will likely remain an AWS-specific advantage for the foreseeable future rather than a shift that propagates quickly through the broader data center industry.


© 2026 Uplink.