Meta's engineering team has published a detailed technical breakdown of Backend Aggregation (BAG), the centralized Ethernet-based super-spine network layer that connects the company's Prometheus AI cluster. Writing on Meta's engineering blog, authors Jalpa Patel, Ankur Singh, and Hany Morsy describe how BAG bridges two distinct L2 fabric technologies — Disaggregated Scheduled Fabric (DSF) and Non-Scheduled Fabric (NSF) — to interconnect tens of thousands of GPUs across multiple data center buildings within a single large region. When complete, Prometheus is designed to deliver one gigawatt of compute capacity, making it one of the most ambitious AI infrastructure deployments any company has publicly detailed. Inter-BAG bandwidth ranges from 16 to 48 petabits per second per region pair.
At the hardware level, the BAG layer is built on modular chassis equipped with Broadcom's Jericho3-AI (BCM88860) ASIC line cards, each providing up to 432 ports at 800 Gbps. The design employs two inter-BAG connectivity topologies: planar, which offers simplified management by connecting switches one-to-one between regions, and spread, which distributes links across multiple switches and planes to improve failure-domain resilience. Oversubscription is managed carefully, with a typical L2-to-BAG ratio of approximately 4.5:1 and an effective NSF oversubscription of 4.98:1. Routing runs eBGP with link-bandwidth attributes enabling Unequal Cost Multipath (UCMP) for dynamic load balancing, while MACsec encryption secures all BAG-to-BAG connections.
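To make the routing and oversubscription figures concrete, the sketch below shows how link-bandwidth-weighted UCMP splits traffic in proportion to advertised link capacity, and how an oversubscription ratio like the article's 4.5:1 is derived. This is illustrative arithmetic, not Meta's implementation; the link speeds in the example are hypothetical.

```python
# Illustrative sketch of link-bandwidth-weighted UCMP and oversubscription
# math. Not Meta's code: only the general technique (eBGP link-bandwidth
# attributes driving unequal traffic shares) comes from the article.

def ucmp_weights(link_bandwidths_gbps):
    """Split traffic across next hops in proportion to advertised bandwidth,
    as a router would when honoring BGP link-bandwidth attributes."""
    total = sum(link_bandwidths_gbps)
    return [bw / total for bw in link_bandwidths_gbps]

def oversubscription(downlink_gbps, uplink_gbps):
    """Ratio of aggregate downstream capacity to BAG-facing uplink capacity."""
    return downlink_gbps / uplink_gbps

# Hypothetical example: two healthy 800 Gbps uplinks plus one degraded
# 400 Gbps uplink toward the BAG layer.
weights = ucmp_weights([800, 800, 400])
# weights -> [0.4, 0.4, 0.2]: the degraded link carries half the share

# Hypothetical capacities chosen to reproduce the article's 4.5:1 ratio:
ratio = oversubscription(downlink_gbps=3600, uplink_gbps=800)
# ratio -> 4.5
```

With equal-cost multipath (ECMP), the degraded link in the example would receive the same third of traffic as the healthy links; weighting by link bandwidth is what lets the fabric drain load away from partial failures.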
Meta's choice to build the Prometheus networking fabric on Broadcom merchant silicon rather than proprietary ASICs contrasts with Google, whose Jupiter network fabric — scaling to 13.1 Petabits per second of bisection bandwidth and capable of connecting over 100,000 TPU chips — relies on Google-proprietary fabric ASICs and custom Optical Circuit Switches. Microsoft and AWS take a similar approach to Meta, depending on merchant networking components while directing custom silicon investment toward compute accelerators. Whether that split reflects pure economics or architectural preference is a question none of the three companies has directly answered.
The physical networking layer increasingly sets the practical ceiling for large-scale distributed training, and that constraint is shaping roadmaps across the industry. Broadcom's next-generation Jericho4, shipping on TSMC 3nm in early 2026, introduces HyperPort technology aggregating four 800 Gbps ports into 3.2 Tbps logical ports, explicitly targeting million-GPU clusters distributed across multiple data centers up to 100 kilometers apart — the trajectory Prometheus anticipates. The formation of the Optical Compute Interconnect (OCI) MSA, co-founded by AMD, Broadcom, and Nvidia alongside Meta, Microsoft, and OpenAI, signals that leading hyperscalers are converging on open Ethernet specifications rather than pursuing proprietary networking silicon. That alignment further entrenches Broadcom's position in hyperscale AI cluster networking.
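The port math behind these figures is straightforward and worth making explicit. The sketch below is back-of-the-envelope arithmetic: only the 4 × 800 Gbps HyperPort aggregation and the 16 Pb/s inter-BAG figure come from the article; the function names and the port-count calculation are illustrative.

```python
# Back-of-the-envelope port math for HyperPort and inter-BAG bandwidth.
# Illustrative only; function names and the port-count question are not
# from Broadcom or Meta.
import math

def hyperport_tbps(members=4, member_gbps=800):
    """Capacity of one logical HyperPort: N member ports aggregated."""
    return members * member_gbps / 1_000

def hyperports_needed(region_pair_pbps, port_tbps):
    """Minimum logical ports to carry a given inter-region bandwidth."""
    return math.ceil(region_pair_pbps * 1_000 / port_tbps)

# 4 x 800 Gbps members -> 3.2 Tbps per logical HyperPort
capacity = hyperport_tbps()
# The low end of the article's inter-BAG range, 16 Pb/s per region pair,
# works out to 5,000 such 3.2 Tbps logical ports (20,000 physical 800G links).
ports = hyperports_needed(16, capacity)
```

Collapsing four physical links into one logical port cuts the number of routing adjacencies and hashing buckets the fabric must manage by the same factor, which is the practical appeal at million-GPU scale.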