
Ethernet Alliance Blog: Revolutionizing Networking for AI Workloads

https://ethernetalliance.org/blog/2023/04/24/guest-blog-ofc-2023-revolutionizing-networking-for-ai-workloads/

By Sameh Boujelbene • April 24, 2023

This guest blog is from Sameh Boujelbene, Vice President, Data Center and Campus Ethernet Switch Market Research for the Dell’Oro Group. It was originally published on the Dell’Oro Group Blog.

2023 witnessed a remarkable resurgence of the OFC conference following the pandemic. The event drew a significant turnout, and the atmosphere was buzzing with enthusiasm and energy, matched by an abundance of groundbreaking announcements and product launches. Given my particular interest in the data center switch market, I will focus my observations in this blog on the highlights most pertinent to data center networking.

The Bandwidth and Scale of AI Clusters Will Skyrocket Over the Next Few Years

It’s always interesting to hear different vendors’ expectations for AI networks, but it’s particularly fascinating when Cloud Service Providers (SPs) discuss their plans and predictions for the growth of their AI workloads, because those workloads are expected to exert significant pressure on the bandwidth and scale of Cloud SPs’ networks. At OFC this year, Meta presented its expectations of what its AI clusters may look like in 2025 and beyond. Two key takeaways from Meta’s predictions:

  • The size and network bandwidth of AI clusters are expected to increase drastically: Meta expects its AI clusters to grow from 256 accelerators today to 4K accelerators per cluster by 2025. Additionally, the network bandwidth per accelerator is expected to grow from 200 Gbps to more than 1 Tbps, a phenomenal increase in roughly three years. In summary, not only is the size of the cluster growing, but the compute-network bandwidth per accelerator is also skyrocketing.
  • The expected growth in the size of AI clusters and compute-network capacity will have significant implications for how accelerators are connected: Meta showcased the current and potential future state of its cluster fabric. A chart presented by Meta proposes flattening the network by embedding optics directly in every accelerator in the rack, rather than connecting through a network switch. This tremendous increase in the number of optics, combined with the increase in network speeds, is exacerbating the power consumption issues that Cloud SPs have already been battling. We also believe that AI networks may require a different class of network switches, purpose-built and designed for AI workloads.
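
The scale of the projected growth is easiest to see at the cluster level. The sketch below multiplies the figures quoted above (256 accelerators at 200 Gbps today, versus 4K accelerators at 1 Tbps by 2025, assuming "4K" means 4,096); these are Meta's projections, not measured values:

```python
# Back-of-the-envelope check of the projected aggregate compute-network
# bandwidth per AI cluster, using only the figures quoted in the bullets above.

def cluster_bandwidth_tbps(accelerators: int, gbps_per_accelerator: float) -> float:
    """Aggregate compute-network bandwidth for one cluster, in Tbps."""
    return accelerators * gbps_per_accelerator / 1000

today = cluster_bandwidth_tbps(256, 200)       # 256 x 200 Gbps  = 51.2 Tbps
future = cluster_bandwidth_tbps(4096, 1000)    # 4096 x 1 Tbps   = 4096 Tbps (~4 Pbps)

print(f"Today:  {today:.1f} Tbps per cluster")
print(f"2025+:  {future:.0f} Tbps per cluster (~{future / today:.0f}x growth)")
```

Roughly an 80x jump in aggregate cluster bandwidth in about three years, which is why the optics count and power consumption concerns in the second bullet loom so large.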

(The full article continues at the link above.)
