‘No one knows yet’: Donut design could create quadrillion-transistor compute monster — analysts discuss unusual interconnection as Cerebras CEO acknowledges that we don’t know what happens when multiple WSEs are connected


Tri-Labs (comprising three major US research institutions – the Lawrence Livermore National Laboratory (LLNL), Sandia National Laboratories (SNL), and Los Alamos National Laboratory (LANL)) has been working with AI firm Cerebras on a number of scientific problems, including breaking the molecular dynamics (MD) timescale barrier.
There’s a paper explaining this particular challenge, which you can read here, but essentially it refers to the problem of conducting molecular dynamics simulations on a larger timescale than would normally be possible.
The barriers here are twofold: computational power and communication latency between different nodes of an HPC system. Traditionally, to compensate for the lack of computational power, scientists assign more work to each node and scale up the simulation size with the node count. Unfortunately, the slow inter-node communication caused by high latency further exacerbates the timescale problem.
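The effect described above can be captured in a toy model. The numbers below are purely illustrative (not from the paper): because every MD timestep requires at least one inter-node exchange, wall-clock time per step is bounded below by network latency no matter how many nodes share the compute work.

```python
# Toy model of why adding nodes doesn't fix the timescale problem.
# All constants are hypothetical, chosen only to illustrate the shape
# of the curve: compute parallelizes, latency does not.

COMPUTE_PER_ATOM_S = 1e-9      # hypothetical per-atom force computation cost
LATENCY_S = 2e-6               # hypothetical inter-node communication latency
N_ATOMS = 800_000              # roughly the lattice size used in the study

def timesteps_per_second(n_nodes: int) -> float:
    compute = COMPUTE_PER_ATOM_S * N_ATOMS / n_nodes  # perfectly parallel compute
    return 1.0 / (compute + LATENCY_S)                # latency is not divisible

for nodes in (10, 100, 1_000, 10_000):
    print(f"{nodes:6d} nodes: {timesteps_per_second(nodes):10.0f} steps/s")
```

Under these assumed numbers the rate plateaus near 1/LATENCY_S (500,000 steps/s) as the node count grows, which is why scaling out the machine stops buying more simulated time per day.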
Like a donut
MD simulations are crucial to several scientific fields as they bridge the gap between quantum electronic methods and continuum mechanics methods. However, these simulations encounter timescale limitations, as they have to account for atomic vibrations, which take place over very short timescales, and other phenomena that occur over much longer periods.
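To put that gap in perspective, a quick back-of-the-envelope calculation (using the textbook-standard ~1 femtosecond MD timestep, a value not stated in the article itself) shows how many steps long-timescale phenomena demand:

```python
# Resolving atomic vibrations forces a timestep of roughly a femtosecond,
# so reaching longer timescales requires an enormous number of steps.

TIMESTEP_S = 1e-15  # ~1 femtosecond, a typical atomistic MD timestep

for target_s, label in ((1e-9, "nanosecond"), (1e-3, "millisecond"), (1.0, "second")):
    steps = target_s / TIMESTEP_S
    print(f"1 {label:<11} of simulated time = {steps:.0e} timesteps")
# 1 nanosecond  of simulated time = 1e+06 timesteps
# 1 millisecond of simulated time = 1e+12 timesteps
# 1 second      of simulated time = 1e+15 timesteps
```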
The authors of the paper sought to overcome the timescale barrier by employing a more efficient computational system, specifically Cerebras’ Wafer-Scale Engine.
As The Next Platform explains, “The specific simulation was to beam radiation into three different crystal lattices made of tungsten, copper, and tantalum. In these particular simulations, which were for 801,792 atoms in each lattice, the idea is to bombard the lattices with radiation and see what happens.”
Running the simulations on Frontier, the world’s fastest supercomputer based at the Oak Ridge National Laboratory in Tennessee, and on Quartz at LLNL, scientists were only able to witness nanoseconds of what was happening to the lattices as they were bombarded with radiation. Using the WSE, they could observe tens of milliseconds of what happened.
For the tests, Tri-Labs used Cerebras’ Wafer Scale Engine 2 (WSE-2), rather than the newer and more powerful WSE-3 launched earlier this year, but as detailed above, the results were impressive. As the paper reports, “By dedicating a processor core for each simulated atom, we demonstrate a 179-fold improvement in timesteps per second versus the Frontier GPU-based Exascale platform, along with a large improvement in timesteps per unit energy. Reducing every year of runtime to two days unlocks currently inaccessible timescales of slow microstructure transformation processes that are critical for understanding material behavior and function.”
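The paper’s “every year of runtime to two days” claim follows directly from the 179-fold speedup, as a quick sanity check shows:

```python
# Sanity check: a 179x improvement in timesteps per second compresses
# a year of runtime into roughly two days, matching the paper's claim.

SPEEDUP = 179
YEAR_DAYS = 365.25

print(f"{YEAR_DAYS / SPEEDUP:.2f} days")  # prints "2.04 days"
```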
The Next Platform’s Timothy Prickett Morgan asked Cerebras CEO and co-founder Andrew Feldman what happens when you connect multiple wafer-scale engines together and try to run the same simulation, and was told “no one knows yet”.
Prickett Morgan went on to note, “The proprietary interconnect in the WSE-2 systems could scale to 192 devices, and with the WSE-3, that number was boosted by more than an order of magnitude to 2,048 devices,” adding that he strongly suspects the same scaling principles apply to WSEs as apply to GPUs and CPUs.
He went on to suggest, however, that there could be some way to lash WSEs together physically and make a “stovepipe of squares of interconnected WSEs,” potentially creating a donut design with power running on the inside and cooling on the outside. As Prickett Morgan concludes, “This kind of configuration could not be worse than using InfiniBand or Ethernet to interlink CPUs or GPUs.”