Hey Presto! Nvidia pulls software hack out of AI hat and doubles performance of H100 GPU for free


Nvidia is banding together with a list of tech partners on a game-changing piece of software that’s set to double the performance of its flagship H100 Tensor Core GPUs.
The open source TensorRT-LLM update, which is set for release in the coming weeks, sees an H100 system running the new software outperform the A100 eightfold, whereas H100s previously outperformed the A100 by just fourfold. This was tested on GPT-J 6B, with the model summarising articles from CNN and Daily Mail.
When tested on Meta’s Llama2 LLM, TensorRT-LLM-powered H100s outperformed A100s by 4.6 times – versus 2.6 times before the update.
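As a quick check on the "doubling" claim, the gain attributable to the update alone can be estimated by dividing the after figure by the before figure, since both are quoted relative to the same A100 baseline. The snippet below is only a back-of-the-envelope sketch using the numbers reported above; the function name `update_gain` is made up for illustration.

```python
def update_gain(before_vs_a100, after_vs_a100):
    """Speedup attributable to the update alone.

    Both inputs are H100 speedups relative to the same A100 baseline,
    so the baseline cancels out when one is divided by the other.
    """
    return after_vs_a100 / before_vs_a100

# Figures quoted in the article (H100 vs A100, before and after the update).
gptj_gain = update_gain(4.0, 8.0)
llama2_gain = update_gain(2.6, 4.6)

print(f"GPT-J 6B gain from update: {gptj_gain:.2f}x")
print(f"Llama 2 gain from update:  {llama2_gain:.2f}x")
```

On these figures, the update is a full doubling for GPT-J 6B and a somewhat smaller gain, roughly 1.8 times, for Llama 2.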
Nvidia H100s faster than ever
The versatility and dynamism of large language models (LLMs) can make it difficult to batch requests and execute them in parallel, which means some requests finish much earlier than others.
To solve this, Nvidia and its partners equipped TensorRT-LLM with a more powerful scheduling technique called in-flight batching. This takes advantage of the fact that text generation can be broken down into multiple subtasks.
Put simply, instead of waiting for every request in a batch to finish before moving on to the next batch, the system can remove finished requests from the batch and begin processing new ones while the others are still in flight.
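The scheduling idea can be illustrated with a toy simulation. This is a minimal sketch and not TensorRT-LLM's scheduler: it assumes each request needs a fixed number of generation steps, that the GPU has a fixed number of batch slots, and that every step costs the same. The function names and the example request lengths are hypothetical.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Steps needed when each batch must fully finish before the next starts.

    A batch takes as long as its longest request, so short requests
    leave their slots idle while the long ones complete.
    """
    total = 0
    for i in range(0, len(lengths), batch_size):
        total += max(lengths[i:i + batch_size])
    return total

def inflight_batching_steps(lengths, batch_size):
    """Steps needed when a freed slot is refilled from the queue immediately."""
    queue = deque(lengths)
    active = []  # remaining steps for each request currently in flight
    steps = 0
    while queue or active:
        # Fill any free slots with waiting requests.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        # Advance every in-flight request by one generation step.
        active = [remaining - 1 for remaining in active]
        # Evict the requests that have just finished.
        active = [remaining for remaining in active if remaining > 0]
        steps += 1
    return steps

# Hypothetical workload: generation steps required by each request.
lengths = [2, 8, 3, 8, 2, 2, 3, 2]
static_steps = static_batching_steps(lengths, batch_size=2)
inflight_steps = inflight_batching_steps(lengths, batch_size=2)
print(f"static batching:    {static_steps} steps")
print(f"in-flight batching: {inflight_steps} steps")
```

In this toy workload the in-flight scheduler finishes in fewer steps because no slot sits idle while work is still queued. Real gains depend on the mix of request lengths and on hardware details this sketch ignores.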
TensorRT-LLM comprises a TensorRT deep learning compiler and includes optimized kernels, pre-processing and post-processing steps, as well as multi-GPU and multi-node communication primitives.
The result? Groundbreaking performance on Nvidia’s GPUs, paving the way for new large language model experimentation, quick customization, and peak performance.
This software uses tensor parallelism, in which individual weight matrices are split across devices. This allows efficient inference at scale, with each model running in parallel across multiple GPUs and multiple servers.
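To make the idea of splitting a weight matrix concrete, here is a minimal pure-Python sketch of a column-wise split. It is an illustration under stated assumptions rather than how TensorRT-LLM implements it: the "devices" are simulated as list entries, the helper names are hypothetical, and a real system would run each shard on its own GPU and gather the results over a communication primitive.

```python
def matmul(x, w):
    """Multiply x (m x k) by w (k x n); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]

def split_columns(w, num_devices):
    """Split w column-wise into num_devices equally wide shards."""
    n = len(w[0])
    assert n % num_devices == 0, "columns must divide evenly across devices"
    width = n // num_devices
    return [[row[d * width:(d + 1) * width] for row in w]
            for d in range(num_devices)]

def tensor_parallel_matmul(x, w, num_devices):
    """Compute x @ w by giving each simulated device one column shard of w."""
    shards = split_columns(w, num_devices)
    # Each partial product would run on a separate device in practice.
    partials = [matmul(x, shard) for shard in shards]
    # Gather step: concatenate the column blocks back into full output rows.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
full = matmul(x, w)
sharded = tensor_parallel_matmul(x, w, num_devices=2)
print(full == sharded)
```

Each device only needs to hold its own slice of the weights, which is what lets a model too large for one GPU be served across several.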
TensorRT-LLM also includes fully optimized and ready-to-run versions of popular LLMs including Llama 2, GPT-2 and GPT-3, as well as Falcon, Mosaic MPT, BLOOM, and dozens of others. These can be accessed through a Python API.
The update is available in early access, and will soon be integrated into the Nvidia NeMo framework, which is part of Nvidia AI Enterprise. Researchers can access this through the NeMo framework, the NGC portal, or through the source repository on GitHub.