‘Catastrophic overtraining’ could harm large language AI models trained on more data for the sake of it


- Researchers from top US universities warn extending pre-training can be detrimental to performance
- Too much pre-training can deliver worse performance due to something akin to the butterfly effect
- The more a model is pre-trained, the more sensitive it becomes to small changes that can disrupt the end result
Researchers from Carnegie Mellon, Stanford, Harvard, and Princeton are challenging one of AI development’s accepted core beliefs – that the more pre-training data the better the performance.
As reported by HPCwire, a new paper discusses the concept of “catastrophic overtraining,” whereby extended pre-training can harm a model’s performance after fine-tuning.
The researchers compared two versions of the OLMo-1B model, one trained on 2.3 trillion tokens and another on 3 trillion. Despite the larger training set, the more extensively trained model reportedly performed up to 3% worse on benchmarks like AlpacaEval and ARC.
Reaching the inflection point
This performance drop, the study claims, is linked to a phenomenon called “progressive sensitivity.”
As the token count increases, the model becomes more fragile. Even small tweaks, such as adjustments during fine-tuning or the introduction of noise, can reverse earlier gains.
The authors demonstrated this by injecting Gaussian noise into pre-trained models, noting that performance degraded more sharply the longer the model was trained.
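To make that probe concrete, here is a minimal sketch in PyTorch (not the authors’ experimental code): perturb a pre-trained model’s weights with Gaussian noise at a few scales and compare its loss on a small prompt set against the unperturbed baseline. The checkpoint name and the toy prompts are illustrative assumptions, and a real study would use proper benchmark evaluations rather than a handful of prompts.

```python
# Minimal sketch of a Gaussian-noise sensitivity probe (illustrative, not the paper's code).
# Assumes a Hugging Face causal LM; the checkpoint id and prompts are placeholders.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "allenai/OLMo-1B-hf"  # assumed checkpoint id; swap in any causal LM
PROMPTS = ["The capital of France is", "Water boils at a temperature of"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
base_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
base_model.eval()


def mean_loss(model) -> float:
    """Average next-token loss over the toy prompt set (a crude quality probe)."""
    losses = []
    with torch.no_grad():
        for text in PROMPTS:
            batch = tokenizer(text, return_tensors="pt")
            out = model(**batch, labels=batch["input_ids"])
            losses.append(out.loss.item())
    return sum(losses) / len(losses)


def perturbed_loss(model, sigma: float) -> float:
    """Add i.i.d. Gaussian noise N(0, sigma^2) to every weight, then re-measure loss."""
    noisy = copy.deepcopy(model)
    with torch.no_grad():
        for param in noisy.parameters():
            param.add_(torch.randn_like(param) * sigma)
    return mean_loss(noisy)


baseline = mean_loss(base_model)
for sigma in (1e-4, 1e-3, 1e-2):
    print(f"sigma={sigma:g}  loss={perturbed_loss(base_model, sigma):.4f}  "
          f"(baseline {baseline:.4f})")
```

Running the same probe on checkpoints trained for different token budgets is the kind of comparison the paper describes: the claim is that the more heavily pre-trained checkpoint degrades faster as the noise scale grows.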
The point where this additional training starts to degrade performance is called the “inflection point.”
Once that point is reached, the benefits of further training are outweighed by the risk of internal instability. The study found that this tipping point often occurs beyond 2.5 trillion tokens in smaller models such as OLMo-1B.
“Catastrophic overtraining may be inevitable… especially when the pre-training and fine-tuning tasks are misaligned,” the authors warn in their paper, which you can access through the arXiv pre-print server.
While the researchers are not suggesting an end to pre-training, they do feel that developers should consider just how much pre-training is enough. As the paper concludes, “Our findings call for a renewed focus on model scaling that considers the entire training pipeline.”
For AI developers chasing scale, the message seems clear: sometimes, less really is more.