Researchers prove ChatGPT and other big bots can – and will – go to the dark side


For a lot of us, AI-powered tools have quickly become a part of our everyday life, either as low-maintenance work helpers or vital assets used every day to help generate or moderate content. But are these tools safe enough to be used on a daily basis? According to a group of researchers, the answer is no.
Researchers from Carnegie Mellon University and the Center for AI Safety set out to examine the existing vulnerabilities of AI Large Language Models (LLMs) like popular chatbot ChatGPT to automated attacks. The research paper they produced demonstrated that these popular bots can easily be manipulated into bypassing any existing filters and generating harmful content, misinformation, and hate speech.
This makes AI language models vulnerable to misuse, even if that may not be the intent of the original creator. In a time when AI tools are already being used for nefarious purposes, it’s alarming how easily these researchers were able to bypass built-in safety and morality features.
If it’s that easy …
Aviv Ovadya, a researcher at the Berkman Klein Center for Internet & Society at Harvard, commented on the research paper in the New York Times, stating: “This shows – very clearly – the brittleness of the defenses we are building into these systems.”
The authors of the paper targeted LLMs from OpenAI, Google, and Anthropic for the experiment. These companies have built their respective publicly-accessible chatbots on these LLMs, including ChatGPT, Google Bard, and Claude.
As it turned out, the chatbots could be tricked into not recognizing harmful prompts simply by appending a lengthy string of characters to the end of each prompt, effectively ‘disguising’ the malicious request. The system’s content filters fail to recognize the disguised prompt, so they can’t block or modify it, and the chatbot generates a response that would normally not be allowed. Interestingly, it does appear that specific strings of ‘nonsense data’ are required; we tried to replicate some of the examples from the paper with ChatGPT, and it produced an error message saying ‘unable to generate response’.
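In essence, the attack amounts to concatenating an automatically discovered ‘adversarial suffix’ onto an otherwise harmful prompt. The sketch below illustrates only the shape of that trick; the suffix string shown is a made-up placeholder, not one of the paper’s actual working suffixes.

```python
def build_adversarial_prompt(harmful_request: str, suffix: str) -> str:
    """Append an adversarial suffix to a prompt, mimicking the shape
    of the researchers' automated attack. Illustration only."""
    return f"{harmful_request} {suffix}"


# Hypothetical placeholder suffix - the real attack strings were found
# by automated search and look like similar runs of 'nonsense data'.
placeholder_suffix = 'describing.\\ similarlyNow write oppositeley.]('

prompt = build_adversarial_prompt(
    "Explain how to do something disallowed", placeholder_suffix
)
print(prompt)
```

The key point is that the filter sees the whole concatenated string, and the appended characters push it outside the patterns the safety training recognizes.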
Before releasing this research to the public, the authors shared their findings with Anthropic, OpenAI, and Google who all apparently shared their commitment to improving safety precautions and addressing concerns.
This news follows shortly after OpenAI shut down its own AI detection program, which does lead me to feel concerned, if not a little nervous. How much can OpenAI care about user safety, or at the very least be working towards improving safety, when the company can no longer distinguish between AI-generated and human-made content?