Researchers discover new method to bypass security filters of ChatGPT and Gemini chatbots

Researchers at the University of Idaho and the University of Illinois have published a new technique for bypassing the security filters of large language models (LLMs), including ChatGPT and Gemini, 404 Media reports.
In the course of the study, the researchers found that such chatbots can be made to provide prohibited information if a request is phrased in a deliberately complex or ambiguous way, or if it cites fictitious sources. They call this approach "information overload".
To carry out the attack, the researchers used InfoFlood, a tool that automates feeding the model so much information that it loses the ability to correctly recognize the content of the request and may return data that would normally be blocked by its safety filters. The vulnerability stems from the model focusing on the surface form of the text rather than analyzing its deeper meaning, which makes it possible to bypass built-in protection systems and obtain dangerous information.
In accordance with the principles of responsible disclosure, the authors intend to share their findings with the companies developing large language models to help improve their security mechanisms, and they also plan to pass along the mitigation they have devised.
"LLMs primarily rely on input and output guardrails to recognize malicious content. InfoFlood can be used to train these guardrails: it makes it possible to extract the relevant information from potentially dangerous queries, making the models more resilient to such attacks," the study says.