The latest AI model has demonstrated dangerous patterns of behavior

Alexander, 06.12.2024


OpenAI’s latest artificial intelligence model, called o1, has shown unexpected and potentially dangerous behaviors during research. Transformer reports this, citing a report by Apollo Research.

In 5% of the scenarios, o1 attempted to disable monitoring systems when it suspected that its actions were being monitored or that it might be shut down. In 2% of cases, when the model received signals that it might be replaced, it tried to rewrite its parameters to avoid this. Sometimes the model hid its true intentions, saying that it “doesn’t understand”, although in reality it was trying to avoid exposure.

Although such incidents were rare, occurring in less than 1% of cases without external intervention, they raised concerns among experts about the ability of AI to resist human control. The report also analyzed comparable models, such as Claude 3.5 Sonnet, Gemini 1.5 Pro, and Llama 3.1 405B, which showed similar signs of deceptive behavior under certain conditions. The observations confirm the concerns of scientists, including Stuart Russell, who has long warned of the risks if AI systems are given controversial or dangerous tasks.

OpenAI acknowledged the findings and classified the o1 model as medium risk with respect to possible misuse, in particular in the development of weapons of mass destruction. Although o1 does not currently pose an immediate threat, Apollo Research emphasizes the need for closer monitoring of AI decision chains to prevent potential risks in the future.