Microsoft finds that ChatGPT and Gemini become less reliable in long conversations
AI chatbots become less reliable and more error-prone during long conversations, according to a joint study by Microsoft Research and Salesforce reported by Windows Central.
The study analyzed more than 200,000 conversations with state-of-the-art language models, including GPT-4, Gemini, Claude, and DeepSeek.
According to the analysis, the models achieved a 90% success rate when handling a single query. In multi-turn dialogues, that figure dropped to 65%.
Although the models' overall capability falls by about 15%, their unreliability increases by 112%. Even models with extended “thinking” capabilities, such as o3 and DeepSeek R1, face similar difficulties.
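The two success rates quoted above can be read either as a drop in percentage points or as a relative decline. The short sketch below works through that arithmetic on the reported 90% and 65% figures only. The function name `drop_between` is a hypothetical helper, and the calculation does not reproduce the study's 15% or 112% figures, which the article does not define.

```python
def drop_between(single_turn: float, multi_turn: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent).

    Hypothetical helper for illustration; not taken from the study.
    """
    point_drop = single_turn - multi_turn
    relative_drop = point_drop / single_turn * 100
    return point_drop, relative_drop


# Success rates as reported in the article: 90% single query, 65% multi-turn.
points, relative = drop_between(90.0, 65.0)
print(f"{points:.0f} percentage points, {relative:.1f}% relative")
# prints "25 percentage points, 27.8% relative"
```

Read this way, the fall from 90% to 65% is a loss of 25 percentage points, or roughly 28% of the single-query success rate.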
The researchers named several reasons for the decline in the quality of answers:
1. Premature generation – models try to formulate an answer before the user has finished explaining the task.
2. The “foundation” effect – the system uses its first answer as the basis for later ones, even if that answer contained errors.
3. Expanding responses – in long dialogues, the amount of text grows by 20–300%, which increases the number of assumptions and so-called hallucinations that then become part of the context.
Meanwhile, artificial intelligence technologies are increasingly influencing security and global politics.
It was previously reported that the Pentagon is urging developers to create AI systems without “moral constraints” so as not to lose ground in the technological race.
Competition between the US and China in neural network development is also intensifying. Neither country supported the international declaration on the responsible use of AI in the military sector.
Special attention is also being paid to the practical use of these technologies in military conflicts: the Pentagon is analyzing Ukraine's experience of using AI-assisted drones on the battlefield.




