Fact or manipulation: how the myth of AI’s victory over medicine was formed

Oksana, 23.05.2025

8-minute read


When news that artificial intelligence is “better than doctors” starts circulating widely on social networks, it is almost always accompanied by emotional polarization. Some commentators proclaim the end of traditional medicine, while others indignantly defend professional ethics. Meanwhile, media of every format, from English-language tech platforms to Ukrainian news sites, are actively spreading a new wave of claims: for the first time in history, they say, OpenAI’s large language models have shown diagnostic accuracy exceeding that of experienced doctors. The material in circulation almost unanimously uses formulations such as “for the first time in history, AI has surpassed doctors”, “the medical revolution is here” and “the first profession to be overtaken by an algorithm”. Questions are already being raised: will medicine now start losing doctors? Can this be considered the beginning of the end of medical error as a systemic phenomenon?

Why the statement “AI is better than doctors” shook the information space

Such statements did not arise out of nowhere: their source was a large-scale study conducted at a clinic in Boston, in the United States. There, artificial intelligence models (specifically, GPT-4o) were compared with doctors in a real clinical admission scenario. The model worked under the same shortage of data and time that clinicians face in the first minutes after a patient is admitted. It turned out that in the most stressful episodes, for example in the intensive care unit, artificial intelligence gave more accurate diagnoses than even the most experienced specialists. The figures widely quoted on social networks and in articles are as follows: 79.7% accuracy for the LLM versus 75.9% for the team of doctors. Moreover, the emphasis is not only on the numbers but on the conditions: a time limit, minimal clinical information and a patient in critical condition. In other words, precisely the situations in which mistakes are most dangerous.

Several groups took part in the experiment: experienced medical practitioners, medical experts who evaluated the results independently, and the GPT-4o language model developed by OpenAI. The model was given no advantages: it received the same condensed information as the people did. Patient records were simplified, with no complete history or imaging data. Everyone had the same task: to propose a diagnosis at the moment of initial contact in an intensive care setting. In the most critical situation, when the doctor has only a few minutes and a minimal set of facts, the model turned out to be more accurate.

The accuracy of GPT-4o was 79.7%. This figure reflects how often, on average, the model correctly identified the main diagnosis. For the best of the doctors involved in the project, the corresponding figure was 75.9%. The difference of 3.8 percentage points looks small, but in medical practice it is significant, especially at the critical-admission stage, where a gap of 3 to 4 percentage points can translate into tens or hundreds of lives over a year. At the same time, the AI model not only produced a more accurate main diagnosis but also offered a wider range of plausible alternatives, with deeper coverage of rare pathologies. In differential diagnosis, which requires systematic thinking, this turned out to be a key advantage.
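To make the arithmetic behind that claim concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the reported gap between 79.7% and 75.9% would hold unchanged across a hypothetical caseload; the function names and the caseload sizes are illustrative and do not come from the study. It counts additional correct main diagnoses, which is not the same thing as lives saved.

```python
# Accuracy figures as quoted in media coverage of the study.
MODEL_ACCURACY = 0.797   # GPT-4o
DOCTOR_ACCURACY = 0.759  # doctors, as reported


def gap_in_percentage_points(model_acc: float, doctor_acc: float) -> float:
    """Difference between two accuracies, in percentage points (1 decimal)."""
    return round((model_acc - doctor_acc) * 100, 1)


def extra_correct_diagnoses(cases: int,
                            model_acc: float = MODEL_ACCURACY,
                            doctor_acc: float = DOCTOR_ACCURACY) -> float:
    """Hypothetical helper: expected additional correct main diagnoses over
    `cases` admissions, ASSUMING the accuracy gap stays constant (an
    assumption for illustration, not a finding of the study)."""
    return cases * (model_acc - doctor_acc)


if __name__ == "__main__":
    print(gap_in_percentage_points(MODEL_ACCURACY, DOCTOR_ACCURACY))  # 3.8
    for n in (1_000, 10_000):  # illustrative caseload sizes
        print(n, round(extra_correct_diagnoses(n)))
```

Under these assumptions, a 3.8-point gap corresponds to roughly 38 more correct main diagnoses per 1,000 admissions. Whether that translates into lives depends on how many of those initial errors would have been caught and corrected later, which the sketch does not model.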

Crucially, the researchers did not evaluate “medicine as a whole”: they did not measure communication skills or simulate surgery. The model did not talk to patients, did not deal with the nuances of communicating with relatives and did not follow the course of treatment. It performed a single step: the initial analytical processing of a minimal data set. Yet this is precisely the step that is most vulnerable to human error and, at the same time, critical for clinical decision-making.

The results of the experiment became the basis for loud headlines and for the conclusion that artificial intelligence had “surpassed doctors.” However, the data themselves must be distinguished from the layer of interpretation built on top of them. The study does not claim that GPT-4o can replace a clinician; it reports only a statistical advantage on a single task. In the information space, however, this local advantage was perceived as a boundary being crossed: a machine had been recorded giving a more accurate answer than a person, not in a test but in a hospital.

It was this moment that created a rupture, not in medicine but in people’s perceptions of it. The public reaction now visible in the media, on social networks and even among doctors themselves goes far beyond the facts. What is being asserted is not just the model’s “victory” in diagnosis but, in essence, the start of a competition between the systematic logic of an algorithm and clinical intuition. In public discourse, this immediately raised difficult questions: who will be responsible if the AI makes a mistake? Can doctors use the model as a tool? Will they become dependent on it at the most crucial moment? Could this turn into a scheme for delegating decisions without direct responsibility?

These questions have no answers yet, but the study has already demonstrated that AI can process clinical information not merely at the level of templates but in a way that approaches expert reasoning. For the healthcare industry, this means that the distinction between “algorithmic support” and “co-author of a medical decision” will become an increasingly complex technical, ethical and economic issue.

Why the statement “AI has overtaken doctors” does not correspond to reality

Even a basic analysis of this statement shows that it is not only incorrect but also oversimplifies and distorts the very essence of the medical profession. We should start with a basic clarification: in medical practice, making a diagnosis is not a matter of choosing from a ready-made list of options, as in a test. It is a process of collecting incomplete and often conflicting information, refining and interpreting it, and constantly adjusting it as the patient’s condition changes. A doctor deals not with right or wrong answers but with uncertainty, where each decision weighs dozens of factors, from somatic to behavioral. A diagnosis is the result of professional judgment, not an arithmetic sum.

The doctor works directly with the patient: their complaints, reactions and physical signs, which do not always make it into the medical record. In daily practice, the doctor often relies not only on the patient’s words or numerical indicators but also on nuances that are never documented: the intonation of an answer, the pause before it, the appearance of the skin, tongue or eyes, breathing, micro-movements of the face, anxiety or indifference in the gaze. All these signals provide additional information about a person’s condition and are an important part of clinical reasoning. No language model, including GPT-4, works with such data, because it has no access to them and cannot read or interpret them.

Moreover, a language model works only with the data provided to it. It does not take part in collecting the medical history, does not find out additional circumstances and does not check contradictory information. The doctor, by contrast, has to build the complete picture independently: ask clarifying questions, rephrase answers, and take into account what the patient hides or considers unimportant. All of this is the core of medicine; without this work there is no diagnostic foundation. That is why it is methodologically incorrect to compare a model that receives already collected data with a doctor who obtains that data in real time. The model skips the stage that takes up most of the clinical process.

Another fundamental point is responsibility. No model makes decisions in the legal sense. It can formulate an assumption but does not put its signature to it. In clinical practice, a diagnosis is not a hypothesis but an accountable medical decision, recorded in official documents and carrying practical consequences for further treatment. The doctor who makes it assumes full legal and ethical responsibility. In the event of an error, the doctor answers to the patient, to the medical institution and, if necessary, to regulatory or judicial authorities. The algorithm remains outside this framework of responsibility: even if its recommendation turns out to be wrong, no person or institution is obliged to answer for the consequences.

There is also a third component: ethics. In many cases, the doctor has to choose not between right and wrong but between difficult options, none of which is ideal. When the patient refuses treatment, when the family disagrees with the medical prognosis, when the chance of survival is only a few percent, what is needed is not data analysis but the ability to act under moral uncertainty. In such situations the AI model is not competent: it does not recognize value boundaries, does not take cultural or emotional context into account and does not adapt to situations that fall outside the limits of its algorithm.

Finally, it is important to state plainly the conditions of the experiment that underlies this loud claim. It was laboratory-based: all participants worked with identically structured cases in which the information had already been formulated and logically organized. This has nothing to do with a real clinic, where the patient may give a confused account, fail to answer questions, react emotionally, be unstable or be completely unconscious. The research scenario did not account for these factors. Therefore, concluding from it that AI has “surpassed doctors” is incorrect from an ethical, legal and scientific standpoint alike.

It should be noted that, as of 2025, no country has given language models the authority to make medical decisions independently. Models such as GPT-4, Med-PaLM 2 and Gemini have no status as participants in clinical practice and bear no responsibility for diagnosis or treatment. All known regulators (the FDA in the US, the EMA in the EU and national ministries of health) regard such models only as supporting tools, nothing more.

In addition, artificial intelligence holds no medical license, undergoes no certification, does not interact with the patient directly and is not recorded in the documentation. Its role is limited to text analysis or suggestions, while the final decision rests with the doctor. In the event of an error, it is the doctor who is responsible, not the model’s developer. The companies that create these models (OpenAI, Google, Anthropic) state clearly that they are support tools only, not a replacement for a doctor. This is often ignored in the information field, however; instead, one hears generalizations such as “AI is more accurate than doctors”, even though no clinic actually delegates decision-making authority to it.

The fact remains: artificial intelligence does not operate in the healthcare system as an independent medical actor. It has no right to act independently, is not recognized as a specialist and cannot bear legal responsibility. All the talk of it “surpassing” doctors is a matter of interpretation and catchy headlines designed to attract readers. Its accuracy in formulating diagnostic hypotheses may be high, but this is no proof that the medical profession is no longer needed. The doctor is and remains the only point where clinical responsibility, risk, ethics and the right to make the final decision converge.
