
Artificial intelligence failed the state exam: how the school testing system turns out to be deaf to reality

Artificial intelligence, celebrated around the world as the technology of the future, has proved unable to pass the tests of the external independent evaluation (ZNO) taken by Ukrainian schoolchildren. The algorithms get confused by texts, miss context, and fail to pick the right answers in the humanities. And while the machines “tire” of complex formulas and logical connections, Ukrainian children, often without electricity or stable internet, with air-raid sirens and explosions outside the window, must demonstrate results on which their future depends. In this unequal balance of power, the technology no longer looks so smart. The real question, however, is not how imperfect artificial intelligence is, but the conditions under which Ukrainian schoolchildren are forced to take these tests. In a country at war, where education often runs on enthusiasm and endurance, standardized knowledge tests remain deaf to reality.

Limits of artificial intelligence, or educational absurdity

Modern AI language models are astonishing in their versatility: they write symphonies, code, even replies in a messenger. Yet not a single model has managed to pass the Ukrainian school-leaving exam, not even those built on trillions of parameters that master dozens of languages. So much for the breakthrough in artificial intelligence.

The results of the ZNO-Vision study clearly demonstrated that language models, for all their “intelligence”, cannot work effectively with the material of the Ukrainian ZNO. The best of the systems tested, Gemini Pro, did not even reach the basic passing threshold of 70%. Others, including GPT-4o, performed even worse. At first glance this looks like a technical limitation of the models, but a closer look reveals a telling indicator of how detached Ukrainian assessment standards are from reality.
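To make that 70% threshold concrete, here is a minimal sketch of how such a benchmark score can be computed, comparing a model’s chosen options against an official answer key. The answers below are invented for illustration; this is not the actual ZNO-Vision evaluation code.

```python
# Score a set of multiple-choice answers against an answer key.
# The 70% passing threshold comes from the article; the answers are made up.
PASS_THRESHOLD = 0.70

answer_key    = ["A", "C", "B", "D", "A", "B", "C", "D", "A", "B"]
model_answers = ["A", "C", "D", "D", "A", "B", "A", "D", "B", "B"]

correct = sum(m == k for m, k in zip(model_answers, answer_key))
score = correct / len(answer_key)

print(f"Score: {score:.0%} -> {'pass' if score >= PASS_THRESHOLD else 'fail'}")
# With 7 of 10 correct this prints "Score: 70% -> pass"; one fewer and it fails.
```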

The tasks of the external evaluation cannot be treated as a simple knowledge check; they demand a deep understanding of context: language constructions, cultural references, the logic of the wording, the specifics of Ukrainian school terminology. Even questions on the history of Ukraine require not mechanical memorization of facts but the ability to place events in a wider European or post-Soviet context. Questions on language or literature often contain stylistic or genre nuances that are not always transparent even to native speakers, let alone to machine models trained mostly on English-language data.

Notably, some of the models handled the structure of the questions well after fine-tuning: PaliGemma, for instance, began to cope better with Ukrainian-language content after adaptation. But even that was not enough to pass the test. And here a key observation applies: the issue is not the quality of the models but the nature of the test itself. If even highly accurate systems trained on massive data corpora cannot grasp the logic of the tasks, perhaps the questions should be addressed not to the machines but to the content and structure of the tools we use to assess teenagers.
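For context, adaptation of this kind usually means continued training on in-domain text. The sketch below shows what such fine-tuning might look like with the Hugging Face transformers library; the base model name, the zno_items.jsonl file and the hyperparameters are illustrative assumptions, not the setup used in the study.

```python
# Minimal causal-LM fine-tuning on Ukrainian exam items (illustrative sketch).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "google/gemma-2b"  # stand-in base model, an assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical JSONL file: one exam item per line, {"text": "<question + answer>"}
dataset = load_dataset("json", data_files="zno_items.jsonl")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="zno-finetune",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=dataset,
    # mlm=False gives a plain next-token objective; labels are the inputs
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```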

However, the greatest concern is not AI’s technical limitations but the fact that these same tasks are taken year after year by Ukrainian schoolchildren studying with unstable access to education. Because of the war, many students have no stable learning environment: some are forced to study in shelters, others receive their education remotely across several time zones, and some have been left without parents or housing, without the support of a qualified teacher, or are undergoing treatment after injuries. The external evaluation system, meanwhile, remains unchanged, as if external conditions did not matter.

This imbalance between form and reality, between the expectation and the possibility of meeting it, creates an illusion of objectivity. Uniform tests declare equality, but in practice it does not hold. The requirements remain universal while access to education shows signs of deep inequality. When an AI that processes billions of parameters fails a test meant for teenagers studying in extreme conditions, the matter is no longer technology but a social diagnosis of the system.

Ukrainian education thus finds itself in a paradox: on the one hand it tries to integrate modern approaches, digital solutions and new programs, while on the other its main assessment tools remain fixed, inflexible and unadapted to current conditions. The results of the language-model study not only prove the limits of AI but also expose the internal contradictions of the Ukrainian educational system, which, despite all the pressure of external circumstances, still tries to assess by standards created for a stable country in peacetime.

Cultural blindness of artificial intelligence

A separate set of problems the researchers point to is the cultural insensitivity of language models. In the technical world this is known as “bias”: a skew in the algorithm that arises from the disproportionate representation of some cultural narratives, and the absence of others, in the training data. In the case of the Ukrainian tests, this has very concrete consequences.

For example, models fail to recognize borscht as a Ukrainian dish, or label it “Russian”. In literary tasks they confuse real figures with fictional characters built on stylistic imitation. In history tasks they lose the thread of chronology precisely where Ukrainian markers matter: not just dates and events, but their interpretation within the Ukrainian historical narrative.

The reason lies not in a technical malfunction but in the fact that the Ukrainian language, culture and educational context sit on the periphery of the architecture of global artificial intelligence. Most language models are trained on corpora dominated by English-language content, in which Ukrainian material is minimally represented. As a result, even the most modern models, which post brilliant results on Western benchmarks, turn out to be not merely uninformed but deaf to the Ukrainian context. This points less to a technical problem than to a question of cultural presence. In the global algorithmic field, the Ukrainian context remains insufficiently described and therefore poorly understood by AI. Even given large volumes of data, Ukrainian educational reality, with its linguistic, historical and social particularities, cannot be read and, accordingly, cannot be recognized.


An inequality that the test does not see

The very meaning of testing deserves separate consideration. It is usually perceived as a check of knowledge, but it is also a mechanism for the formal distribution of access: to higher education, to opportunities, to the future. The ZNO in Ukraine was meant to be an instrument of fairness, a unified system in which the result depends on knowledge rather than on place of residence, school or social status. The war, however, has effectively destroyed that equality.

Pupils who live in frontline regions, remain under occupation or have been forced to go abroad have no access to stable education. Lessons are interrupted by air-raid alerts, online learning is not equivalent to face-to-face instruction, and the language of instruction is not always Ukrainian. In such conditions, preparing for the external evaluation becomes a personal marathon of survival. And yet the tasks remain the same for everyone: complex, sometimes ambiguous, with an emphasis on interdisciplinarity, deep logic and multi-level information processing.

This is perhaps the key contradiction. Formally, the system continues to declare equal conditions for all, while in fact it maintains an institutional blindness to social and geographical differences. When an AI fails a test, we take it as an indicator of difficulty. But when a child who has lived under shelling fails the same test, that is already the moral responsibility of a system that provided no alternative model of assessment.

Statistics on the quality of Ukrainian schoolchildren’s knowledge

According to the TIMSS international study, two-thirds of fourth-graders could not cope with tasks requiring them to apply theoretical knowledge in real-life situations.

In the 2010 external evaluation in mathematics, only 188 applicants out of 111,000 achieved the maximum score, even though 58% of the test tasks belonged to the “easy” category. In 2014, the external evaluation tests in Ukrainian language and literature proved difficult for most participants: only 9 of 242,611 received the maximum score; in mathematics, 46 did; in biology and geography, not a single participant managed it.
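Expressed as proportions, these figures are vanishingly small. A quick calculation over the two pairings stated explicitly above (counts taken from the article) shows the scale:

```python
# Shares of participants with the maximum score, from the figures above.
top_scores = {
    "Mathematics, 2010": (188, 111_000),
    "Ukrainian language and literature, 2014": (9, 242_611),
}
for exam, (top, total) in top_scores.items():
    print(f"{exam}: {top}/{total} = {top / total:.4%}")
# Mathematics, 2010: 188/111000 = 0.1694%
# Ukrainian language and literature, 2014: 9/242611 = 0.0037%
```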

The wording of ZNO tasks has always raised questions. Even before the full-scale invasion, many tasks were flawed, ambiguous, or tested guesswork more than knowledge, especially in the humanities blocks: language, literature, reading comprehension. A situation where the answer depends not on the text but on assumptions about “what the author wanted to say” or “what the compiler may have meant” has nothing to do with objectivity.

Of course, no one disputes that some applicants score the full 200 points, but there is a nuance here too. We remember the experiments in which teachers themselves were invited to sit the external evaluation: many of them failed, or openly admitted they were more nervous than at a state exam. So are the teachers poorly trained, or does the problem lie in the tasks themselves?

The test tasks of the ZNO and NMT have long caused serious public concern. We all remember what a long road these tests traveled before reaching their current form. Over the years of the system’s existence, approaches to developing test tasks have changed, tests in particular subjects have appeared and disappeared, information-security systems have been modernized, and new models for determining the results of the external evaluation have been developed and introduced.

Since 2018, the state final attestation (DPA) in the form of an external evaluation in several key subjects has been mandatory for all schoolchildren, for students of professional (vocational and technical) education and for students of higher education institutions of accreditation levels I and II. The system aims to deliver transparent, unbiased results and make admission accessible to everyone. Yet some of the test tasks are startling. In 2015, the external evaluation in Ukrainian language and literature drove applicants to a protest rally. The wave of indignation was caused by five tasks based on an excerpt from a work by the contemporary Ukrainian writer Halyna Pagutyak: according to applicants, the proposed answer options were synonymous, while the “correct” ones were absurd.

“It was frightening to sit the ZNO unprepared; there was not enough time, because the time allotted for the test was cut that year, and this greatly affected our answers,” recalls former applicant Nastya Spiridenko.

During the 2021 external evaluation another scandal erupted: open-answer tasks were marked by only a single examiner. Previously such work was checked by two examiners in turn, each giving an independent grade; if the grades differed, the work went to a senior examiner. That year, however, for lack of funding, the Ukrainian Center for Educational Quality Assessment (UCEQA) could pay for only one round of marking.


As it turns out, there is never enough funding for a high-quality assessment procedure, the test tasks contain errors that applicants then fail to overcome, and all around one hears, “but the teachers did not prepare the students well”. In our country it has become customary to blame teachers for everything, instead of looking at the root of the problem.

Most unpleasant of all, even when these shortcomings are noticed, it is practically impossible to change anything. The procedure for setting and checking tasks is heavily regulated, formalized and, above all, detached from real language use and practice. This is especially obvious to anyone with experience of international exams, for instance foreign-language tests such as IELTS or TOEFL. There everything is far simpler and more logical: vocabulary, grammar, the ability to communicate. No reading of coffee grounds, no interpreting literary allusions without context.

In the external evaluation everything is far more complicated: even reading tasks often demand answers based not on what was read but on what could theoretically have been meant. Under such conditions the student must not only know the material but also guess the compiler’s train of thought. That is no longer a test of knowledge but a test of psychological endurance. Moreover, the experts somehow forget that all this is happening in a country where students often lack stable access to education because of war, evacuation, and shortages of teachers, electricity or internet. In such a situation, every sloppy wording is no longer a trifle but an additional blow to a child’s chances.

So the main question remains open: if not only artificial intelligence but also a majority of living, intelligent, motivated people cannot cope with the external evaluation, then the problem is probably not that “everyone is insufficiently intelligent” but that the test has long needed an honest review. The school curriculum, after all, is overloaded: students must absorb a large amount of material in a short time, which causes stress and overwork, and they do not always manage to prepare properly for exams within school hours.

In addition, the level of instruction schools provide often falls short of what students and their parents expect. To secure a deeper understanding of the subjects and better preparation for the external evaluation, parents are forced to hire tutors. This imposes additional financial costs and effort on families and creates inequality of preparation opportunities among students.

Each time, the tasks are simplified or shortened in ways that make it impossible to genuinely gauge applicants’ knowledge, giving them false ideas about their future studies and the choice of a higher education institution. Such an approach does entrants no favor; it is a real disservice. Everyone should understand the true value of knowledge: acquiring it takes effort, not ticking boxes at random.

In many countries, standardized testing has long since moved beyond choosing the correct answer from four options. In Great Britain, the A-level system is built on deep analysis: the student chooses several subjects and is assessed in writing, through essays, analytical tasks and creative projects. This tests not only factual knowledge but also the ability to think, argue and evaluate. In France and Germany an important role is played by oral exams, interviews and project work, which make it possible to assess the actual process of thinking, not just the result.

In the US, the SAT and ACT, though standardized, contain logical, mathematical and verbal sections that are not tied to a specific curriculum but target general cognitive abilities. And increasingly, American universities are dropping these tests as a requirement altogether, moving to “holistic review”: an approach that takes into account context, motivation letters, the school portfolio, social background and participation in public initiatives.

The Ukrainian ZNO is still a system built on the principle of a single correct answer. It takes no account of how a student thinks, under what conditions they studied, or what was actually available to them; instead it models an imaginary student with ideal access to the full school curriculum, with textbooks, tutors, a quiet home environment and equal opportunities for preparation. In wartime conditions such a model proves not merely archaic but traumatic.

The key problem of the Ukrainian ZNO lies not in its form but in its lack of flexibility. It is the same for everyone, regardless of whether the student studied offline, over Zoom, in a shelter or abroad. In that there is neither adaptation nor logic, only automatism.

As long as the ZNO is perceived as a tool of accounting rather than of understanding, education in Ukraine will remain not a system of development but a field of survival. The structure of the test demands accuracy, depth, abstract thinking and interdisciplinary integration from the student. That is genuinely good, but in the absence of equal starting conditions such a structure turns from an assessment tool into a serious barrier. In these circumstances the question should not be how to fit GPT to the ZNO, but how to change the ZNO so that it does not crush those who are less fortunate in their circumstances.

 
