Transforming the medical world: Could AI become a physician’s assistant?

The field of medical care has already seen several AI booms. In what can only be described as a breakthrough, many hospitals have now started to use image recognition technology based on AI-powered deep learning. If the technology evolves further, it could prove highly effective in identifying pathological changes that humans tend to overlook and hard-to-diagnose diseases. As such, there are hopes that it could become a reliable assistant for physicians. On the other hand, reliance on learned content means there is a risk that bias in the training data will be reflected in the results. If such issues can be overcome, it might become possible to deliver a completely different style of medical care.

The concept of artificial intelligence (AI) has its origins in the Turing Test proposed in the 1950s by Alan Turing, who is considered the father of computer science. With a history already stretching back more than 70 years, AI has experienced several booms.

The limitation of If-Then-Else rules

Rule-based expert systems were developed during the first AI boom in the 1960s. These systems represent expert knowledge based on If-Then-Else rules (Figure 1). Under this mechanism, the condition provided for “If” is evaluated; if the condition is met, the process from “Then” onwards is carried out, but if not, the process from “Else” onwards is carried out. Systems used in the medical field represent a large volume of physicians’ expert knowledge based on If-Then-Else rules, such as “If the patient has this symptom, then this disease may be assumed”; “If the examination produces this result, then consider this treatment”; and “If this symptom appears at the same time, then ask this question next.” When the patient’s sex, age, symptoms, examination results, and other information are entered into the system, they are checked against various conditional expressions and, if any correspond, the system displays the “Then” action.

Figure 1. Example of an If-Then-Else rule. In internal medicine, the course of treatment or examinations is decided based on specific symptoms or examination results. The conditions and results are itemized, along with the course of action if the conditions do not apply. For example, if the patient has a fever of 38°C or higher (If), the next step is to check whether the onset was sudden (Then if); if the onset was not sudden, observation is required (Else).
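The rule in Figure 1 can be written as a short program. The following is only a minimal sketch in Python, not code from any actual expert system: the names `Patient` and `fever_rule` are hypothetical, and because the caption does not say what follows a sudden onset, that branch returns a placeholder. Cases for which no rule has been written return nothing at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical record holding only the two items the rule needs."""
    temperature_c: float
    sudden_onset: bool


def fever_rule(patient: Patient) -> Optional[str]:
    """Toy version of the Figure 1 rule; returns None when no rule applies."""
    if patient.temperature_c >= 38.0:      # If: fever of 38°C or higher
        if patient.sudden_onset:           # Then if: was the onset sudden?
            # Placeholder: the figure text does not specify this action.
            return "sudden onset: next step not specified here (placeholder)"
        else:                              # Else: onset was not sudden
            return "observation required"
    # Nothing has been encoded for patients without a fever of 38°C or higher.
    return None


print(fever_rule(Patient(temperature_c=38.5, sudden_onset=False)))
print(fever_rule(Patient(temperature_c=36.8, sudden_onset=False)))
```

The second call prints `None`: the system has nothing to say about a case that nobody wrote a rule for.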

In the mid-1970s, the world was astounded when systems using this technology were unveiled in the U.S. These included a system to support physicians in selecting antibiotics for prescribing to sepsis patients, and the expert system INTERNIST-1, which had been programmed with a vast number of rules describing the relationship between a diverse array of symptoms and diseases, and could deduce a patient’s ailment from these.

In response to such developments on the global stage, in 1982, what was then the Ministry of International Trade and Industry invested a huge sum of money in the Fifth Generation Computer Systems Project, which sought to develop “thinking computers” by the end of the 1990s. Medical care was a major field targeted by the project, and in the latter half of the 1980s, when I was a graduate student, a glaucoma diagnostic support system and a general diagnostic support engine (consisting of devices and systems that integrated functions for performing specific procedures) were being developed at my university.

However, in the early 1990s, AI development seemed to have reached an impasse. As described above, rule-based expert systems cannot do anything that has not been encoded as knowledge. In the case of the diagnostic system for glaucoma, thousands of rules had been written, with the results producing a diagnosis of whether or not the patient has glaucoma. If the system produces the result that the patient does not have glaucoma, the user has to repeatedly use expert systems to explore other diseases.

A physician, however, can diagnose whether an eye condition is caused by cataracts, glaucoma, or some other disease, and then proceed to treatment in accordance with the diagnosis. Ultimately, the limitations of AI using rule-based systems became clear: no matter how many rules are written, they will never be sufficient.

Image recognition using deep learning

The technology called machine learning had already emerged in the 1980s. Machine learning, which involves using data to teach computers patterns, led to the AI we use today, but it requires the processing of huge volumes of data. Due in part to the limitations of computer processing capacity and data volumes at the time, progress stalled, and this period of stagnation is known as the “AI winter.”

However, the growing prevalence of the internet from the mid-1990s into the early 2000s was accompanied by the accumulation of big data; this, coupled with dramatic improvements in computer performance, resulted in rapid advances in AI once more, marking the beginning of the third AI boom. One particularly renowned AI breakthrough came in 2012, when a research team led by University of Toronto professor Geoffrey Hinton introduced image recognition based on deep learning, a type of machine learning.

Deep learning uses a mechanism called an artificial neural network (Figure 2). Resembling a human neural circuit, an artificial neural network consists of multi-layered “neurons.” The data input is processed as it passes through each layer of neurons, creating a model within the computer. For example, if numerous images of cats are input, the system gradually learns the patterns of what we call a cat, and it becomes able to grasp the concept on its own. As deep learning is particularly skilled at classifying images, the field of medicine, which has long used a great deal of imaging data and has amassed a huge volume of images as a result, has seen the release of a series of software applications designed to identify areas with pathological changes, particularly in radiological images.

Figure 2. The concept of artificial neural networks. (1) In a human neural network, frequently used neural circuits (synapses) become stronger, while those used infrequently weaken, enabling learning and memory formation (left). In artificial neural networks, the strength of connections between nodes (neurons) is expressed numerically. A value of 1 indicates a strong connection, while a value of 0 indicates a weak one; the weighting of connections changes according to the frequency of use. (2) Based on this concept, the network comprises multiple layers: an input layer that receives data, hidden layers that extract complex features or perform transformation, and an output layer that generates the final results. The core technology of machine learning involves assigning numerical weights to connections between nodes according to the importance of the information, thereby enabling learning and prediction.
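The layered structure in Figure 2 can be illustrated with a very small calculation. The following Python sketch is only an assumption-laden toy: the weights are made-up numbers between 0 and 1, the sigmoid function is one common choice for a node's output, and the names `layer` and `forward` are hypothetical. It shows how data passes from the input layer through a hidden layer to the output layer; the training step that adjusts the weights is not shown.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """Compute one layer: each row of `weights` holds the connection
    strengths from every input into a single node of this layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


# Made-up connection strengths (illustrative only, not learned from data).
hidden_weights = [[0.9, 0.1, 0.4],   # 3 inputs -> hidden node 1
                  [0.2, 0.8, 0.5]]   # 3 inputs -> hidden node 2
output_weights = [[0.7, 0.3]]        # 2 hidden nodes -> 1 output node


def forward(inputs):
    """Pass data through the hidden layer, then the output layer."""
    hidden = layer(inputs, hidden_weights)
    return layer(hidden, output_weights)


prediction = forward([1.0, 0.0, 1.0])
print(prediction)  # a single value between 0 and 1
```

In a real system, the weights would be adjusted repeatedly during training so that the output moves closer to the correct answer for each example.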

Regulatory approval has been granted for several devices incorporating such software, including those that highlight areas with suspected abnormalities in real time during gastroscopies (endoscopies of the stomach), and those that mark areas requiring attention on X-ray or CT images. Such devices are increasingly being introduced at large hospitals. Diagnostic imaging is the field of medical care in which the use of AI is most advanced.

AI has also begun to be used in cancer treatment. Cancer is a disease that occurs when a gene mutates, but gene mutations can occur in multiple locations. The genes that play a direct role in the onset of cancer are called driver genes, but finding them previously required a great deal of work. However, AI can now be used to find them more easily. There is even a prototype software application that uses the results of blood tests to draw up a list of diseases that would be hard even for a specialist physician to detect.

In addition, the 2024 Nobel Prize in Chemistry was awarded to three researchers for work on computational protein design and on the use of AI for the high-precision prediction of complex protein structures, including those of disease-causing proteins. Efforts to find candidate substances for new drugs that target these structures are now underway.

Using AI like a general practitioner

Conventional medical care relies on examination systems specialized in specific disease fields and highly focused areas of expertise, with the data obtained from these systems forming the basis of complex diagnoses. High-precision AI systems trained on such data are very useful tools for professionals. Many products have already been commercialized or are being developed for future commercialization.

The latter half of the 2010s saw the development of technologies for processing natural language (the kind of language spoken by humans), which have led to today’s ChatGPT. Now, AI is evolving into multimodal models capable of simultaneously processing several different types of information, such as text, images, and sound, and performing complex reasoning.

With this, the use of AI in medical care will also change. One conceivable future direction is the use of AI like a general practitioner; the user would enter information such as a set of symptoms, their progression, and abnormalities in various examination results into an AI trained on a huge corpus of medical texts, and the system would present the names of possible diseases. Systems are already being developed that make overall judgments on images, numerical data from examination results, and sometimes audio data from the patient and text entered by the physician, and then present an answer based on the data on which the AI was trained. The issue is the phenomenon of hallucination, in which the AI generates sentences with plausible-sounding content that it has not learned. As the output of results based on data that does not exist could be fatal in medical care, commercialization will still take some time. However, multimodal AI will undoubtedly evolve. I believe that in the field of medical care, AI will come to be used not only by experts for delivering accurate treatment, but also more widely by both patients and physicians.

Even now, I think many people already use devices incorporating small-scale AI, such as those that collect data from smartwatches and provide notifications if there are abnormalities (arrhythmia, etc.). I am sure that quite a few people have had the experience of entering details of health issues into a tool and asking the AI what the likely cause or disease might be, whether they should go to a hospital immediately, and which hospital department they should visit.

This kind of usage by patients will likely become even more widespread, leading to the proactive use of AI in a wider range of settings, such as using an AI like ChatGPT to prepare a medical history form in advance, or to explain in plainer language the information provided by the physician in the examination room, or to seek a second opinion.

Although there are situations in medical care today where patients themselves choose between several treatment options presented to them, such choices are difficult for patients to make, because they are not experts. In this kind of setting, I believe AI can be used in various ways, such as gathering information about one’s own treatment, including finding out for oneself about treatment outcomes, checking explanations about drugs and how to take them, and estimating medical expenses.

For physicians, too, I believe there are many situations in which AI could be useful as an effective tool. In addition to the aforementioned diagnostic imaging support, there are likely to be cases in which physicians can, for example, seek advice about diseases that they should take care not to overlook, based on the patient’s symptoms and various data. There are times when physicians need to provide treatment, even when it falls outside their field of expertise; in such situations, a growing number of physicians use AI to collect and summarize a huge volume of articles and specialist information, such as treatment options and comparisons of outcomes. While they are using AI merely as a means of obtaining reference information, it is far more efficient than carrying out this task themselves.

Risk of bias in the data being reflected in the results

The biggest change once AI begins to be used in this way will be in the approach to information. Until now, there has been little information sharing between patient and physician, I believe, but if both parties begin using a mechanism like ChatGPT, patients will also be able to increasingly obtain information for themselves. However, a world in which this causes people to doubt physicians, because the physician does not say the same thing as the AI, would be undesirable. Rather, it will be vital to promote understanding of medical care by using AI-provided information as the starting point and then ensuring effective communication.

So far, I have discussed medical care and AI; in fact, the highest hopes for the use of AI in this field concern alleviating the burden of paperwork.

The job of physicians is to provide medical care, but they actually spend more than half of their working hours preparing documents. It takes them a huge amount of time to prepare referral letters, medical certificates, nursing summaries, and discharge summaries that outline the patient’s medical history, examination results, treatment progress, and post-discharge plans. There is great demand for support in administrative aspects of medical care, including entering data into electronic medical records and checking for prescription errors and drug interactions.

Common issues arising from the introduction of AI across all fields include the question of where responsibility lies for the results, as well as copyright. When it comes to medical care, I think a key concern is that the current technique of creating models trained on a large amount of data means that the results generated are influenced by the data on which the AI has been trained. In other words, if there is bias in the data, it will be reflected unchanged in the results. For example, it is often said that an AI trained on a large amount of data focused solely on diseases common among people of white ethnicity has a lower probability of identifying diseases in people of color.
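How bias in the training data carries over into the results can be seen even in a deliberately oversimplified example. The Python sketch below is purely illustrative: the test values for the two groups are synthetic numbers invented for this purpose, the "model" is nothing more than a single threshold, and the names `train_threshold` and `detection_rate` are hypothetical. It assumes that the disease appears at lower test values in group B than in group A, and that only group A was available for training.

```python
from statistics import mean

# Synthetic (test_value, has_disease) pairs, invented purely for illustration.
group_a = [(1.0, False), (1.2, False), (0.8, False), (1.1, False),
           (3.0, True), (3.2, True), (2.8, True), (3.1, True)]
# Assumption: in group B the disease shows up at lower test values.
group_b = [(0.2, False), (0.3, False), (0.1, False),
           (1.4, True), (1.6, True), (1.5, True)]


def train_threshold(records):
    """'Learn' a cut-off halfway between the healthy and diseased averages."""
    healthy = mean(value for value, diseased in records if not diseased)
    sick = mean(value for value, diseased in records if diseased)
    return (healthy + sick) / 2


def detection_rate(records, threshold):
    """Share of truly diseased patients whose value exceeds the cut-off."""
    sick_values = [value for value, diseased in records if diseased]
    return sum(value > threshold for value in sick_values) / len(sick_values)


threshold = train_threshold(group_a)          # trained on group A only
rate_a = detection_rate(group_a, threshold)
rate_b = detection_rate(group_b, threshold)
print(f"cut-off: {threshold:.3f}")
print(f"detected in group A: {rate_a:.0%}, in group B: {rate_b:.0%}")
```

With these invented numbers, the cut-off learned from group A sits above every diseased value in group B, so all of the group B cases are missed. Real models are far more complex, but the mechanism is the same in principle.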

Inherent prejudice is a growing concern. For instance, when information about a very elderly patient aged beyond the average life expectancy is entered, there is a possibility that an AI might make an unethical judgment such as “They have exceeded the average life expectancy, so proactive treatment is not necessary.” Of course, AI system designers must design systems to avoid generating such responses, but at the same time, users need the objectivity to treat such output as nothing more than the AI’s response.

Furthermore, it would not be impossible to create an AI system trained on extremely biased data that could guide treatment in a particular direction. For example, there is ample possibility that people ideologically opposed to evidence-based therapies could launch an online AI tool that discourages such therapies. Whatever the situation, proper AI literacy is essential for both patients and physicians.

There is more than one AI model, and with current technological trends, distinct characteristics will emerge as a result of their training data and subsequent adjustments. It is not necessarily the case that the same results will be obtained from several different AI systems if the same patient information is entered into them. Development is currently underway to enable AIs to discuss issues with each other. In the future, we might see the emergence of a mechanism in which multiple AIs can be queried simultaneously, and the AIs will discuss the results with each other to derive a better result. For instance, AI① might point out to AI②, “Based on my data, your result is wrong,” and AI② might reply, “No, I have this information that you don’t have.”

The changes that AI will drive in medical care might be hard to notice at first. However, these changes will steadily accumulate and, I believe, one day we will realize that the world of medical care has been transformed.

(Figures courtesy of Kazuhiko Ohe)