Daniel Kvak: So that self-diagnosis is not a road to hell...

With the public availability of tools like Midjourney and ChatGPT, ideas about what artificial intelligence can do are beginning to permeate the general public. Meanwhile, those who fell under the spell of smart algorithms a little earlier are implementing projects that only a few years ago appeared only in science fiction stories. The autonomous medical pods familiar from films such as Alien or Elysium may still be imaginable only on screen, but in diagnostics, for example, AI has seen a steep rise in the last few years. As it turns out, it can save time and money, which can play a significant role in the face of an aging population and a shortage of medical staff. One of those who have joined the digital health race is Mgr. Daniel Kvak. He is a student at the Faculty of Arts at Masaryk University, where he works on the use of machine learning in audiovisual media, and also the driving force behind the company Carebot, under whose name he is breaking into medical practice. The rise of the ambitious start-up is as steep as the pace at which the young entrepreneur rattles off medical concepts in terms indistinguishable from a doctor's, and what began during the COVID-19 pandemic with a system for classifying lung X-rays is now scaling to other disciplines. One of the projects Carebot's growing team is working on is a tool for recognizing skin melanomas, especially in patients with darker skin tones. "I was surprised that when Google released the DermAssist app, they didn't take much account of skin tone, when I think it's the most important factor," Kvak observes. The need to learn from mistakes and to be as practical as possible comes up several times in the interview.
That his efforts make sense is confirmed, among other things, by his recent award at the MICAD conference, which focused specifically on computer systems that help with the interpretation of medical images.

24 Mar 2023


Do you - or did you - enjoy science fiction?
Very much! Both films and comics, such as The Matrix or The Terminator. That's actually where my interest in artificial intelligence came from.

Don't you feel that you're helping to turn a certain science fiction vision into reality?
To a certain extent, yes, because until a few years ago I myself regarded the combination of AI and medicine as science fiction. But today it makes a lot of sense to me. It is a specific field with enormous potential, and I have a huge amount of respect for it, especially for the transition from AI itself to applied medical practice. At the same time, I believe that a certain science fiction detachment helps and inspires me.

You started your business in the field during the COVID-19 pandemic, when you developed a tool for recognizing lung X-ray images. What led you, a Faculty of Arts student interested in music, to move into health care and IT? Did you feel that you also needed to contribute during the pandemic, when slogans about joint efforts were resonating, or did you also see a gap in the market from a business perspective?
I admit the transition can look strange, and doctors sometimes point it out with humour. (smiles) I must mention the work of my wife Karolina, who worked in anthropology at the Faculty of Science at Masaryk University, including in the morphology and forensic anthropology laboratory. We were thinking about how to combine our two interests, artificial intelligence and imaging methods, until we came up with the design of a system for estimating bone age from ossification. Then, when COVID-19 came along, we thought we'd try to help and put together our first neural network model to identify patients with findings of COVID-19 pneumonia, as well as other pathologies. A little while after we published it, the first doctors started writing to us, saying that it was a fine piece of work but needed a lot of refinement before it could be used in medical practice. So we set about improving it together.

Going back in time, how did a student at the Faculty of Arts learn to program and build artificial intelligence models?
I originally made my living creating background music for TV commercials, and when I realised that it was a job that could be automated to some extent, I started learning how to use neural network models and AI. (smiles) Of course, that's not the case in medicine, where the processes are much more complex. When I supervise bachelor's or master's theses at the Faculty of Arts today, I notice that students see AI all around them and are interested in how it can be used in communication, content generation, image processing and so on. It's fascinating to me, and I'm glad that the AI wave, even though it has its drawbacks, is prompting people to learn more.

“Although projects with a huge impact are being created on campus, some things take a bit of time and could sometimes be pushed a bit. That's why start-ups and spin-offs are important, to transfer know-how into practice.”

Mgr. Daniel Kvak

Bones, lungs, now melanomas and other projects - the progress of your start-up Carebot is pretty meteoric...
We operate in a highly competitive segment, and it is said of start-ups like ours that when you start, you are already two to three years behind the others and have to sprint. Even so, we have to make sure that the methods and software we design are robust, clinically validated, approved by regulatory authorities and so on. I don't want this to sound negative, but although academia produces projects with huge impact, some things take time and could sometimes be pushed along a little faster. That is why start-ups and spin-offs are important: they transfer know-how into practice. What was difficult for us in the beginning was that although we had some initial know-how, none of us had radiological knowledge, so we went in a bit blind. Today we have an in-house radiologist on the team, and we work with more than thirty radiologists from all over Europe.

Before we get to melanoma, what is the state of development of the lung detection system, and what other projects have you started in the meantime?
Currently, our system primarily detects acute findings such as consolidations in the lung parenchyma, pneumothorax, effusions, lesions and tumours. It is currently undergoing certification, and we are fully integrating it into PACS (picture archiving and communication systems), which physicians use to archive imaging documentation. This is crucial: we try to adapt to the medical workflow, not change it. As part of scaling up, we are also working on other solutions, for example for mammography. And even though we are de facto a commercial entity, we also try to maintain a level of scientific activity, so we publish actively as well.

What makes melanomas special for you?
Melanomas are an interesting segment for us because they offer a high potential for automation. With chest X-rays, you have to take into account the hospital, the radiologist, the report, PACS and other aspects, which together form quite a complex system. By contrast, you can photograph a skin finding with your mobile phone and do some self-diagnosis yourself. I say "some" because in my opinion self-diagnosis is a road to hell, but it does have a certain value.

You have focused on detecting melanomas in people with darker skin tones, which existing systems of this kind do not account for well. At first I wondered where a Czech company would find enough source images of dark-skinned people to train its system, but then I read in the project description that you wanted to use a so-called generative model to do this. Can you explain that concept?
Generative models have been with us for a few years now, although people haven't paid much attention to them apart from deepfake videos and images. The idea is that we present data to the AI, but instead of asking it to classify the data (the way a self-driving car recognises the colour of a traffic light), we want it to generate new, unique data. So we present information to such a neural network in a so-called unsupervised learning framework and ask it to generate data bearing features similar to the data presented. It can find those features by itself, or we can steer it. This is currently a popular area, also thanks to projects like DALL-E or Midjourney, where we enter a text prompt and get an output back.
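To make "learn from unlabeled examples, then generate new data with similar features" concrete, here is a deliberately tiny Python sketch. It is not how DALL-E, Midjourney or Carebot's system works (those use large neural networks on images); it simply fits one Gaussian per feature to a handful of hypothetical lesion measurements and samples new ones. The feature names and values are invented for illustration.

```python
import random
import statistics


def fit_gaussian(samples):
    """'Learn' a distribution: mean and standard deviation of each feature.

    No labels are used, only the raw samples (the unsupervised part).
    """
    columns = list(zip(*samples))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def generate(params, n, seed=0):
    """Draw n new synthetic samples from the fitted per-feature Gaussians."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in params) for _ in range(n)]


# Hypothetical training data: (diameter_mm, darkness_score) for each lesion.
training = [(5.1, 0.62), (6.3, 0.70), (4.8, 0.55), (7.0, 0.81), (5.9, 0.66)]

params = fit_gaussian(training)
synthetic = generate(params, n=3)
for sample in synthetic:
    print(sample)
```

Such a simple model can only reproduce variations within the one distribution it has fitted, which is roughly the limitation Kvak later attributes to older generative networks.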

“Artificial intelligence must be a tool like a stethoscope or a ruler. It just has to provide another point of view and at the same time the doctor has to get to know it, learn to respond to it.”

Mgr. Daniel Kvak

So the procedure is that you take a set of real images of melanomas, you feed them to the artificial intelligence, it generates a set of artificial images of melanomas corresponding to the given parameters, and then you let the real doctors evaluate them, thus further improving the data and the system...?
Exactly! There is even more variability in the images of skin findings than in lung findings, and it's not just about size or shape. The single most important factor is skin tone, and I noticed that when Google Health released their DermAssist app last year, they didn't take much notice of it. While their system does work with a large number of samples of people with fair skin, the darker the skin tone, the lower the representation of those samples. I was surprised by this because to reduce the error rate of these systems, the data needs to be robust, balanced, and not favour one population, even if it has a higher incidence of melanoma.

If you trained your system on roughly 7,000 real photographs in the first round, then generated more data from those, and then generated still more data from that, isn't there some loss of quality? I'm reminded of when Meta launched its chatbot, which learned from its conversations with users. When they asked it what it thought of the company's founder, Mark Zuckerberg, the picture it drew from the available sources and the ensuing discussions was not very flattering, and after that Zuckerberg never came out of those conversations well...
We trained our own classifier on those first seven thousand images. We then used the DALL-E model, which was public at the time, for about two weeks. We wrote a set of text prompts, generated the first data from them, and had it assessed both by Dr. Březina (Eva Březina, M.D., Ph.D., from the First Dermatovenerology Clinic of the Faculty of Medicine and St. Anne's University Hospital in Brno) and by our own classifier to determine whether the images contained typical features of melanoma. In the second phase, we are now working with so-called outpainting, where we are trying to match specific findings to images of dark skin tones. Earlier models based on so-called generative networks, which work with a known distribution of data, did tend to do what you describe. They took a dataset and modified it in some way to create, say, unique variants, but they were always locked into a single distribution and could not produce anything outside it. DALL-E, however, introduced a model trained on data obtained by so-called web scraping, that is, downloaded from the public Internet. Admittedly, that is not very ethical, but it allows the model to work with context and generate more accurate outputs.

However, your colleague Matěj Misař, in an interview for E15 magazine last year, was sceptical about public data...
It's worth mentioning that while there are all sorts of self-diagnosis apps available on Google Play and elsewhere, we are creating a certified medical device that goes through a complex regulatory process, clinical evaluation and other necessary phases in which we verify how we arrived at our data. Publicly available datasets released by various institutions, together with their descriptions, may not always be accurate. Companies like Google collect datasets of different lesions from sites around the world and have a team of doctors who agree on those findings by visual appearance, by touch, by histopathological analysis and so on. Within these processes, we try to supplement such datasets so that they are balanced and representative. Obviously, we cannot do a histological analysis of the generated images, we cannot touch them, and we cannot measure their size, but we are able to create a dataset that has certain melanoma-specific features. To do so, we have to start from clinically validated data and then look for ways to "help" it and work with it further.

It is said that AI is only as smart as its creators, but listening to you, models like DALL-E collecting contextual data from the internet go beyond such claims, don't they?
To a certain extent, yes. It's probably too early to say definitively, but I would say we're moving in that direction. I follow generative models closely, and I notice that people are already starting to use ChatGPT instead of Google, for example... In medicine specifically, context is extremely important to us, and from the beginning we have approached AI as a tool, much like a stethoscope or a ruler. It just has to provide another point of view, and at the same time the doctor has to get to grips with it and learn to respond to it. Only when this happens, backed by robust, multicentre clinical trials that include a diverse representation of patients, will we be able to say that in clinical practice, too, artificial intelligence capable of working at sites whose data it has never seen may begin to move beyond that.

“Errors and inaccuracies arising from a misunderstanding of practice can lead to poor outcomes.”

Mgr. Daniel Kvak

You've already touched on the subject of AI bias, which you are actually trying to eliminate. In the United States, a recent case found that doctors using AI were assessing black patients as healthier than they actually were, because the system also took into account factors that had little to do with the health condition itself, such as health insurance reimbursement. Related to this is the question of how reliable these systems are...
This can also be illustrated by an example from the 1990s, when decision support systems for mammography screening started to be used in America and doctors were financially incentivised to use them. They were not yet neural network-based models but support vector machines, and they helped achieve both higher sensitivity and better specificity, detecting patients early and saving considerable money. The problem lay in how these systems were deployed: doctors just looked on and trusted them blindly, because the systems were claimed to have a sensitivity of 96 to 97 per cent. That is why decision support systems are still a very sensitive subject. Many commercial projects have failed because they did not go down the route of proper clinical validation. That's not to say that new start-ups, us included, are doing things that much better, but I think that knowing the earlier problems and the level of scrutiny helps to make things really robust. After all, clinical validation and evaluation are very rigorous today.
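Why a headline sensitivity figure is not a reason for blind trust can be shown with a small calculation. The sketch below uses hypothetical screening counts (not data from the interview or from any real system) with a low disease prevalence, as in screening mammography, and computes sensitivity, specificity and the positive predictive value (PPV), the share of flagged cases that really are cancer.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # diseased, flagged by the system
    fn: int  # diseased, missed by the system
    fp: int  # healthy, flagged anyway (false alarm)
    tn: int  # healthy, correctly cleared

    def sensitivity(self) -> float:
        """Share of diseased patients the system flags."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of healthy patients the system clears."""
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Share of flagged cases that are actually diseased."""
        return self.tp / (self.tp + self.fp)


# Hypothetical: 10,000 women screened, 50 with cancer (0.5 % prevalence).
counts = ConfusionCounts(tp=48, fn=2, fp=995, tn=8955)
print(f"sensitivity = {counts.sensitivity():.2f}")  # 0.96
print(f"specificity = {counts.specificity():.2f}")  # 0.90
print(f"PPV         = {counts.ppv():.3f}")          # 0.046
```

In this made-up scenario, a 96 % sensitivity still means two cancers go unflagged, and fewer than 1 in 20 flags is a real cancer, so a doctor who simply accepts the system's output misses the former and is swamped by the latter.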

What is your ultimate ambition with the melanoma classification system? Getting it into surgeries or maybe even into mobile phones so that everyone can self-diagnose?
I am, of course, well aware that many dermatologists are terrified of such systems. Not because they might lose their jobs, but because self-diagnosis is a very complex process. It can be rather difficult to evaluate a particular finding from visual information alone. One doctor put it nicely: such systems can benefit doctors because people will no longer call them every time, even when a finding is quite obviously benign; on the other hand, at the slightest suspicion, the finding has to be approached comprehensively, palpated, biopsied and so on, which no app can replace. That's the way to approach it. If the future brings some kind of multimodal approach, in which we can understand and automate the individual parts more comprehensively, that is still many years away. For now, we are happy to be in the image data phase and able to make at least some initial estimates.

How, then, can we make sure that doctors really treat systems like this as a tool and don't get lazy and rely on them too much?
These systems can help us with automation, but we need to find the right form of automation. We need to figure out which problems can be automated and where the clinical benefit to the patient will be greatest. Take our chest X-ray classification system, which is already being tested at a hospital in Havířov, for example: we are looking for benefits, and these vary not only from department to department but also from doctor to doctor. A young doctor, for instance, is sure of a certain number of findings and has to consult a more experienced colleague about the rest. Simply put, our system can spare him from going to the head physician every five minutes to ask about things that are routine for a more experienced doctor. In mammography, for example, the prevalence of suspicious findings is quite low: a high percentage of regular screening examinations are either completely negative or benign. But of course the doctor has to go through them, compare them with the history of the patient in question and evaluate them. This can be automated to some extent, which would give the physician more time for the scans that need more attention. Even this could become a problem, though, if the physician got too used to seeing normal images. We are actively working on this with doctors to minimize the risk.

The last question will be a lighter one, and to bring things full circle, I'll return to the beginning of our conversation. As a fan of science fiction films, what do you think about the fact that most of the ones featuring artificial intelligence end badly?
(Laughs) It's not just films that end badly; sometimes companies working in artificial intelligence do too, because mistakes and inaccuracies stemming from a misunderstanding of practice can lead to bad outcomes. And in medicine, the consequences of such mistakes are far more visible than recommending a green sofa instead of a red one. We must learn from projects that have failed. I think that's the most important thing of all: to understand the practice we're trying to integrate AI into, to see whether AI is appropriate for it at all, and to figure out what specific role it should play there. And all of this together with practitioners.