An explosion in the development of artificial intelligence (AI) has the potential to change the landscape of global eye health. The US Food and Drug Administration has already approved some applications, and there are more on the horizon.
But is AI currently useful as a tool to support real-world clinical decisions, and are clinicians and consumers accepting of its use?
Associate Professor Peter van Wijngaarden explained the potential and current limitations of AI in healthcare at the recent Bayer Global Retinal Network Program meeting.
If you’ve ever taken a book recommendation from Amazon, a film or TV show suggestion from Netflix, used Google Maps, an email account, social media or online banking, then you’ve already used artificial intelligence (AI).
AI has been around since the 1950s – it’s not entirely new, but recent advances in the AI field have led to renewed interest in the application of this technology in healthcare.
As with many emerging technologies, the potential for disruption to established work patterns and practices has been the source of wide-ranging conjecture and concern. Several highly qualified experts have weighed in on the debate. In 2016, Professor Geoffrey Hinton, a computer scientist widely regarded as a leader in the field, and oft labelled as the “father of deep learning”, was quoted as saying, “it’s obvious that in five or perhaps ten years, AI will be far better than human radiologists…They should stop training radiologists now”.1 The statement was regarded as provocative, even incendiary, at the time, but it has been echoed more recently by other experts including Vinod Khosla, co-founder of Sun Microsystems.2 It is worth taking a moment to consider this statement from the perspective of an ophthalmologist.
It is arguable that the diagnostic repertoire of a radiologist is significantly broader than that of an ophthalmologist, so, if it is true for radiologists, then shouldn’t it be true for eye care professionals as well?
WHAT IS AI?
AI is an umbrella term for computer intelligence – aspects of human learning and intelligence that can be simulated by a computer. We are currently in an era of narrow AI, where AI systems are skilled at one specific task, or a very narrow set of closely related tasks. The concept of generalised AI (artificial intelligence systems capable of meeting or exceeding human performance across a diverse range of tasks), long popularised in sci-fi movies, remains a distant prospect and one that many experts believe will never be achieved.
Machine learning is a subfield of AI. In classic machine learning, a set of features is directly measured from data, such as a medical image. Systems are trained to recognise features in the image data and classify the image on the basis of these, using large data sets that have been labelled by human experts. Machine learning can be highly effective, but it is dependent on hand engineering of features in the data.
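As a rough illustration of the classic approach described above, the sketch below measures two hand-engineered features from a toy "image" and classifies it with a nearest-centroid rule. Everything here is an assumption made for illustration: the tiny pixel grids, the two features, the labels and the function names are hypothetical, and no real clinical system is this simple.

```python
from statistics import mean

def extract_features(image):
    """Hand-engineered features: mean brightness and fraction of dark pixels."""
    pixels = [p for row in image for p in row]
    mean_brightness = mean(pixels)
    dark_fraction = sum(p < 0.3 for p in pixels) / len(pixels)
    return (mean_brightness, dark_fraction)

def train_centroids(images, labels):
    """Average the feature vectors of each expert-assigned label."""
    by_label = {}
    for image, label in zip(images, labels):
        by_label.setdefault(label, []).append(extract_features(image))
    return {
        label: tuple(mean(f[i] for f in feats) for i in range(2))
        for label, feats in by_label.items()
    }

def classify(image, centroids):
    """Assign the label whose centroid is closest in feature space."""
    features = extract_features(image)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical training data: pixel intensities in [0, 1], labelled by "experts".
training_images = [
    [[0.9, 0.8], [0.85, 0.9]],   # bright throughout
    [[0.8, 0.9], [0.9, 0.95]],
    [[0.9, 0.1], [0.2, 0.8]],    # contains dark spots
    [[0.1, 0.85], [0.9, 0.2]],
]
training_labels = ["normal", "normal", "abnormal", "abnormal"]

centroids = train_centroids(training_images, training_labels)
new_image = [[0.85, 0.1], [0.2, 0.9]]
prediction = classify(new_image, centroids)
print(prediction)
```

The point of the sketch is the dependency the paragraph describes: the classifier can only ever be as good as the features a person chose to measure in `extract_features`.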
Deep learning is a subset of machine learning, based on the concept of neural networks. Deep learning systems have computational architectures that are designed to simulate human decision-making, with layers of processing generating progressively more sophisticated feature detectors. Importantly, these systems are capable both of feature extraction and classification – they are not dependent on pre-programmed rules for feature detection. Accordingly, deep learning systems are not constrained by conventional feature recognition and classification. They are optimised to discover intricate structures within large, high-dimensional data sets and are therefore well suited to image analysis.
AI IN OPHTHALMOLOGY
Ophthalmology is becoming a leading proving ground for AI systems in medicine, with AI systems already achieving subspecialist-level expert performance under certain circumstances. In April 2018, the US Food and Drug Administration (FDA) approved the use of a deep learning system developed by IDx for the detection of referable diabetic retinopathy (DR) and diabetic macular oedema (DME) from retinal photographs.3 This was noteworthy as it was the first approval of the application of AI in the autonomous diagnosis of disease in any field of medicine.
The system was FDA approved on the basis of a prospective study of 900 people with diabetes from 10 locations across the United States.3 Participants underwent non-mydriatic (2-field) photography and these images were used for the AI system classification. Study participants then underwent optical coherence tomography (OCT) and mydriatic photography and these images were classified by expert human graders. AI and human expert gradings were compared and FDA approval was granted as the AI system exceeded stringent pre-specified sensitivity and specificity levels (>85% and >82.5% respectively).3
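To make the comparison above concrete, the following sketch computes sensitivity and specificity from paired AI and reference gradings and checks them against the two thresholds quoted in the text. The thresholds come from the article; the participant counts are invented for illustration and are not the trial's data, and the function name is an assumption.

```python
# Pre-specified endpoints quoted in the article.
SENSITIVITY_THRESHOLD = 0.85
SPECIFICITY_THRESHOLD = 0.825

def sensitivity_specificity(ai_calls, reference_calls):
    """Compare AI calls with the human reference standard.

    Each list holds one boolean per participant: True means referable disease.
    """
    pairs = list(zip(ai_calls, reference_calls))
    true_pos = sum(a and r for a, r in pairs)
    false_neg = sum((not a) and r for a, r in pairs)
    true_neg = sum((not a) and (not r) for a, r in pairs)
    false_pos = sum(a and (not r) for a, r in pairs)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical cohort: 100 participants with referable disease, 200 without.
reference_calls = [True] * 100 + [False] * 200
# Hypothetical AI output: 90 of the 100 detected, 30 of the 200 falsely flagged.
ai_calls = [True] * 90 + [False] * 10 + [True] * 30 + [False] * 170

sensitivity, specificity = sensitivity_specificity(ai_calls, reference_calls)
meets_endpoints = (
    sensitivity > SENSITIVITY_THRESHOLD and specificity > SPECIFICITY_THRESHOLD
)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} pass={meets_endpoints}")
```

Sensitivity is the share of people with referable disease that the system flags; specificity is the share of people without it that the system correctly clears. Both must exceed their thresholds for the check to pass.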
Does this approval relegate diabetic retinopathy grading to AI systems forevermore? Not yet. The FDA approval is presently applicable to retinal photos acquired using a particular make and model of retinal camera and only for the distinction between referable and non-referable DR, as opposed to more granular severity stratification. The approval is also only valid for those without a prior diagnosis of DR.
This approval highlights the regulatory challenges of keeping abreast of the rapid rate of change in technology. In order for the FDA to approve this AI system for clinical use, the algorithm had to be locked. This means that the system no longer has the capacity to “learn” from additional data.
AI system development for eye diseases has extended beyond retinal photography to other imaging modalities, including OCT. Arguably one of the leading examples of this comes from a collaboration between clinicians and scientists from Moorfields Eye Hospital in London, and experts in deep learning at DeepMind. In a paper recently published in the journal Nature Medicine, the team describe the development of a deep learning system that is capable of classifying over 50 retinal disease types and triaging the urgency of referral.4 The team adopted an innovative approach by developing two deep learning networks – the first to segment the image and generate a tissue map, which is then used by the second network for disease classification. This approach means that the system is well suited to OCT images from different manufacturers, as a relatively small number of images are needed to retrain the segmentation network for new image sources.
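The two-stage design described above can be sketched structurally as follows. This is only an outline of the idea, not the published system: simple thresholds stand in for both trained networks, and the device names, tissue labels, cut-off values and referral categories are all hypothetical.

```python
from typing import Callable, Dict, List

Scan = List[List[float]]       # raw, device-specific intensities
TissueMap = List[List[str]]    # device-independent tissue labels

def make_segmenter(fluid_below: float) -> Callable[[Scan], TissueMap]:
    """Stand-in for a device-specific segmentation network.

    A real system learns this mapping; here one threshold mimics it.
    """
    def segment(scan: Scan) -> TissueMap:
        return [
            ["fluid" if value < fluid_below else "retina" for value in row]
            for row in scan
        ]
    return segment

# Stage 1: one segmenter per device. Only this stage changes for a new device.
SEGMENTERS: Dict[str, Callable[[Scan], TissueMap]] = {
    "device_a": make_segmenter(fluid_below=0.2),
    "device_b": make_segmenter(fluid_below=0.4),  # assumed to render fluid brighter
}

def classify_referral(tissue_map: TissueMap) -> str:
    """Stage 2: stand-in for the shared classifier, which sees only tissue maps."""
    labels = [label for row in tissue_map for label in row]
    fluid_fraction = labels.count("fluid") / len(labels)
    return "urgent" if fluid_fraction > 0.25 else "routine"

def triage(scan: Scan, device: str) -> str:
    """Run both stages: segment with the device's network, then classify."""
    return classify_referral(SEGMENTERS[device](scan))

scan_a = [[0.1, 0.1, 0.8], [0.9, 0.1, 0.7]]
scan_b = [[0.3, 0.3, 0.9], [0.9, 0.35, 0.8]]
print(triage(scan_a, "device_a"), triage(scan_b, "device_b"))
```

Because `classify_referral` never sees raw intensities, supporting a new scanner in this sketch means adding one entry to `SEGMENTERS`, which mirrors the reason the text gives for the design's suitability across manufacturers.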
The system was trained using 887 manually segmented OCT images and 14,884 labelled tissue maps for which the diagnosis and referral decisions were known. System performance was compared with four retinal subspecialists and four optometrists from Moorfields Eye Hospital. The AI system classifications were based on the OCT images alone, whereas the ophthalmologists and optometrists had access to OCT images, fundus photos, and clinical notes. The ground truth for disease classifications and referral urgency was based on expert panel consensus informed by knowledge of actual clinical outcomes. The AI system performance was equivalent to the best performing of the retinal subspecialists and was superior to the other human graders.
The dramatic increase in the sophistication of imaging tools and functional measures that are available to eye healthcare providers means that every patient is becoming a ‘big data challenge’.5 Clinicians presently have access to more data than they can evaluate and interpret, and many healthcare professionals are of the opinion that AI systems have great potential to help them tackle this challenge to improve the quality of patient care.
Surprisingly, little is known of clinician perceptions of AI in eye health care. The largest published study of clinician acceptance of AI to date comes from a survey of radiologists and radiology trainees in France.6 The three main perceived advantages of the implementation of AI in radiology were a reduced risk of medical error, decreased time required for image interpretation and, as a consequence, more time to spend with patients, improving the quality of clinical interactions. Notably, almost two thirds of radiologists felt that the use of AI would not reduce the number of radiologists needed in future.
One potential barrier to the adoption of AI systems for medical image analysis is the challenge of understanding how the systems work – the so-called ‘black box’ problem. If a clinician cannot understand the basis for a system-generated clinical decision, they are unlikely to accept it. In response, AI developers are testing a range of innovative solutions to this problem, such as saliency maps that flag areas of the image that contribute to the final classification, or histograms that display the probabilities of different disease classifications. One recent study tested the impact of visualisation aids, including saliency maps and histograms, on diabetic retinopathy photographic grading performance of retinal specialists and general ophthalmologists.7 As expected, during the early stages of the use of these tools, image grading times were prolonged, but with experience their use was associated with improved grading speed and sensitivity. Interestingly, for images without retinopathy, saliency map use was associated with higher false positive rates – as salient features of a ‘normal’ image may be misconstrued as features of disease.
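One simple way to build a saliency map of the kind described above is occlusion: mask each region of the image in turn and record how much the model's output changes. The sketch below applies this to a toy scoring function. Note that the study cited above used a different, gradient-based technique; occlusion is swapped in here only because it is easy to follow, and the model, image and function names are hypothetical.

```python
import math

def toy_disease_probability(image):
    """Hypothetical stand-in model: darker images score a higher probability."""
    darkness = sum(1.0 - pixel for row in image for pixel in row)
    return 1.0 / (1.0 + math.exp(-(darkness - 1.0)))

def occlusion_saliency(image, model, baseline=1.0):
    """For each pixel, replace it with `baseline` and record the drop in output."""
    reference = model(image)
    saliency = []
    for r, row in enumerate(image):
        saliency_row = []
        for c in range(len(row)):
            occluded = [list(original_row) for original_row in image]
            occluded[r][c] = baseline
            saliency_row.append(reference - model(occluded))
        saliency.append(saliency_row)
    return saliency

# A bright 3x3 "image" with one dark spot in the centre.
image = [
    [0.9, 0.9, 0.9],
    [0.9, 0.1, 0.9],
    [0.9, 0.9, 0.9],
]
saliency = occlusion_saliency(image, toy_disease_probability)

# Location of the pixel whose removal changes the output most.
most_salient = max(
    ((r, c) for r in range(3) for c in range(3)),
    key=lambda rc: saliency[rc[0]][rc[1]],
)
print(most_salient)
```

Here the dark centre pixel dominates the map because masking it lowers the toy model's output far more than masking any bright pixel does.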
Other developers, including the People and AI Research (PAIR) division of the Google Brain team, are building AI ‘partnerships’ – systems that enable clinicians to test the aspects of a given clinical case that they consider to be important. This enables the clinician to test the impact of a given variable on the system decision and test their own hypotheses. This field of research is still maturing, but it is likely to be of central importance to the widespread adoption of AI in healthcare.
What do we know of consumer acceptance of AI in healthcare? To answer this question it is helpful to consider consumer attitudes to healthcare more broadly. Overwhelmingly, healthcare consumers are placing increasing emphasis on the importance of convenience, choice and cost, and digital technologies are offering solutions on a number of these fronts. As a consequence, digital health technologies are increasingly posing challenges to the fiduciary relationship between patient and doctor that has long been the cornerstone of healthcare. An emerging model is the primary healthcare relationship between patient and system, with health practitioners serving as stewards of this relationship.
The 2018 Deloitte Healthcare Consumer Survey8 highlighted that consumers are increasingly open to new channels of care. For instance, 35% of people surveyed were interested in using a virtual assistant to identify symptoms and direct them to a physician or nurse; 31% were interested in utilising remote health coaches or using apps that detect depression or anxiety; and 51% were comfortable using an at-home test to diagnose infections before going to the doctor for treatment. The survey found that 41% were comfortable using at-home genetic tests and the same proportion were comfortable using at-home blood tests that connect to an app to monitor overall health trends. Interestingly, most consumers were willing to share their health data with their doctors to improve the quality of their care. Willingness to share data with other external parties was lower, but there is a historical trend towards increased openness to data sharing. Respondents with chronic health problems reported a greater willingness to share their health data than did respondents without health problems, presumably as the personal value proposition is more proximate for the ill. Generational influences on technology adoption were also interesting – millennials were most likely to adopt digital health technologies.
What about consumer acceptance of the use of AI in ophthalmology? A recent study tested the implementation of a deep learning system for diabetic retinopathy screening in Australian endocrinology outpatient clinics.9 Ninety-six participants underwent automated screening prior to seeing their endocrinologist. AI system-generated reports were provided within minutes of retinal photography. Images were also sent off for remote expert grading, the results of which were provided to each participant and their treating doctor in a written report approximately two weeks after the screening visit. Of participants, 78% reported a preference for AI screening over manual screening, and the most common reason stated for this preference was the immediate availability of a report, enabling retinopathy findings to be considered by their endocrinologist during the same appointment.
THE ETHICS OF AI
As the implementation of AI in healthcare is accelerating, so too is debate regarding the ethics of AI. A large and highly innovative study of moral decision-making by autonomous vehicles has a number of important parallels to autonomous decision-making in medicine.10 The study authors highlighted that “machines are increasingly being tasked with promoting wellbeing and minimising harm, as well as distributing the wellbeing they create and the harm they cannot eliminate”. They designed a number of online interactive driving scenarios with forced choice responses to interrogate the morals of decision-making by machines, with the intent of informing policy makers who will soon grapple with the regulation of this technology. The online survey went viral, generating 40 million decisions from respondents in 233 countries and territories around the world. Participants were asked to respond to 13 simulated driving scenarios by selecting the most appropriate response for an autonomous vehicle – to stay on course and kill pedestrians or to swerve, killing passengers. The scenarios tested a number of preferences – fewer versus more casualties; passengers versus pedestrians; humans versus animals; the influence of gender, age, fitness, social status and crossing legally versus jaywalking.
Respondents had the strongest preferences for sparing humans over animals, for sparing more characters rather than fewer, and for sparing the young in preference to the elderly. At an individual character level, the strongest preference was for saving babies; there was a slightly stronger preference for girls over boys, and a similar preference for sparing pregnant women. Respondents fell into three clusters with distinct moral preferences: a Western cluster, an Eastern cluster and a Southern cluster. Importantly, geographically and culturally proximate countries had similar preferences. The authors of the study highlight that autonomous decision-making should be informed by moral preferences and that these may vary considerably by region and culture – it is not simply a case of ‘one size fits all’.
There are a host of additional ethical considerations in the application of AI in medicine that need to be addressed, particularly in the area of autonomous decision making. For instance, what if a clinician disagrees with a decision generated by an AI system that has been proven to be highly accurate? In doing so, is the clinician reducing system performance and potentially compromising outcomes for the patient? How will the relationship between a doctor and a patient evolve in future? Will it be compromised, or will it be improved as more time is freed up for meaningful human interactions? How will individuals and society deal with medical errors made by machines? Are humans less tolerant of errors made by machines than they are of those made by humans? These are questions that warrant continued thought and broad-ranging debate.
Attend an eye health meeting nowadays and you are highly likely to encounter a large number of presentations and posters describing AI systems for a wide range of eye health applications. Performance metrics for these systems are typically impressive – high sensitivity and specificity are more the norm than the exception. Does this mean that these AI systems will be ubiquitous in the clinic in the next few years? Not likely.
The translation gap describes the substantial difference between developing a research-grade AI system and a clinically validated medical device, which is how these applications are regarded by regulatory agencies. Presently, in the field of AI the translation gap is more aptly described as a gulf. Stringent regulatory standards apply to the manner in which the code is written and the manner in which systems are trained and tested. Data security and privacy are also major considerations. While AI research abounds, there is a major shortage of evidence about the performance of AI systems in real-world clinical practice. These barriers mean that translating an AI system from the coding lab to the clinic is a time- and cost-intensive process. This has major implications as it means that well-resourced technology companies are at a significant competitive advantage over smaller players.
Detractors of AI often point to the ‘narrowness’ of existing systems. Much of the focus of AI system development in ophthalmology has been on image analysis, but major advances are also being made in other domains, including voice recognition and language processing. When these wide-ranging technologies are brought together in healthcare, we are likely to see major capability advances and the potential for significant disruption. An evolving field of AI research is in the area of ‘artificial empathy’ (AE), developing systems that have the capacity to detect and respond to human emotion. The application of AE in healthcare may also have significant disruptive potential.
Medical professionals working in a time of major technological change need to try to keep up to date with developments in the field of AI, but doing so is no easy task. Navigating the ever more crowded digital health space is difficult for consumers and clinicians alike. The National Health Service in the United Kingdom has taken meaningful steps forward by curating a library of digital health apps that have been approved and subject to stringent assessments including safety, efficacy, data security, usability, accessibility and stability. Similar independent evaluations will be invaluable for clinicians as AI-powered health applications become more common.
Associate Professor Peter van Wijngaarden is a Principal Investigator at CERA and a consultant ophthalmologist in the medical retina clinic at the Royal Victorian Eye and Ear Hospital. He has been a Deputy Director at CERA since 2017.
He is a Clinical Director and a founding member of KeepSight – a Commonwealth Government-funded national approach to diabetic retinopathy screening in Australia. A/Prof van Wijngaarden is also a member of the medical and research committees of the Macular Disease Foundation Australia. He is a member of the Royal Australian and New Zealand College of Ophthalmologists’ Future of Ophthalmology Taskforce and a representative on the Vision 2020 Vision Initiative, a public health program seeking to reduce avoidable blindness in Victoria. He is a board member of the Ophthalmic Research Institute of Australia.
- Mukherjee S, A.I. Versus M.D. What happens when diagnosis is automated? Annals of Medicine. 3 April, 2017. www.newyorker.com/magazine/2017/04/03/ai-versus-md
- Etherington D. Why Vinod Khosla thinks radiologists still practicing in 10 years will be ‘causing deaths’. techcrunch.com/2019/06/12/why-vinod-khosla-thinks-radiologists-still-practicing-in-10-years-will-be-causing-deaths/
- Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018 Aug 28;1:39.
- De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, Askham H, Glorot X, O’Donoghue B, Visentin D, van den Driessche G, Lakshminarayanan B, Meyer C, Mackinder F, Bouton S, Ayoub K, Chopra R, King D, Karthikesalingam A, Hughes CO, Raine R, Hughes J, Sim DA, Egan C, Tufail A, Montgomery H, Hassabis D, Rees G, Back T, Khaw PT, Suleyman M, Cornebise J, Keane PA, Ronneberger O. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Medicine. 2018 Sep;24(9):1342-1350. doi: 10.1038/s41591-018-0107-6. Epub 2018 Aug 13.
- Schmidt-Erfurth U, Sadeghipour A, Gerendas BS, Waldstein SM, Bogunović H. Artificial intelligence in retina. Progress in Retinal and Eye Research. 2018 Nov;67:1-29. doi: 10.1016/j.preteyeres.2018.07.004. Epub 2018 Aug 1.
- Waymel Q, Badr S, Demondion X, Cotten A, Jacques T. Impact of the rise of artificial intelligence in radiology: What do radiologists think? Diagn Interv Imaging. 2019 Jun;100(6):327-336.
- Sayres R, Taly A, Rahimy E, Blumer K, Coz D, Hammel N, Krause J, Narayanaswamy A, Rastegar Z, Wu D, Xu S, Barb S, Joseph A, Shumski M, Smith J, Sood AB, Corrado GS, Peng L, Webster DR. Using a Deep Learning Algorithm and Integrated Gradients Explanation to Assist Grading for Diabetic Retinopathy. Ophthalmology. 2019 Apr;126(4):552-564. doi: 10.1016/j.ophtha.2018.11.016. Epub 2018 Dec 13.
- D Betts & L Korenda. Inside the patient journey: Three key touch points for consumer engagement strategies. Findings from the Deloitte 2018 Health Care Consumer Survey. www2.deloitte.com
- Keel S, Lee PY, Scheetz J, Li Z, Kotowicz MA, MacIsaac RJ, He M. Feasibility and patient acceptability of a novel artificial intelligence-based screening model for diabetic retinopathy at endocrinology outpatient services: a pilot study. Sci Rep. 2018 Mar 12;8(1):4330
- Awad E, Dsouza S, Kim R, Schulz J, Henrich J, Shariff A, Bonnefon JF, Rahwan I. The Moral Machine experiment. Nature. 2018 Nov;563(7729):59-64. doi: 10.1038/s41586-018-0637-6. Epub 2018 Oct 24.