Copyright: © Singapore Medical Association
Colonoscopy is the reference standard procedure for the prevention and diagnosis of colorectal cancer, which is a leading cause of cancer-related deaths in Singapore. Artificial intelligence systems are automated, objective and reproducible. Artificial intelligence-assisted colonoscopy has recently been introduced into clinical practice as a clinical decision support tool. This review article provides a summary of the current published data and discusses ongoing research and current clinical applications of artificial intelligence-assisted colonoscopy.
Colorectal cancer (CRC) is the most frequently occurring cancer and a leading cause of cancer-related mortality in Singapore.(1) Colonoscopy is the reference standard procedure for the prevention and diagnosis of CRC and has been shown to reduce CRC-related mortality.(2,3) The role of colonoscopy in the prevention of CRC lies in the accurate detection and adequate resection of colorectal adenomas that are considered premalignant and may progress to CRC. A 1% increase in adenoma detection rate (ADR) has been shown to reduce interval CRC by 3%.(4) The adequacy of resection largely depends on the appropriate choice of technique and endoscopic accessories, which, in turn, is dependent on the size, morphology, predicted histology and, in cases of early cancer, the predicted depth of invasion.(5,6) The same factors and adequacy of resection determine the risk of recurrence of colorectal adenomas and influence the timing of surveillance colonoscopy.(7,8)
Artificial intelligence (AI) comprises several different fields. Machine learning and, more specifically, deep learning, in which hierarchical representation learning is performed across multiple layers of artificial neural networks,(9) is the most extensively studied application of AI in medicine.(10) Most research on AI in gastroenterology has centred on addressing the unmet needs in colonoscopy, with the goal of reducing CRC-related morbidity and mortality.(11,12) Position statements and recommendations regarding AI in endoscopy have been published, and they serve to facilitate and regulate the proliferation of research on the use of AI in endoscopy practice.(13-15)
The use of AI in colonoscopy is now a clinical reality, as AI systems have received regulatory approval and are now commercially available. Robust data in the context of polyp and adenoma detection has been published, but research on its utility in disease differentiation and quality assurance is ongoing. This narrative review article aims to provide a summary of current published data and clinical applications of AI-assisted colonoscopy. It focuses on areas of unmet needs that are of clinical significance and highlights the potential role of AI in filling the current gaps in colonoscopy.
A comprehensive literature search was performed in the PubMed, Web of Science, MEDLINE and EMBASE electronic databases from the inception of the databases up to and including 5 December 2021. The key search terms used were ‘artificial intelligence’ OR ‘deep learning’ OR ‘computer aided detection’ OR ‘computer aided diagnosis’ AND ‘colonoscopy’. Electronic searches were supplemented with manual searches of references of all retrieved studies to identify other relevant publications, and only studies published in English were considered for this narrative review article.
COMPUTER-AIDED DETECTION FOR POLYP DETECTION IN COLONOSCOPY
CRCs detected after a prior colonoscopy and in the interval between scheduled surveillance colonoscopies are known as interval cancers or post-colonoscopy CRCs (PCCRCs). PCCRCs may be due to the biological behaviour of the CRC, or missed or incompletely resected adenomas. It is estimated that the incidence of interval CRC is as high as 3.5 per 1,000 screened persons.(16) An earlier study found that 52% of PCCRCs were attributable to probable missed lesions, while 19% of PCCRCs in the study may possibly be related to incompletely resected lesions.(17) In a more recent study conducted in a national colonoscopy training centre, where World Endoscopy Organization (WEO) methodology was used to determine and categorise PCCRCs, it was deemed that 85% of CRC cases after a negative colonoscopy were due to possible missed lesions according to the WEO criteria.(18) Zhao et al conducted a meta-analysis of more than 15,000 tandem colonoscopies and found that the adenoma miss rate (AMR) was 26% (95% confidence interval [CI] 23%–30%).(19)
Although AMR may be influenced by several factors, an important determinant of AMR is the endoscopist,(20,21) as an inability to maintain a sustained level of alertness during colonoscopy owing to distraction and fatigue may result in a polyp being missed despite it being visible on the monitor. The inclusion of nurses and endoscopy trainees during colonoscopy has been shown to increase ADR, as they act as an independent ‘second reader’ to the endoscopist.(22,23) Computer-aided detection (CADe) functions as an automated second reader, but without the inherent problems of distraction and fatigue that may affect the performance of the endoscopist and the human second reader.
In a meta-analysis(24) consisting of five randomised controlled trials (RCTs)(25-29) with 4,354 patients, the pooled ADR was significantly higher in the CADe group than in the control group (36.6% vs 25.2%, relative risk [RR] 1.44, 95% CI 1.27–1.62; p < 0.01), with both groups using high-definition white light endoscopes. There was no significant difference in the withdrawal times between the groups in the individual trials, which eliminated the possibility of withdrawal time acting as a confounder for the efficacy of the CADe systems in increasing ADR. However, it must be noted that the increase in ADR with CADe was almost entirely due to the increased detection of diminutive (< 5 mm) adenomas, with only one study(27) showing an improvement in detection of small (6–9 mm) adenomas (17.2% vs. 12.7%; p < 0.05) and none of the studies showing a difference in the detection of advanced adenomas > 10 mm. Up to 90% of polyps detected during colonoscopy are diminutive and small in size, and the rates of progression of these lesions to CRC are thought to be low.(30) A recently conducted large-scale, propensity score-matched, single-centre prospective study on 1,836 patients in Japan(31) similarly showed an increase in ADR with CADe compared to controls (26.4% vs. 19.9%). However, most of this increase in ADR was for diminutive polyps, with no significant increase observed in detection of advanced neoplasia (3.7% vs. 2.9%, respectively). This data is consistent with prior publications related to polyp size and miss rate. A systematic review by van Rijn et al examined miss rates based on the size of the polyp.(32) The AMR for adenomas > 10 mm, 5–10 mm and 1–5 mm was 2.1% (95% CI 0.3%–7.3%), 13% (95% CI 8.0%–18%) and 26% (95% CI 27%–35%), respectively. 
Although small polyps are usually benign and do not progress, about 6% of such polyps have been shown to progress to advanced adenomas over time.(30) Small adenomas have also been shown to harbour foci of high-grade dysplasia or intramucosal cancer.(33) The recommended surveillance interval following colonoscopy and endoscopic resection of colonic adenomas is 1–10 years, based on the numbers of adenomas, as well as the histology.(7) Hence, small adenomas cannot be simply dismissed and are of potential clinical relevance. Apart from size, a flat or mildly elevated polyp morphology, such as in the context of serrated polyps, has also been implicated as a factor for missed lesions.(19) CADe has been demonstrated to increase the detection of adenomas of both flat (RR 1.78, 95% CI 1.47–2.15) and polypoid morphology (RR 1.54, 95% CI 1.40–1.68).(24)
Livovsky et al(34) studied the Detection of Elusive Polyps (DEEP2) polyp detection system, specifically looking at the performance of CADe in detecting elusive polyps, which were divided into fleeting and subtle polyps in the study. Fleeting polyps were defined as polyps appearing in the field of view (FOV) for ≤ 5 seconds, while subtle polyps were those missed initially by the endoscopist and offline independent gastroenterologists who were reviewing recorded videos of the colonoscopy for annotation purposes. The study found that when polyps appeared in the FOV for < 5 seconds, the sensitivity of DEEP2 for detection was 88.5% (95% CI 84.6%–92.4%), compared to 31.7% (95% CI 26.0%–37.5%) for the endoscopist (p < 0.01). When the time in the FOV was restricted to < 2 seconds, the sensitivity was 84.9% (95% CI 79.3%–90.5%) versus 18.9% (95% CI 12.8%–24.9%), respectively. The study also showed that DEEP2 was able to detect an average of 0.22 subtle polyps per sequence.
Tandem colonoscopy studies on CADe shed some light on its impact on AMR. In a prospective tandem colonoscopy study conducted by Wang et al,(35) patients were randomly assigned to colonoscopy with or without CADe, which was immediately followed by the other procedure. The AMR was significantly lower in the group that underwent colonoscopy with CADe compared with routine colonoscopy as the first procedure (13.9% vs. 40.0%; p < 0.0001). This result was consistent regardless of the segment of the colon examined. The CADeT-CS Trial(36) was a prospective, multicentre, single-blind, randomised tandem colonoscopy study conducted in a United States population, which also showed a decrease in AMR in the CADe-first group compared to the high-definition white light group (20.12% vs. 31.25%; odds ratio [OR] 1.8048, 95% CI 1.0780–3.0217; p = 0.0247). In another study by Lui et al,(37) the colon was divided into different segments for withdrawal and the endoscopist was blinded to the CADe output, which was displayed on a separate monitor. Unblinding of AI results for the endoscopist was provided by an independent viewer after each colonic segment was examined. Using this method, the total number of polyps and adenomas detected increased by 32.1% and 23.6%, respectively. An example of CADe in real-time colonoscopy is shown in
Colonoscopy image shows the use of computer-aided detection, in which the polyp is highlighted in a bounding box displayed in real time on the endoscopy monitor.
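As a worked illustration of how the odds ratio quoted for the CADeT-CS Trial relates to the two adenoma miss rates, the following minimal sketch converts each proportion to odds and takes their ratio. It assumes the rounded percentages given in the text (31.25% and 20.12%) rather than the trial's underlying counts, which are not reported here, so the result agrees with the published estimate only to about two decimal places. The function names are illustrative.

```python
def odds(p: float) -> float:
    """Convert a proportion (0 < p < 1) to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_a: float, p_b: float) -> float:
    """Odds of the event in group A relative to group B."""
    return odds(p_a) / odds(p_b)


# Adenoma miss rates as quoted in the text (rounded percentages, not raw counts).
amr_white_light_first = 0.3125
amr_cade_first = 0.2012

crude_or = odds_ratio(amr_white_light_first, amr_cade_first)
print(f"Crude OR (white light first vs CADe first): {crude_or:.2f}")  # prints 1.80
```

The ratio is expressed with the white light group as the numerator, which is why a lower miss rate with CADe appears as an odds ratio greater than 1.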
Despite the apparent advantages of CADe in increasing polyp detection and reducing AMR, AI systems also suffer from the same limitations as expert endoscopists. Polyps that are not visible on the endoscopy monitor will be ‘invisible’ to polyp detection systems. This could be due to being hidden behind mucosal folds or concealed by poor bowel preparation. In a study that evaluated false detections with CADe in colonoscopy,(38) blurry images were found to result in distorted polyp texture and were one of the reasons for false negative detections. These false negative detections also occurred when polyps approached the corner of a frame just before appearing or disappearing from the FOV. In addition, false positive detections increase the amount of visual distraction experienced by the endoscopist. The impact of this is not fully known, as the majority of early studies on CADe do not explicitly report or analyse false positive detections. There has been an effort to define the false positive duration of a frame before it should be considered a false positive detection,(39) but this has not been adequately studied and no consensus has been reached on an acceptable definition.
COMPUTER-AIDED DIAGNOSIS FOR POLYP CHARACTERISATION DURING COLONOSCOPY
Various polyp classification systems have been developed for in vivo prediction of polyp histology before resection and formal histological analysis. Also known as an optical biopsy, these systems exploit the different appearances of polyp surfaces and vessels under narrow-band wavelengths of light(40) or when stained with dyes(41) to determine their neoplastic potential and estimated depth of invasion, and are collectively known as image-enhanced endoscopy (IEE). Narrow-band imaging (NBI) and blue laser imaging (BLI) are the most extensively studied examples of IEE utilising narrowed wavelengths of white light. As discussed in the introduction, the predicted histology of colorectal polyps aids the endoscopist in selecting the optimal method of resection. The optical prediction of polyp histology is also a crucial element of the ‘resect and discard’(42,43) and ‘detect and leave’(44) strategies, which can make endoscopic examinations and treatment of diminutive colorectal polyps more cost-effective, provided these satisfy the criteria of the American Society for Gastrointestinal Endoscopy (ASGE) Preservation and Incorporation of Valuable endoscopic Innovations (PIVI) recommendations. This requires that in the context of suspected rectosigmoid hyperplastic polyps that are 5 mm or smaller, the technology should provide a negative predictive value (NPV) greater than 90%, when used with high confidence, for adenomatous histology.(45) Examples of IEE classification include the Kudo pit pattern,(41) Sano,(46) NBI International Colorectal Endoscopic,(47) Japan NBI Expert Team(48) and BLI Adenoma Serrated International Classification(49) systems.
The use of IEE in clinical practice is dependent on the availability of equipment, the experience of the endoscopist and access to structured training. While the latter has been shown to improve the accuracy of optical biopsy with IEE,(50) there is still a wide gap in accessibility to proper equipment and training depending on the resources available. This has resulted in moderate interobserver variability(51,52) and modest results(53) where accuracy of IEE for prediction of polyp histology has been studied in clinical practice.
Most of the early studies evaluating computer-aided diagnosis (CADx) were retrospective in nature and tested deep learning models on still images or video recordings of polyps.(54-58) In contrast, few prospective studies on CADx in real-time colonoscopy are currently available. In a study comparing magnifying NBI and a CADx support vector machine in 118 colorectal lesions, Kominami et al(59) were able to demonstrate a concordance between the endoscopic diagnosis and CADx output of 97.5%. The accuracy, sensitivity, specificity, positive predictive value (PPV) and NPV of the CADx system’s output were 94.9%, 95.9%, 93.3%, 95.9% and 93.3%, respectively. Song et al(60) used a deep learning model to classify near-focus NBI images of polyps in real time. The study showed that the CADx system was able to classify polyps as serrated polyps, benign adenomas/mucosal/superficial submucosal cancer and deep submucosal cancer, with areas under the receiver operating characteristic curve (AUROC) of 0.93–0.95, 0.86–0.89 and 0.89–0.91, respectively. CADx had an overall diagnostic accuracy of 81.3%–82.4%, which outperformed trainee endoscopists and was comparable with that of expert endoscopists. When CADx was used to assist trainee endoscopists, an increase in agreement was observed between true and predicted polyp histology (kappa improved from 0.368 to 0.655), while the diagnostic accuracy increased from 63.8%–71.8% to 82.7%–84.2%. A meta-analysis(61) of 18 studies (three prospective, 15 retrospective, total of 7,680 polyp images) on prediction of polyp histology using CADx models showed a pooled sensitivity, specificity and AUROC of 92.3% (95% CI 88.8%–94.9%), 89.8% (95% CI 85.3%–93.0%) and 0.96 (95% CI 0.95–0.98), respectively. Six of the included studies compared the performance of CADx with non-expert endoscopists and showed that CADx was significantly better than non-expert endoscopists in the accurate prediction of polyp histology (AUROC 0.97 vs 0.90, respectively; p < 0.01).
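The diagnostic performance measures quoted above all derive from a single 2 × 2 table of predicted versus true histology. The sketch below shows the arithmetic. The cell counts are an assumption: they are not given in the text and have been back-calculated so that, across 118 lesions, they reproduce the percentages reported for the Kominami et al study. Other counts could in principle give the same rounded figures, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionTable:
    """Counts of CADx predictions against histology (neoplastic = positive)."""
    tp: int  # predicted neoplastic, histology neoplastic
    fp: int  # predicted neoplastic, histology non-neoplastic
    fn: int  # predicted non-neoplastic, histology neoplastic
    tn: int  # predicted non-neoplastic, histology non-neoplastic


def diagnostic_metrics(t: ConfusionTable) -> dict[str, float]:
    """Return accuracy, sensitivity, specificity, PPV and NPV as percentages."""
    total = t.tp + t.fp + t.fn + t.tn
    return {
        "accuracy": 100 * (t.tp + t.tn) / total,
        "sensitivity": 100 * t.tp / (t.tp + t.fn),
        "specificity": 100 * t.tn / (t.tn + t.fp),
        "ppv": 100 * t.tp / (t.tp + t.fp),
        "npv": 100 * t.tn / (t.tn + t.fn),
    }


# Hypothetical counts chosen to be consistent with the reported percentages.
table = ConfusionTable(tp=70, fp=3, fn=3, tn=42)
metrics = diagnostic_metrics(table)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Unlike sensitivity and specificity, the PPV and NPV depend on the proportion of neoplastic lesions in the sample, so they should be compared across studies with caution.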
(a) Colonoscopy image shows the use of computer-aided diagnosis to predict a neoplastic polyp. (b) Photomicrograph shows a tubular adenoma with low-grade dysplasia (Haematoxylin & eosin, original magnification × 40).
CADx has also been studied with endocytoscopy, which utilises specialised contact light microscopy colonoscopes with 520× optical zoom capability and mucosal staining techniques to visualise cellular structures. A single-centre, open-label, prospective study(62) showed that endocytoscopy with CADx had an NPV for diminutive rectosigmoid adenomas of 93.7%–96.4% with methylene blue staining and 95.2%–96.5% with NBI, which satisfied the ‘detect and leave’ threshold of 90% recommended by the ASGE PIVI.(45) Rodriguez-Diaz et al(63) used a CADx model that simultaneously displayed elements informing polyp histology assessment in each frame on the endoscopy monitor. The model output was a detailed spatial histology heat map using varying shades of red, green and yellow to represent high-confidence neoplastic, high-confidence non-neoplastic and low-confidence assessments, respectively. This augmented visualisation of the polyp in real-time enabled the endoscopist to assess the prediction made and use the spatial information to guide decisions on management of the polyp in question. The CADx model was tested on 254 polyps, with a sensitivity, specificity and NPV of 96%, 84% and 91%, respectively, in distinguishing neoplastic from non-neoplastic polyps of all sizes.
AI-ASSISTED QUALITY ASSURANCE IN COLONOSCOPY
Owing to its unique role in the prevention of CRC, strict quality indices have been recommended to ensure that all screening colonoscopies are of high quality.(2,3) These quality indices include not only the ADR of the endoscopist but also the withdrawal time,(64) caecal intubation rate and adequacy of bowel preparation.(65,66) Despite being extensively studied, quality indices in performance and reporting of colonoscopy may not be adhered to because of lack of real-time feedback, training and enforcement.(67,68) For example, most endoscopists are unaware of their individual ADR. This could be due to a variety of factors including the manual nature of collecting and combining information from colonoscopy and histology reports, which are recorded on separate electronic systems or in hard copy in virtually all centres, as well as lack of regular and structured feedback on the individual endoscopist’s ADR. A recent study conducted in Japan(69) showed that a group meeting and individual interview with the director to communicate endoscopist performance in ADR could increase the mean ADR significantly from 40.8% to 50.8%. This example highlights how feedback on the individual quality indices in colonoscopy could improve the endoscopist’s performance in colonoscopy.
Gong et al(70) conducted an RCT of 704 patients using the ENDOANGEL system, which provided automated monitoring of withdrawal time and speed, and adequacy of mucosal exposure. The information was relayed to the endoscopist in real time and resulted in a significantly longer mean withdrawal time in the ENDOANGEL group compared to the control group (6.38 minutes vs. 4.76 minutes, respectively; p < 0.001). The ADR in the ENDOANGEL group was significantly higher (16% vs. 8%), and this is currently the only RCT to show an increased detection rate in adenomas > 10 mm in size (10/355 vs. 1/349, respectively; OR 9.50, 95% CI 1.19–75.75; p = 0.034). The use of an automatic quality control system (AQCS) was also shown to improve mean withdrawal times from 5.68 minutes to 7.03 minutes (p < 0.001) in a study by Su et al.(28) The AQCS generated audio prompts for the endoscopist to slow down the speed of withdrawal when unstable or blurry frames were displayed, or when a colonic segment had a suboptimal bowel preparation with Boston Bowel Preparation Scale (BBPS) < 2. Using this automated system, the rate of adequate bowel preparation was also increased (87.3% vs. 80.6%, p = 0.023). The AQCS was combined with CADe in the study and showed an increase in ADR (28.9% vs. 16.5%, p < 0.001). However, it should be noted that in both these studies, the control groups did not meet the minimum standard for withdrawal time and ADR (
Examples of quality indicators in colonoscopy.
The use of deep learning for automatic calculation of the BBPS in colonoscopy was also examined in a prospective observational study.(71,72) The automatic BBPS (e-BBPS) system was based on existing definitions of adequacy of bowel preparation spelled out in the BBPS. It was prospectively validated in 616 patients undergoing screening colonoscopy and showed a significant inverse correlation between the e-BBPS score and ADR (Spearman’s rank correlation −0.976; p < 0.01). Based on the results of the study, a threshold e-BBPS score of 3 was calculated to guarantee an ADR of more than 25%. Using this threshold, patients with an e-BBPS score of ≤ 3 had a significantly higher ADR than patients with e-BBPS > 3 (28.03% vs. 15.93%; p < 0.001). The study showed that a validated AI system based on deep learning can supplement the endoscopist with objective and precise information about bowel preparation that is reproducible and more refined than current visual estimations of bowel preparation by the endoscopist.
Yao et al examined the role of combined CADe and computer-aided quality improvement system (CAQ) during colonoscopy in improving ADR.(73) Patients undergoing colonoscopy were randomised to four groups: control (n = 271), CADe (n = 268), CAQ (n = 269) and CADe plus CAQ (COMBO; n = 268). The primary outcome was ADR. The ADR in the control, CADe, CAQ and COMBO groups was 14.76% (95% CI 10.54%–18.98%), 21.27% (95% CI 16.37%–26.17%), 24.54% (95% CI 19.39%–29.68%) and 30.60% (95% CI 25.08%–36.11%), respectively. The ADR was significantly higher in the COMBO group than in the CADe group (30.60% vs. 21.27%; p = 0.024), but the difference between the COMBO and CAQ groups was not significant (30.60% vs. 24.54%; p = 0.213).
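To illustrate how confidence intervals of this kind are typically obtained, the sketch below applies the Wald (normal approximation) interval to each group's ADR. Two assumptions should be noted: the numbers of adenoma-positive patients are not reported in the text and have been back-calculated from the quoted ADRs and group sizes, and the use of the Wald method is inferred from the fact that it matches the quoted intervals, not stated by the study. The function name is illustrative.

```python
import math


def wald_ci(events: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wald normal approximation."""
    p = events / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, p - half_width, p + half_width


# Adenoma-positive patient counts are NOT reported in the text; they are
# back-calculated here from the quoted ADRs and group sizes (assumption).
groups = {
    "control": (40, 271),
    "CADe": (57, 268),
    "CAQ": (66, 269),
    "COMBO": (82, 268),
}

for name, (events, n) in groups.items():
    p, lo, hi = wald_ci(events, n)
    print(f"{name}: ADR {100 * p:.2f}% (95% CI {100 * lo:.2f}%-{100 * hi:.2f}%)")
```

Under these assumptions the sketch reproduces the four quoted intervals to two decimal places. The Wald interval is a simple approximation that can perform poorly for small samples or proportions near 0 or 1, where alternatives such as the Wilson interval are often preferred.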
To overcome the inefficient and time-consuming nature of manual data retrieval and tracking of patients for post-polypectomy colonoscopy surveillance, a pipeline utilising natural language processing (NLP) techniques was developed to automatically extract and analyse information from free-text colonoscopy and pathology reports.(74) The pipeline consisted of three modules. The first module was for polyp property extraction, where rule-based methods and statistical classifiers were used to extract relevant information about polyps from colonoscopy and histology reports. In the polyp grouping module, extracted polyp properties such as morphology, location and size were associated with their unique polyp mentions. Lastly, the surveillance interval classification module integrated the information to classify patients into one of six risk categories based on the recommended post-colonoscopy surveillance intervals by the United States Multi-Society Task Force on CRC.(7) The pipeline was evaluated on an independent test set of 200 reports (100 each from colonoscopy and histology) and achieved an overall accuracy of 92% in assigning the recommended interval for surveillance colonoscopy. The study showed that NLP techniques can be used in colonoscopy to develop a pipeline for automated assignment of surveillance intervals, which has traditionally been a very tedious and inefficient process.
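The three-module structure described above can be sketched in a few lines. This is a deliberately simplified illustration and not the published pipeline: the regular expression, the location list, the sentence-level grouping and the example report are all assumptions made for this sketch, and the classification step accepts a caller-supplied rule instead of encoding any guideline. The placeholder rule shown is purely illustrative and does not represent the United States Multi-Society Task Force criteria.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical vocabulary of colonic segments for this sketch.
LOCATIONS = ("caecum", "ascending colon", "transverse colon",
             "descending colon", "sigmoid colon", "rectum")
SIZE_RE = re.compile(r"(\d+)\s*mm", re.IGNORECASE)


@dataclass
class Polyp:
    size_mm: Optional[int]
    location: Optional[str]


def extract_polyp(sentence: str) -> Polyp:
    """Module 1 (toy): rule-based extraction of size and location."""
    size = SIZE_RE.search(sentence)
    lowered = sentence.lower()
    location = next((loc for loc in LOCATIONS if loc in lowered), None)
    return Polyp(int(size.group(1)) if size else None, location)


def group_polyps(report: str) -> List[Polyp]:
    """Module 2 (toy): treat each sentence mentioning a polyp as one polyp."""
    sentences = re.split(r"(?<=[.!?])\s+", report)
    return [extract_polyp(s) for s in sentences if "polyp" in s.lower()]


def classify(polyps: List[Polyp], rule: Callable[[List[Polyp]], str]) -> str:
    """Module 3 (toy): delegate to a caller-supplied classification rule."""
    return rule(polyps)


def placeholder_rule(polyps: List[Polyp]) -> str:
    """Illustrative only; NOT a clinical guideline."""
    has_large = any(p.size_mm is not None and p.size_mm >= 10 for p in polyps)
    return "category B" if has_large else "category A"


report = ("A 4 mm sessile polyp was seen in the sigmoid colon. "
          "A 12 mm pedunculated polyp was removed from the ascending colon. "
          "Bowel preparation was good.")
polyps = group_polyps(report)
print(polyps)
print(classify(polyps, placeholder_rule))
```

A real system would also need to link each polyp to its histology report entry and handle negation, abbreviations and multiple polyps described in one sentence, which is where the statistical classifiers mentioned above become necessary.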
OTHER EXAMPLES OF RESEARCH IN THE USE OF AI IN COLONOSCOPY
The same principles of objectivity and reproducibility with automated systems discussed in this review article can also be applied to other areas of colonoscopy. For example, colonoscopy is an essential tool in the diagnosis and assessment of severity of inflammatory bowel disease (IBD). However, endoscopic assessment has inherent subjectivity, which is further compounded by differences in training, volume of cases seen and the level of expertise of the individual endoscopist. In this setting, AI systems have been studied to more reliably diagnose and assess disease severity in IBD.(75-77)
Similarly, the estimation of CRC depth and risk of lymph node metastasis is crucial in determining whether endoscopic resection techniques such as endoscopic submucosal dissection and endoscopic mucosal resection are appropriate to achieve curative resection, which has lower associated morbidity and mortality compared to surgical resection.(78-81) Accurate prediction of depth of CRC invasion in real time, as well as determination of risk of lymph node metastasis in T1 CRC, can be very challenging for clinicians.(82-84) It is, therefore, unsurprising that AI is also being studied in these areas.(85-87)
INTEGRATING AI-ASSISTED COLONOSCOPY INTO CURRENT CLINICAL PRACTICE
Current commercially available AI systems in Singapore either have the CADe function alone or CADe combined with CADx functions. The current CADx function differentiates between hyperplastic and neoplastic polyps. There is no further stratification in terms of severity of dysplasia or depth of invasion in the context of cancer. Therefore, it would appear that the current key utility of AI-assisted colonoscopy is its role in increasing ADR, thereby facilitating diagnosis and endoscopic resection of adenomas that may otherwise be missed. This is especially useful for less experienced endoscopists performing screening colonoscopy. Admittedly, there is no direct data on the longer-term impact of CADe in terms of reduction in CRC incidence and CRC-related mortality, but these are reasonable assumptions to extrapolate from past data related to screening colonoscopy.(4) ADR is a function of different components during the process of colonoscopy, and it is important and illustrative to dissect these individual components in order to better appreciate how to improve ADR using different approaches. Excellent mucosa surface visibility is extremely important for adenoma detection, and it depends on the adequacy of bowel preparation, and irrigation and suctioning during the process of colonoscopy. Another crucial aspect is meticulous examination of the entire visible colonic mucosa (careful slow withdrawal of the endoscope, adequate air insufflation, pressing down and looking behind folds, and re-examination of flexures). The third vital element is being aware that premalignant lesions such as sessile serrated adenomas may be easily missed owing to their subtle endoscopic features, and being trained to recognise such subtle features and detect these flat subtle lesions on the exposed mucosa surface. Different endoscopic tools are available to improve adenoma detection, focusing on different aspects of the examination process during colonoscopy. 
These include devices attached to the colonoscope tip to flatten mucosal folds in order to expose adenomas hidden by folds; contrast dye-based IEE techniques such as indigo carmine chromoendoscopy, which accentuates mucosal surface contours to highlight flat lesions; electronic IEE techniques such as NBI and BLI that accentuate mucosal surface details to facilitate recognition of subtle mucosal surface abnormalities to improve detection, and with additional magnification, also allow characterisation and diagnosis; and endoscopy systems such as full-spectrum endoscopy (FUSE) that increase the extent of the endoscopic view.(88) CADe draws the endoscopist’s attention to the presence of a polyp when it appears in the endoscopic view. However, it will not be able to detect polyps in unexposed areas, such as when they are hidden behind mucosal folds or obscured owing to suboptimal bowel preparation, or when the colonoscope withdrawal speed is too fast.
A recent network meta-analysis of 50 RCTs comprising 34,445 participants compared CADe with high-definition white light endoscopy, IEE techniques, and techniques that increased mucosal visualisation such as distal attachments and FUSE.(89) CADe was ranked as the superior technique for adenoma detection. Cross-comparisons of CADe with other imaging techniques showed a significant increase in the ADR with CADe versus increased mucosal visualisation systems (OR 1.54 [95% CI 1.22–1.94]; low certainty of evidence) and with CADe versus chromoendoscopy (OR 1.45 [95% CI 1.14–1.85]; moderate certainty of evidence). CADe also seemed to be the superior strategy for detection of sessile serrated lesions (with moderate confidence in hierarchical ranking), although no significant increase in the sessile serrated lesion detection rate was observed (OR 1.37 [95% CI 0.65–2.88]).
The role of AI-assisted colonoscopy is expanding, with data and clinical applications emerging most rapidly in the fields of polyp detection, prediction of polyp histology and automated quality assurance. The objectivity and reproducibility afforded by these automated systems will see a further expansion of data from AI-assisted colonoscopy in other areas of colonoscopy and newer clinical applications. We should harness advances in technology to improve our practice. However, we should also keep in mind that technology complements but does not replace the fundamentals of quality colonoscopy.