
Evidence base for novel clinical AI applications: review, July 2022 edition

In this issue of the series, we summarize several recent systematic reviews and meta-analyses of computer vision software applied to the imaging of prostate, breast, and gastroesophageal cancer. Articles were selected on the basis of their clinical significance and the citation indexes of the journals in which they were published.

1. Prostate cancer MRI 

In a recent systematic review, N. Sushentsev et al. compared two approaches to prostate MRI analysis: fully automated deep learning and semi-automated traditional machine learning. Although the two classes of AI methods performed comparably in differentiating clinically significant prostate cancer (csPCa), indolent prostate cancer (iPCa), and benign conditions, the authors identified several common methodological limitations and biases.

As is well known, the introduction of multiparametric magnetic resonance imaging (mpMRI) before biopsy has greatly improved the quality of prostate cancer (PCa) diagnosis, reducing the number of unnecessary biopsies and increasing the detection of clinically significant disease. However, the diagnostic performance of MRI depends heavily on the radiologist's experience and on image quality, which limits its use. The European Society of Urogenital Radiology (ESUR) and the Urological Imaging Section of the European Association of Urology (EAU) therefore emphasize the importance of reliable, clinically applicable artificial intelligence (AI) techniques to overcome these limitations and facilitate the successful implementation of the mpMRI-based PCa diagnostic pathway.

The critical appraisal by Sushentsev et al. led to some unexpected findings.

First, the overwhelming majority of papers included in the review used either non-public single-center datasets or the open-source PROSTATEx challenge dataset. As the authors rightly observe, relying on a single public dataset without additional data encourages community-wide overfitting to it, which limits the usefulness of the dataset itself.

In half of the studies, images were segmented by a single radiologist only, which clearly limits the generalizability of the predictive models.
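Inter-reader variability is one reason why single-reader annotation is a weakness. A common way to quantify it (an assumption here; this summary does not restate the agreement metrics used in the reviewed studies) is the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), computed between two readers' masks of the same lesion. The minimal Python sketch below uses toy 1D masks invented for illustration; a real pipeline would compare 3D volumes.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    Masks are flat sequences of 0/1 voxel labels of equal length.
    Returns 1.0 when both masks are empty (a common convention).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical lesion outlines drawn by two readers on the same 8-voxel strip
reader_1 = [0, 1, 1, 1, 1, 0, 0, 0]
reader_2 = [0, 0, 1, 1, 1, 1, 0, 0]
print(f"Inter-reader Dice: {dice_coefficient(reader_1, reader_2):.2f}")
```

Even small boundary disagreements pull Dice well below 1, which is why labels from a single reader bake that reader's habits into the training data.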

Only 80% of deep-learning papers and 67% of traditional machine-learning papers used MRI-targeted biopsy specimens as the source of ground truth. Radical prostatectomy specimens, by contrast, allow definitive assessment of lesion morphology, but prognostic models trained on them are likely to have very limited clinical applicability, because patients with intermediate-risk disease are overrepresented in prostatectomy cohorts.

None of the deep-learning papers used external testing to assess the developed predictive models. Even once external testing becomes the norm, common reporting errors will still need to be avoided.


This is a genuinely critical review. Current prostate imaging AI is limited by the mediocre quality of the datasets used for training and validation. Multicenter datasets, annotation by multiple radiologists, and external testing are pivotal for improving the quality of AI algorithms.

2. Prostate cancer PET/MRI

A systematic review by Liberini et al. describes the basic concepts of AI and radiomics applied to the molecular imaging of prostate cancer, together with the current literature on the topic. It covers a relatively large number of sources (37), analyzed according to PRISMA criteria.

The largest part of the review assesses the potential of radiomics for staging and restaging prostate cancer with different tracers. The section on radiomics of bone scintigraphy looks rather peculiar, given that this examination is gradually being abandoned in favor of whole-body MRI.

What makes this article unique is its overview of reconstruction algorithms. Among the various applications, AI for PET/MRI reconstruction seems particularly interesting for the development of hybrid imaging in prostate cancer patients.

As is well known, attenuation correction is more challenging in PET/MRI than in PET/CT, because MRI voxel intensities do not directly reflect photon attenuation. The reviewers describe several AI methods for improving segmentation-based MR attenuation correction (MRAC): deep learning approaches that refine pelvic segmentation from ultrashort echo time (UTE) or Dixon volumetric interpolated breath-hold MRI sequences, as well as generative adversarial networks.
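Why segmentation quality matters for MR-based attenuation correction (MRAC) can be sketched in a few lines. In segmentation-based MRAC, each voxel is assigned a tissue class, each class is given a fixed linear attenuation coefficient μ at 511 keV, and the correction factor for a line of response is exp(∫μ dl). The Python sketch below is a simplified illustration, not any of the reviewed methods: the μ values are approximate, the four-class model and the toy line of response are assumptions, and the hand-written label list stands in for the output of a hypothetical AI segmentation model.

```python
import math

# Approximate linear attenuation coefficients at 511 keV (cm^-1) for a
# Dixon-style four-class model. Illustrative values only; vendor
# implementations differ, and this model has no bone class.
MU_511_KEV = {
    "air": 0.0,
    "lung": 0.022,
    "fat": 0.085,
    "soft_tissue": 0.100,
}


def attenuation_correction_factor(labels_along_lor, voxel_size_cm):
    """Attenuation correction factor for one line of response (LOR).

    labels_along_lor: tissue class of each voxel crossed by the LOR,
    standing in for the output of a (hypothetical) AI segmentation model.
    The factor is exp(integral of mu along the LOR), approximated as a
    sum over voxels of mu * voxel length.
    """
    line_integral = sum(MU_511_KEV[label] * voxel_size_cm for label in labels_along_lor)
    return math.exp(line_integral)


# Toy LOR through the pelvis: 1 cm voxels, fat layers around soft tissue
lor = ["air"] * 2 + ["fat"] * 2 + ["soft_tissue"] * 10 + ["fat"] * 2 + ["air"] * 2
correct = attenuation_correction_factor(lor, voxel_size_cm=1.0)

# The same LOR if the segmentation mislabels two soft-tissue voxels as fat
mislabelled = lor.copy()
mislabelled[6:8] = ["fat", "fat"]
degraded = attenuation_correction_factor(mislabelled, voxel_size_cm=1.0)

print(f"ACF with correct labels:     {correct:.3f}")
print(f"ACF with mislabelled voxels: {degraded:.3f}")
```

Every mislabelled voxel shifts the correction factor and therefore the reconstructed PET activity, which is the error that AI-based segmentation and GAN-generated attenuation maps aim to reduce.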

Another promising application is automatic tumor segmentation, which has direct implications for clinical practice: automatic segmentation of the primary lesion can be built into fusion biopsy systems that combine ultrasound, MRI, and PET to identify the biopsy target more accurately.


This review is a useful source of basic information about radiomics in radionuclide diagnostics and theranostics. However, it says little about possible methodological errors in the reviewed studies, which may reduce confidence in their results.

3. Breast cancer mammography (MMG)

There are currently more than five FDA-approved algorithms for interpreting mammograms. Computer-aided detection (CAD) machine learning algorithms can match or even exceed radiologist performance on standard two-view two-dimensional mammograms (mediolateral oblique and craniocaudal views). The review by Hickman et al. investigated whether machine learning algorithms are as sensitive and specific as radiologists in detecting breast cancer in patients undergoing screening mammography. The authors also evaluated the impact machine learning algorithms would have if implemented in clinical practice.

The authors found that mammography screening algorithms perform on par with human readers, separately in computer-aided detection and computer-aided diagnosis tasks. The combined specificity of the algorithms (90.6%) exceeded that of combined readers (88.6%) and of single readers in the United States (88.9%). Nevertheless, further improvements are needed to ensure that machine learning systems meet "clinically relevant thresholds" of current reader performance and screening program targets.
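The specificity figures above are easier to interpret as absolute numbers. For a hypothetical group of 1,000 screened women without cancer, the expected false-positive count is simply (1 - specificity) × 1,000, so each percentage point of specificity corresponds to about 10 unnecessary recalls. A minimal Python sketch, assuming every false positive leads to a recall:

```python
def false_positives_per_1000(specificity, cancer_free_women=1000):
    """Expected false-positive results among cancer-free women at a given specificity."""
    return (1.0 - specificity) * cancer_free_women


# Specificities as reported in the Hickman et al. summary above
for name, spec in [
    ("algorithms (combined)", 0.906),
    ("readers (combined)", 0.886),
    ("single US reader", 0.889),
]:
    print(f"{name}: {false_positives_per_1000(spec):.0f} false positives per 1,000 cancer-free women")
```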

The algorithms can also perform tasks that readers do not, such as triaging large numbers of normal examinations, which is critical given the limited number of breast radiologists.

As the authors note, the main challenges are to establish how many "errors" the system can be allowed to make and to find a "clinically meaningful threshold" for AI performance.
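The threshold trade-off can be illustrated with a toy triage rule: exams whose AI suspicion score falls below a threshold are auto-cleared as normal, reducing reader workload at the cost of possibly missing cancers. The scores, labels, and thresholds in this Python sketch are invented for illustration, and `triage` is a hypothetical helper, not an interface of any approved product.

```python
from dataclasses import dataclass


@dataclass
class TriageResult:
    threshold: float
    workload_removed: float  # fraction of all exams auto-cleared as normal
    cancers_missed: int      # cancers whose score fell below the threshold


def triage(scores, labels, threshold):
    """Auto-clear every exam whose AI score is below `threshold`.

    scores: hypothetical per-exam suspicion scores in [0, 1]
    labels: 1 if the exam truly contains cancer, else 0
    """
    cleared = [(s, y) for s, y in zip(scores, labels) if s < threshold]
    return TriageResult(
        threshold=threshold,
        workload_removed=len(cleared) / len(scores),
        cancers_missed=sum(y for _, y in cleared),
    )


# Toy cohort of 10 exams (invented for illustration only)
scores = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.35, 0.40, 0.70, 0.90]
labels = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]

for t in (0.1, 0.3, 0.5):
    r = triage(scores, labels, t)
    print(f"threshold={t}: {r.workload_removed:.0%} of exams cleared, "
          f"{r.cancers_missed} cancer(s) missed")
```

Raising the threshold clears more of the workload but lets more cancers through unread; deciding where that balance is acceptable is the "clinically meaningful threshold" question.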

No prospective studies have been reported yet; many studies still rely on retrospective internal testing. In addition, most studies tested the algorithms on cancer-enriched cohorts, which do not reflect the case mix of a real screening population. Such datasets may therefore give an unrealistic picture, which limits generalizability, clinical applicability, and translation into the clinical workflow.
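One concrete consequence of testing on enriched cohorts is that the positive predictive value (PPV) observed in a study will not carry over to screening. Even if sensitivity and specificity were unchanged, PPV falls sharply with prevalence, following Bayes' rule. In this Python sketch, the sensitivity of 0.85 is a hypothetical value, the specificity is the combined algorithm figure quoted earlier, and the prevalences (50% for an enriched test set, 0.7% for a screening population) are illustrative assumptions.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' rule: P(cancer | positive result)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SENS = 0.85   # hypothetical sensitivity
SPEC = 0.906  # combined algorithm specificity reported by Hickman et al.

enriched = positive_predictive_value(SENS, SPEC, prevalence=0.5)     # assumed enriched test set
screening = positive_predictive_value(SENS, SPEC, prevalence=0.007)  # assumed screening prevalence
print(f"PPV in enriched cohort:  {enriched:.1%}")
print(f"PPV in screening cohort: {screening:.1%}")
```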


Machine learning performance is comparable to that of a certified breast radiologist, and machine learning can perform triage at a volume and speed beyond the reach of human readers. However, further prospective data are needed to understand where algorithm thresholds should be set, which will in turn allow the impact of AI algorithms on patient outcomes to be examined over time.

4. Gastrointestinal cancer imaging 

Having covered prostate and breast cancer, we now move on to gastrointestinal imaging. A systematic review and meta-analysis by Chidambaram et al. reported on the use of artificial intelligence in the diagnosis and postoperative surveillance of upper gastrointestinal malignancies using computed tomography (CT).

Upper gastrointestinal cancer is an aggressive malignancy with a poor prognosis, even after combined multimodality therapy. Unlike colorectal, hepatocellular, and pancreatic cancers, esophageal and gastric cancers have no reliable noninvasive biomarker for monitoring. Accurate diagnosis is therefore essential for choosing treatment and assigning patients to prognostic groups.

Thirty-six studies describing the use of AI were included in the qualitative synthesis, and 6 studies involving 1,352 patients were included in the quantitative analysis. Methodological quality was assessed with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool, which covers four domains: patient selection, index test, reference standard, and flow and timing.
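The overall QUADAS-2 judgement can be sketched as a simple rule. As a simplification (an assumption: the tool's guidance leaves room for reviewer judgement and also rates applicability concerns, which are omitted here), a study is rated at low overall risk of bias only when all four domains are rated low. The domain ratings in the example study are hypothetical.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")


def overall_risk_of_bias(judgements):
    """Simplified overall rule: 'low' only if every domain is rated low."""
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    if all(judgements[d] is Risk.LOW for d in DOMAINS):
        return "low risk of bias"
    return "at risk of bias"


# Hypothetical ratings for a retrospective single-center AI study
study = {
    "patient selection": Risk.HIGH,
    "index test": Risk.LOW,
    "reference standard": Risk.LOW,
    "flow and timing": Risk.UNCLEAR,
}
print(overall_risk_of_bias(study))
```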

The review authors concluded that AI methods can be used for the differential diagnosis of malignant disease and for the detection of occult tumors. Patients with esophageal or gastric tumors usually undergo CT for diagnosis and staging; AI could add value to these scans by predicting response to chemoradiotherapy and stratifying patients by risk of recurrence. Because histopathological response to chemoradiotherapy correlates directly with overall survival, the ability to assess clinical response could help in adjusting the chemoradiotherapy dose.

The evidence covered by this review has a number of weaknesses. Most of the evaluated articles did not report sensitivity and specificity values. Different data collection protocols were used, which increased data heterogeneity. Patient samples came predominantly from Asia. Studies with small samples (fewer than 100 patients) cannot provide reliable statistics on the type of treatment response.
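Why samples of fewer than 100 patients are a problem can be shown with a confidence interval. The Python sketch below uses the Wilson score interval (one standard choice among several, assumed here) for a hypothetical sensitivity of 0.80 measured on 30 versus 300 cancer cases; the tenfold larger sample narrows the 95% interval roughly threefold.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical sensitivity of 0.80 on a small and a larger cohort
for detected, n in ((24, 30), (240, 300)):
    lo, hi = wilson_interval(detected, n)
    print(f"n={n}: sensitivity {detected / n:.2f}, 95% CI {lo:.2f} to {hi:.2f}, width {hi - lo:.2f}")
```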

The authors propose ways to address these problems. Systems built on small single-center datasets with post hoc labeling rarely perform well, so independent validation of AI systems is important. The economic feasibility of AI also needs to be assessed: it may well be more cost-effective to hire more doctors than to purchase expensive AI.


A single large multicenter study of AI capabilities in the diagnosis of upper gastrointestinal cancer is needed, and the best methodology for data collection and analysis must be chosen to standardize work across centers.


The reviews and meta-analyses discussed here provide strong evidence of the capabilities of AI in cancer diagnostics. AI performance is comparable to that of radiologists, which justifies its use in leading hospitals. For this transformation of healthcare to happen, however, multicenter clinical trials in heterogeneous patient cohorts are still needed.

List of publications:

  1. Sushentsev N, Moreira Da Silva N, Yeung M, Barrett T, Sala E, Roberts M, Rundo L. Comparative performance of fully-automated and semi-automated artificial intelligence methods for the detection of clinically significant prostate cancer on MRI: a systematic review. Insights Imaging. 2022 Mar 28;13(1):59. doi: 10.1186/s13244-022-01199-3. PMID: 35347462; PMCID: PMC8960511.
  2. Liberini V, Laudicella R, Balma M, Nicolotti DG, Buschiazzo A, Grimaldi S, Lorenzon L, Bianchi A, Peano S, Bartolotta TV, Farsad M, Baldari S, Burger IA, Huellner MW, Papaleo A, Deandreis D. Radiomics and artificial intelligence in prostate cancer: new tools for molecular hybrid imaging and theragnostics. Eur Radiol Exp. 2022 Jun 15;6(1):27. doi: 10.1186/s41747-022-00282-0. PMID: 35701671; PMCID: PMC9198151.
  3. Hickman SE, Woitek R, Le EPV, Im YR, Mouritsen Luxhøj C, Aviles-Rivero AI, Baxter GC, MacKay JW, Gilbert FJ. Machine Learning for Workflow Applications in Screening Mammography: Systematic Review and Meta-Analysis. Radiology. 2022 Jan;302(1):88-104. doi: 10.1148/radiol.2021210391. Epub 2021 Oct 19. PMID: 34665034; PMCID: PMC8717814.
  4. Chidambaram S, Sounderajah V, Maynard N, Markar SR. Diagnostic Performance of Artificial Intelligence-Centred Systems in the Diagnosis and Postoperative Surveillance of Upper Gastrointestinal Malignancies Using Computed Tomography Imaging: A Systematic Review and Meta-Analysis of Diagnostic Accuracy. Ann Surg Oncol. 2022 Mar;29(3):1977-1990. doi: 10.1245/s10434-021-10882-6. Epub 2021 Nov 11. PMID: 34762214; PMCID: PMC8810479.