
Evidence-based review of novel clinical AI applications in lung nodule evaluation

Radiologists can improve diagnostic accuracy in identifying pulmonary foci and other pulmonary pathology using artificial intelligence algorithms without increasing the number of false positives. However, a large sample is needed to show the effect of the algorithms in a prospective trial. What does the most recent research teach us? Osimis investigates.

Chest radiography is the most widely used diagnostic method to detect lung cancer. There have been many studies on the development and evaluation of computer-aided diagnosis (CAD) systems for chest radiography. Recently, deep learning–based CAD systems for chest radiography have shown better performance than conventional CADs, with results comparable to or better than those of physicians in the detection of various abnormalities including nodules, tuberculosis and pneumothorax. The aims of the articles reviewed were to study the efficacy of AI algorithms in the detection of lung tissue foci, consolidation zones, interstitial thickening, pleural effusion and pneumothorax.

1. Chest radiography is still the most frequently performed imaging examination and may provide an opportunity for early detection of lung cancer. A study by Nam et al. explored this topic, focusing on the value of a deep learning-based algorithm in detecting Lung CT Screening Reporting and Data System (Lung-RADS) category 4 nodules on chest radiographs from an asymptomatic health checkup population.

In total, 6452 individuals underwent chest radiography for health checkup purposes, regardless of smoking history or lung cancer risk. To establish a reference standard for the presence of lung nodules, individuals who underwent chest CT within 3 months after the radiograph were selected. The study used a commercially available deep learning-based algorithm (Lunit INSIGHT CXR version 2.0, Lunit Inc.).

In this study, radiologists increased their sensitivity in detecting Lung-RADS 4 nodules on radiographs from a health checkup population with the aid of the deep learning algorithm, in per-radiograph classification. Specificity and the false-positive rate did not change significantly after the algorithm was introduced.

A 77-year-old female patient with lung adenocarcinoma in the right lower lobe. A. The nodule presented as a 2.2-cm part-solid nodule with a 7-mm solid portion, categorized as a Lung-RADS 4A nodule. B. On a radiograph, the nodule was vaguely visualized (conspicuity score 2). C. Only one out of four readers detected the nodule, and the algorithm detected the lesion with an activation value of 31. Aided by the algorithm, two more readers successfully detected the nodule. The algorithm also produced two false positives for calcified granulomas, but all readers disregarded these marks.

The authors included all CT-positive nodules from an annual health checkup cohort, regardless of whether they were visible on radiographs. Additionally, the investigators followed Lung-RADS criteria in determining positive and negative cases, an approach known to be effective in screening populations. These results are promising: introducing the algorithm in screening settings is more feasible when the expected harm, usually unnecessary radiation exposure caused by false-positive results, is low.

However, the study has limitations. The authors collected a retrospective cohort of individuals with both a radiograph and a CT scan, so selection bias is likely. Only a single radiograph and a single CT were evaluated for Lung-RADS categorisation of the nodules. Lateral images were not included. Finally, the reader test included only some of the negative cases, so false positives may increase when Lung-RADS 2–3 nodules are included among the negative cases.

Summary: Aided by a deep learning algorithm, pooled radiologists improved their sensitivity in detecting Lung-RADS category 4 nodules on chest radiographs from a health checkup population (38.8% to 45.1%), without increasing the false-positive rate. The prevalence of Lung-RADS category 4 nodules was 3.8% across the population. To confirm the significant detection rate increase in a randomised controlled trial, a sample size of 84,000 would be required.
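The size of that trial follows largely from the low prevalence. As a rough sketch, and not the authors' own calculation, the snippet below applies the standard normal-approximation formula for comparing two proportions. It assumes the per-person detection rate equals prevalence times sensitivity, a two-sided alpha of 0.05, 80% power, and two equal arms. Those design parameters and the function name are assumptions made for illustration; the paper's method may differ.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size to distinguish p1 from p2 (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


prevalence = 0.038     # Lung-RADS 4 prevalence reported in the study
sens_unaided = 0.388   # pooled reader sensitivity without the algorithm
sens_aided = 0.451     # pooled reader sensitivity with the algorithm

# Assumption: detection rate per screened person = prevalence * sensitivity.
rate_unaided = prevalence * sens_unaided
rate_aided = prevalence * sens_aided

n_per_arm = sample_size_two_proportions(rate_unaided, rate_aided)
print(f"per arm: {n_per_arm}, total: {2 * n_per_arm}")
```

Under these assumptions the total lands in the tens of thousands, the same order of magnitude as the figure quoted in the summary. A gain of about six percentage points in sensitivity translates into only a fraction of a percentage point in detection rate once it is multiplied by a 3.8% prevalence, which is why so many participants are needed.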

2. In the study by Sung et al., investigators evaluated the added value of a deep learning–based detection (DLD) system, a chest radiography CAD system, in identifying nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax. The key strength of the study was its randomised crossover design, which compared observer performance in detecting and localising these major abnormal findings on chest radiographs without versus with DLD system assistance.

Images of a 55-year-old man with biopsy-proven necrotising pneumonia. (a) Original chest radiograph shows a mass-like consolidation in the right lower lung zone that abuts the right cardiac border without silhouette sign. Without the deep learning–based detection (DLD) system, four observers annotated a true-positive mark on consolidation, but one board-certified thoracic radiologist and one board-certified non-thoracic radiologist (subspecialty-trained abdominal radiologist) missed it. (b, c) Corresponding axial (b) and coronal (c) contrast-enhanced CT scans reveal consolidation with a necrotic portion in the right lower lobe. (d) Chest radiograph with a DLD-generated true-positive mark on consolidation (circle). With the DLD mark, all observers detected consolidation.

Overall, diagnostic performance of the observers, including thoracic radiologists, in the detection of nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax was improved with use of the DLD system (P = .002 for jackknife alternative free-response receiver operating characteristic [JAFROC] figure of merit [FOM] and area under the receiver operating characteristic curve [AUC]; P = .009 to .01 for per-lesion sensitivity, per-image sensitivity, and per-image specificity). Moreover, the reading time was substantially reduced in all observers with the assistance of the DLD system (from 10–65 to 6–27 seconds).


Compared with the observer performance study by Nam et al., the major difference was that the investigators conducted a randomised crossover study with a washout period. Because the study was retrospective, spectrum bias was possible, although the data covered diverse abnormalities. Lateral radiographs were not analysed. The reference standard was established by a single radiologist, which is a limitation, although that experienced thoracic radiologist reviewed the cases very carefully.


Summary: Using a deep learning–based detection system, observers, including thoracic radiologists, improved detection and localisation of major abnormal findings on chest radiographs with reduced reading time, regardless of experience level. Future studies in a real-world setting will likely help determine the clinical usefulness of the DLD system.

3. One important area of potential clinical application is the use of AI software to interpret chest radiographs (CXRs). However, there is still a large gap between technical achievements and clinical adoption of the technology, and many studies have shown decreased performance in external validation. One example is the study by Kim et al., which focused on the external validation of commercial AI software for interpreting CXRs in a real-world cohort.

The prevalence of clinically significant lesions was 2.2% (68 of 3047). The AUROC, sensitivity, and specificity of the AI were 0.648 (95% confidence interval [CI] 0.630–0.665), 35.3% and 94.2%, respectively.

AI detected 12 of 41 pneumonia, 3 of 5 tuberculosis, and 9 of 22 tumors. AI-undetected lesions tended to be smaller than true-positive lesions. Reader performance with the AI software assistant, although significantly improved, remained unsatisfactory (AUROCs, 0.571–0.688). Moreover, reading time with AI software assistance was significantly longer than without it for all readers.
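To see what the reported figures imply for a reader acting on an AI flag, the sketch below derives positive and negative predictive values from the prevalence, sensitivity, and specificity using Bayes' rule. The predictive values are not quoted in the text here; they are computed from rounded percentages, so treat them as approximate. The function name is illustrative.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) implied by prevalence, sensitivity and specificity."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    false_pos = (1 - prevalence) * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Per-disease detections add up to the overall reported sensitivity.
detected = 12 + 3 + 9          # pneumonia + tuberculosis + tumors found by AI
total_lesions = 41 + 5 + 22    # all clinically significant lesions
overall_sensitivity = detected / total_lesions

prevalence = 68 / 3047         # clinically significant lesions in the cohort
specificity = 0.942            # as reported (rounded)

ppv, npv = predictive_values(prevalence, overall_sensitivity, specificity)
print(f"sensitivity={overall_sensitivity:.1%}, PPV={ppv:.1%}, NPV={npv:.1%}")
```

In this calculation, at roughly 2% prevalence, most AI-positive radiographs turn out to be false positives even though specificity exceeds 94%.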

An example of an undetected (false-negative) case of lung cancer. A. On the chest computed tomography image with coronal reconstruction, a 3.2-cm lobulated mass is shown in the left upper lobe (arrows), which was later confirmed to be lung cancer. B. Artificial intelligence software did not detect this mass on the chest radiograph (arrows).


Summary: The performance of commercial AI in high-volume, low-prevalence cohorts was poorer than expected, although it modestly boosted the performance of less-experienced readers. When AI software is used in a clinical setting that differs from its training setting, it is necessary to adjust the threshold or perform additional training with data that reflect the new environment well. Prospective test accuracy studies, randomised controlled trials, or cohort studies are needed to examine AI software before it is implemented in real clinical practice.
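One way to picture the threshold adjustment mentioned in this summary: given AI scores and reference labels from a local validation set, pick the operating threshold that reaches a target sensitivity, then check the specificity it costs. This is a minimal sketch under stated assumptions. The scores, labels, target, and function names are made up for illustration and do not come from the study or from any vendor's API.

```python
import math


def threshold_for_sensitivity(scores, labels, target_sensitivity):
    """Highest threshold at which sensitivity still meets the target."""
    positive_scores = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positive_scores:
        raise ValueError("need at least one positive case")
    # Number of positives that must score at or above the threshold.
    needed = max(1, math.ceil(target_sensitivity * len(positive_scores)))
    return positive_scores[needed - 1]


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up local validation data: AI abnormality scores and reference labels.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

threshold = threshold_for_sensitivity(scores, labels, target_sensitivity=0.75)
sens, spec = sensitivity_specificity(scores, labels, threshold)
print(f"threshold={threshold}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the target sensitivity pushes the threshold down and lowers specificity, so in a low-prevalence cohort the choice is a trade-off between missed lesions and false alarms rather than a purely technical setting.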

 

List of publications:

1. Nam JG, Kim HJ, Lee EH, Hong W, Park J, Hwang EJ, Park CM, Goo JM. Value of a deep learning-based algorithm for detecting Lung-RADS category 4 nodules on chest radiographs in a health checkup population: estimation of the sample size for a randomized controlled trial. Eur Radiol. 2022 Jan;32(1):213-222. doi: 10.1007/s00330-021-08162-8. Epub 2021 Jul 15. PMID: 34264351.

2. Sung J, Park S, Lee SM, Bae W, Park B, Jung E, Seo JB, Jung KH. Added Value of Deep Learning-based Detection System for Multiple Major Findings on Chest Radiographs: A Randomized Crossover Study. Radiology. 2021 May;299(2):450-459. doi: 10.1148/radiol.2021202818. Epub 2021 Mar 23. PMID: 33754828.

3. Kim C, Yang Z, Park SH, Hwang SH, Oh YW, Kang EY, Yong HS. Multicentre external validation of a commercial artificial intelligence software to analyse chest radiographs in health screening environments with low disease prevalence. Eur Radiol. 2023 Jan 10. doi: 10.1007/s00330-022-09315-z. Epub ahead of print. PMID: 36624227.

📷 Image generated by Frédéric Lambrechts using DALL-E, with instruction: "alternative view of the chest x ray, featuring a bounding box and an imaginary AI vendor logo"
