Chest radiography is the most widely used diagnostic method to detect lung cancer. There have been many studies on the development and evaluation of computer-aided diagnosis (CAD) systems for chest radiography. Recently, deep learning–based CAD systems for chest radiography have shown better performance than conventional CAD systems, with results comparable to or better than those of physicians in the detection of various abnormalities, including nodules, tuberculosis and pneumothorax. The articles reviewed here aimed to study the efficacy of AI algorithms in the detection of focal lung lesions, areas of consolidation, interstitial thickening, pleural effusion and pneumothorax.
1. Chest radiography is still the most frequently performed imaging examination and may provide an opportunity for early detection of lung cancer. A study by Nam et al. explored this topic, focusing on the value of a deep learning-based algorithm in detecting Lung CT Screening Reporting and Data System (Lung-RADS) category 4 nodules on chest radiographs from an asymptomatic health checkup population.
In total, 6452 individuals underwent chest radiography for health checkup purposes, regardless of smoking history or lung cancer risk. To establish a reference standard for the presence of lung nodules, individuals who underwent chest CT within 3 months after the radiograph were selected. The study used a commercially available deep learning-based algorithm (Lunit INSIGHT CXR version 2.0, Lunit Inc.).
In this study, radiologists aided by the deep learning algorithm showed increased sensitivity in detecting Lung-RADS 4 nodules on radiographs from a health checkup population in per-radiograph classification. The specificity and false-positive rate did not change significantly with the introduction of the algorithm.
The authors of this study included all positive nodules found on CT in an annual health checkup cohort, regardless of whether they were visible on radiographs. Additionally, the investigators followed Lung-RADS criteria, which are known to be effective for screening populations, in determining positive and negative cases. These results are promising, as introducing the algorithm in screening settings might be more feasible if the expected harm is low; that harm is usually unnecessary radiation exposure caused by false-positive results.
However, the study does have its limitations. The authors collected a retrospective cohort of individuals with both a radiograph and a CT scan, so selection bias is likely. Only a single radiograph and a single CT scan were evaluated for Lung-RADS categorisation of the nodules. Lateral images were not included in the evaluation. Finally, the investigators performed the reader test with only some of the negative cases, and false positives may increase when Lung-RADS 2–3 nodules are included among the negative cases.
Summary: Aided by a deep learning algorithm, pooled radiologists improved their sensitivity in detecting Lung-RADS category 4 nodules on chest radiographs from a health checkup population (38.8% to 45.1%) without increasing the false-positive rate. The prevalence of Lung-RADS category 4 nodules was 3.8% across the population. To confirm the increase in detection rate in a randomised controlled trial, a sample size of 84,000 would be required.
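The order of magnitude of this sample size can be checked with a textbook two-proportion calculation. The sketch below is an illustration, not the authors' method: it assumes that the detection rate in each arm equals prevalence times sensitivity, a two-sided alpha of 0.05, 80% power and two equal arms, none of which is stated in this review. The function name `total_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def total_sample_size(p_control, p_intervention, alpha=0.05, power=0.80):
    """Total size (two equal arms) needed to detect a difference
    between two proportions with a two-sided test.

    Assumption: standard normal-approximation formula; alpha and power
    are illustrative defaults, not values taken from the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    per_arm = (z_alpha + z_beta) ** 2 * variance / (p_control - p_intervention) ** 2
    return 2 * math.ceil(per_arm)


# Figures quoted in the summary above.
prevalence = 0.038           # Lung-RADS category 4 nodules
sensitivity_unaided = 0.388  # radiologists alone
sensitivity_aided = 0.451    # radiologists with the algorithm

# Assumed detection rate per screened individual: prevalence x sensitivity.
detection_unaided = prevalence * sensitivity_unaided
detection_aided = prevalence * sensitivity_aided

total = total_sample_size(detection_unaided, detection_aided)
print(total)
```

Under these assumptions the total falls in the same range as the 84,000 reported, which illustrates why a modest gain in sensitivity at low prevalence calls for a very large trial.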
2. Sung et al. used a deep learning–based detection (DLD) system to evaluate the added value of a deep learning–based chest radiography CAD system in the identification of nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax. The key advantage of that study was that it compared observer performance in detecting and localising these major abnormal findings on chest radiographs without versus with DLD system assistance in a randomised crossover design.
Overall, the diagnostic performance of the observers, including thoracic radiologists, in the detection of nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax improved with use of the DLD system (P = .002 for jackknife alternative free-response receiver operating characteristic [JAFROC] figure of merit [FOM] and area under the receiver operating characteristic curve [AUC]; P = .009 to .01 for per-lesion sensitivity, per-image sensitivity, and per-image specificity). Moreover, reading time was substantially reduced for all observers with the assistance of the DLD system (from 10–65 to 6–27 seconds).
Compared with the observer performance study by Nam et al., the major difference was that the investigators conducted a randomised crossover study with a washout period. Because of the retrospective nature of that study, there was a possibility of spectrum bias, although the data covered diverse abnormalities. Lateral radiographs were not analysed in that study. Establishing the reference standard with a single radiologist was a limitation, although an experienced thoracic radiologist reviewed the cases very carefully.
Summary: Using a deep learning–based detection system, observers, including thoracic radiologists, improved detection and localisation of major abnormal findings on chest radiographs with reduced reading time, regardless of experience level. Future studies in a real-world setting will likely help determine the clinical usefulness of the DLD system.
3. One important area of potential clinical application is the use of AI software to interpret chest radiographs (CXRs). However, there is still a large gap between technical achievements and clinical adoption of the technology, with many studies showing decreased performance in external validation. An example is the study by Kim et al., which focused on the external validation of commercial AI performance for interpreting CXRs in a real-world cohort.
The prevalence of clinically significant lesions was 2.2% (68 of 3047). The AUROC, sensitivity, and specificity of the AI were 0.648 (95% confidence interval [CI] 0.630–0.665), 35.3% and 94.2%, respectively.
The AI detected 12 of 41 pneumonia cases, 3 of 5 tuberculosis cases, and 9 of 22 tumors. Lesions missed by the AI tended to be smaller than true-positive lesions. Reader performance with the AI software assistant also remained unsatisfactory (AUROCs, 0.571–0.688), although it improved significantly. In addition, reading time with AI software assistance was significantly longer than reading time without it for all readers.
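The reported sensitivity and prevalence can be reproduced from the per-disease counts above. The short Python check below uses only figures quoted in this section; the variable names are illustrative.

```python
# (detected by AI, total cases) per disease, as quoted above.
detected = {
    "pneumonia": (12, 41),
    "tuberculosis": (3, 5),
    "tumor": (9, 22),
}
cohort_size = 3047

true_positives = sum(found for found, _ in detected.values())
total_lesions = sum(total for _, total in detected.values())

# Sensitivity: share of clinically significant lesions flagged by the AI.
sensitivity = true_positives / total_lesions
# Prevalence: share of the cohort with a clinically significant lesion.
prevalence = total_lesions / cohort_size

print(f"{true_positives} of {total_lesions} lesions detected")
print(f"sensitivity = {sensitivity:.1%}, prevalence = {prevalence:.1%}")
```

The counts add up to 24 of 68 lesions, which matches the reported sensitivity of 35.3% and prevalence of 2.2%.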
Summary: The performance of commercial AI in high-volume, low-prevalence cohorts was poorer than expected, although it modestly boosted the performance of less-experienced readers. When AI software is used in a clinical setting that differs from the training setting, it is necessary to adjust the threshold or perform additional training with data that reflect this environment well. Prospective test accuracy studies, randomised controlled trials, or cohort studies are needed to evaluate AI software before it is implemented in real clinical practice.
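As a rough illustration of what adjusting the threshold can mean in practice, the sketch below picks an operating threshold on locally collected validation data so that a target sensitivity is reached. It is a minimal example, not the procedure used by Kim et al. or by any vendor: the function names are hypothetical and the scores and labels are invented toy values, not study data.

```python
import math


def threshold_for_sensitivity(scores, labels, target_sensitivity):
    """Return the highest threshold whose sensitivity is >= the target.

    A case is called positive when its score is >= the threshold.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive case")
    # Number of positive cases that must be flagged to reach the target.
    needed = max(1, math.ceil(target_sensitivity * len(positives)))
    return positives[needed - 1]


def sensitivity_specificity(scores, labels, threshold):
    """Compute sensitivity and specificity at a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy local validation set (invented values): AI abnormality scores and
# reference-standard labels (1 = clinically significant lesion).
scores = [0.92, 0.81, 0.40, 0.22, 0.15, 0.35, 0.10, 0.05, 0.30, 0.02]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

default_sens, default_spec = sensitivity_specificity(scores, labels, 0.5)
local_threshold = threshold_for_sensitivity(scores, labels, 0.75)
local_sens, local_spec = sensitivity_specificity(scores, labels, local_threshold)

print(default_sens, default_spec)
print(local_threshold, local_sens, local_spec)
```

In a real setting the threshold would be chosen on a sufficiently large local sample and then confirmed on separate data, since lowering it generally trades specificity for sensitivity.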
List of publications:
1. Nam JG, Kim HJ, Lee EH, Hong W, Park J, Hwang EJ, Park CM, Goo JM. Value of a deep learning-based algorithm for detecting Lung-RADS category 4 nodules on chest radiographs in a health checkup population: estimation of the sample size for a randomized controlled trial. Eur Radiol. 2022 Jan;32(1):213-222. doi: 10.1007/s00330-021-08162-8. Epub 2021 Jul 15. PMID: 34264351.
2. Sung J, Park S, Lee SM, Bae W, Park B, Jung E, Seo JB, Jung KH. Added Value of Deep Learning-based Detection System for Multiple Major Findings on Chest Radiographs: A Randomized Crossover Study. Radiology. 2021 May;299(2):450-459. doi: 10.1148/radiol.2021202818. Epub 2021 Mar 23. PMID: 33754828.
3. Kim C, Yang Z, Park SH, Hwang SH, Oh YW, Kang EY, Yong HS. Multicentre external validation of a commercial artificial intelligence software to analyse chest radiographs in health screening environments with low disease prevalence. Eur Radiol. 2023 Jan 10. doi: 10.1007/s00330-022-09315-z. Epub ahead of print. PMID: 36624227.