Evidence Based AI

Evidence-based review of novel clinical AI applications in musculoskeletal radiology

Promising results from systematic reviews and meta-analyses have shown that artificial intelligence systems can be trained to detect and classify wrist, hand, ankle, hip, proximal humerus, rib and thoracolumbar spine fractures on radiographs with a diagnostic accuracy comparable to that of radiologists. Dr. Sergey Morozov, Chief Innovation Officer at Osimis, investigates.

In this issue of our ongoing series, I want to take a closer look at several studies on the impact of artificial intelligence methods in musculoskeletal radiology, mostly for fracture detection. These articles have been selected for review based on their clinical significance and citation indexes of the journals. And they intrigue me. Why? 

These systematic reviews and meta-analyses have revealed promising results that artificial intelligence systems can be trained to detect and classify wrist, hand, ankle, hip, proximal humerus, rib and thoracolumbar spine fractures on radiographs with a diagnostic accuracy comparable to that of radiologists. 

The improvement in sensitivity was significant at all anatomic sites except the shoulder, clavicle, and thoracolumbar spine. More subtle fractures (such as non-displaced femoral neck fractures or scaphoid fractures) require further study because artificial intelligence models may be less accurate there. And even if the reduction in reading time is only a few seconds per radiographic examination, it can add up to significant time savings for radiologists who read 200-300 radiographs per day. 

Nevertheless, before AI algorithms can be transferred into routine practice, they must be externally validated in a prospective study representing a relevant sample of patients. Finally, as with other clinical prediction models, AI systems should be evaluated in randomised clinical trials to assess their impact on patient-centred outcomes. These challenges remain to be tackled.

1. Kuo RYL et al. Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis. Radiology. 2022 Jul;304(1):50-62.

Fractures are a common emergency presentation and can be missed on radiologic imaging: between 3% and 10% of patients are estimated to experience a missed or delayed fracture diagnosis on radiography. An increasing number of studies are using artificial intelligence (AI) techniques to detect fractures as an adjunct to clinical diagnosis. In a systematic review and meta-analysis of 42 studies (37 with radiography and five with computed tomography), Kuo et al. found that AI fracture detection had a pooled sensitivity of 92% and 91% and a specificity of 91% and 91%, with internal and external validation, respectively.
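To make the pooled figures concrete, here is a minimal sketch of how sensitivity and specificity are computed from a 2×2 contingency table. The counts below are hypothetical round numbers chosen for illustration, not data from Kuo et al.

```python
# Sensitivity/specificity from a 2x2 contingency table, as pooled in
# AI fracture-detection meta-analyses. Counts below are hypothetical.

def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from contingency-table counts."""
    sensitivity = tp / (tp + fn)   # true-positive rate: fractures correctly flagged
    specificity = tn / (tn + fp)   # true-negative rate: normal films correctly cleared
    return sensitivity, specificity

# Hypothetical test set: 100 fracture and 100 normal radiographs
sens, spec = sensitivity_specificity(tp=92, fn=8, fp=9, tn=91)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 92%, specificity 91%
```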

The included studies analysed proximal femur, vertebral, proximal humerus, distal radius, scaphoid, calcaneal, and supracondylar or lateral condyle elbow fractures. The performance of clinicians was comparable to that of AI in detecting fractures (sensitivity 91% and 92%; specificity 94% and 94%), with no statistically significant differences between clinician and AI performance. However, the studies have several limitations:

  1. Kuo et al. used a checklist based on Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD), which is a common tool for clinical prediction model studies. They could also have used the Checklist for Artificial Intelligence in Medical Imaging (CLAIM), a checklist specific to AI in medical imaging. CLAIM was released in 2020 and covers the key elements that need to be reported in studies of the diagnostic accuracy of AI systems, so that a study can be properly evaluated for bias and applicability. CLAIM can also be applied to non-diagnostic AI studies in radiology (e.g., natural language processing of imaging reports) and to image analysis studies outside radiology (such as image analysis in pathology or dermatology).

  2. One of the primary reasons for the high risk of bias was the selection of study participants: approximately one-third of studies were judged to be at high risk of selection bias, and about half raised high concern for applicability in this domain. Most studies included in the systematic review by Kuo et al. did not perform an external validation of the AI algorithm they developed. Only 13 studies performed external validation, and only one study evaluated the effectiveness of AI in a prospective clinical trial.

  3. Kuo et al. included 23 contingency tables from seven studies in which AI was compared with radiologists on the same external validation test sets. However, radiologist performance may have been underestimated, because radiologists were often asked to interpret the radiographs or CT scans under conditions not typical of routine practice. For example, clinical history (e.g., site of pain or mechanism of trauma) was provided to radiologists in only one study, although providing clinical data is normal in routine practice.

Transparent reporting is necessary so that users can assess the elements most important for validating the quality of a study and judge whether the results apply to the intended patient population. For example, 17 studies did not report the male-to-female proportion of study participants, and 15 did not report participants' ages.

Conclusion: The results from this meta-analysis cautiously suggest that AI is not inferior to clinicians in terms of diagnostic performance in fracture detection, showing promise as a useful diagnostic tool. But many studies have limited real-world applicability because of flawed methods or unrepresentative data sets. As a result, future research must prioritise pragmatic algorithm development.

2. Guermazi A. et al. Improving Radiographic Fracture Recognition Performance and Efficiency Using Artificial Intelligence. Radiology. 2022 Mar;302(3):627-636.

According to Guermazi's retrospective analysis, fracture interpretation errors can account for up to 24% of the harmful diagnostic errors encountered in emergency departments.

Inconsistencies in radiographic fracture diagnosis are more common during the evening and night hours. The purpose of this study was to assess the impact of AI assistance on physicians' diagnostic performance when diagnosing fractures on radiographs. In a retrospective study of 480 patients, AI-assisted interpretation of radiographs by six types of readers improved fracture detection sensitivity by 10.4 percentage points (75.2% vs. 64.8%) with no decrease in specificity. AI assistance also reduced radiograph reading time by 6.3 seconds per patient. According to the authors, a major benefit that AI can bring to clinical practice – particularly in the acute care setting – is its potential to function as a triage system in busy medical centers. Another benefit is the reduction in reading time: even if it is only a few seconds per radiographic examination, it can add up to significant time savings for radiologists who read 200-300 radiographs per day. AI assistance can also help with the detection of non-obvious or subtle fractures. 

It should be noted that this study had some limitations. 

  1. All radiographs were read retrospectively and without clinical information. In real life, non-radiologist clinicians can examine the patient and obtain a detailed history to identify the area of concern. 
  2. Due to the artificially set 50% prevalence of fractures in the study sample, it was not possible to calculate positive or negative predictive values. This artificial balance between anatomic locations and reader specialties made the sample non-representative. 
  3. Guermazi’s study did not use CT for fracture confirmation. It is also important to note that AI does not improve the detection of radiographically occult fractures, because such fractures are, by definition, not visible on radiographs.
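The second limitation above can be illustrated with Bayes' rule: positive predictive value depends strongly on disease prevalence, so a PPV computed on an artificially balanced 50%-fracture sample would not transfer to routine practice. The sensitivity and specificity figures below are round illustrative numbers, not the study's.

```python
# Sketch of why PPV cannot be read off a 50%-prevalence sample:
# predictive values shift with prevalence even when sensitivity and
# specificity stay fixed. All numbers here are illustrative.

def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sens * prevalence            # P(test+, fracture)
    false_pos = (1 - spec) * (1 - prevalence)  # P(test+, no fracture)
    return true_pos / (true_pos + false_pos)

for prev in (0.50, 0.10):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.90, 0.90, prev):.0%}")
# prevalence 50%: PPV = 90%
# prevalence 10%: PPV = 50%
```

At a realistic 10% prevalence, half of the positive calls of this hypothetical 90%/90% reader would be false alarms, which is why the authors could not report meaningful predictive values.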

Conclusion: Artificial intelligence assistance for detecting skeletal fractures on radiographs improved the sensitivity and specificity of readers and shortened their reading time, as stated by the authors. The improvement in sensitivity was significant at all anatomic sites except the shoulder, clavicle, and thoracolumbar spine. In fact, stand-alone AI outperformed human readers for the detection of rib and thoracolumbar spine fractures.

Stand-alone artificial intelligence (AI) performance examples: positive radiographs for fractures.
  1. X-ray shows a single true-positive fracture of the right femoral neck (arrows). This fracture was detected by AI. One senior and one junior radiologist, two emergency department physicians, three rheumatologists, and one family medicine physician missed the fracture.
  2. Additional dedicated view of the right hip clearly shows the fracture.
  3. X-ray shows true-positive multiple left-sided rib fractures. Two senior and one junior radiologist recognised only two fractures.
  4. X-ray shows true-positive fractures of the L3 and L4 vertebral bodies. Thirteen readers pointed out the two fractures without AI. Nineteen readers pointed out the two fractures with AI. Five readers missed one vertebral fracture with and without AI.
Stand-alone AI performance examples: false-positive and false-negative X-rays.

  1. X-ray shows a small corticated ossific fragment adjacent to the inferior glenoid margin (arrow). AI flagged this as an acute fracture. Fifteen readers read this as an acute fracture without AI. Four readers thought the fracture was chronic without AI but reversed their reading with AI. Only five readers recognised the chronicity of the fracture.
  2. X-ray shows a subtle fracture of the fifth metacarpal base, which was not detected by AI. Only ground truth readers noted the fracture.
  3. This fracture was not clearly visible on the oblique view.

3. Langerhuizen DWG et al. What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review. Clin Orthop Relat Res. 2019 Nov;477(11):2482-2491. 

A systematic review by Langerhuizen et al. raised more specific questions. What is the proportion of correctly detected or classified fractures, and what is the area under the receiver operating characteristic curve (AUC) of AI fracture detection and classification models? And how does the performance of AI in this setting compare with that of human examiners? 

For fracture detection, the AUC in five studies reflected near-perfect prediction (range, 0.95-1.0), and the accuracy in seven studies ranged from 83% to 98%. For fracture classification, the AUC was 0.94 in one study, and the accuracy in two studies ranged from 77% to 90%. Langerhuizen’s review showed that AI performs well for detecting common fractures.
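For readers less familiar with the AUC metric quoted here: it equals the probability that a randomly chosen fracture case receives a higher model score than a randomly chosen normal case. A minimal sketch, using made-up model scores rather than any data from the review:

```python
# AUC computed directly from its probabilistic definition: the chance
# that a random positive (fracture) outscores a random negative (normal),
# counting ties as half. Scores below are made-up example data.

def auc(pos_scores, neg_scores):
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

fracture_scores = [0.9, 0.8, 0.7, 0.95]   # hypothetical AI scores, fracture cases
normal_scores   = [0.2, 0.4, 0.75, 0.1]   # hypothetical AI scores, normal cases
print(f"AUC = {auc(fracture_scores, normal_scores):.2f}")  # AUC = 0.94
```

An AUC of 1.0 means every fracture case outscores every normal case; 0.5 is chance-level ranking, which is why the reported 0.95-1.0 range reads as near-perfect.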

Conclusion: AI may enhance processing and communicating probabilistic tasks in medicine, including orthopaedic surgery. AI outperformed human examiners for detecting and classifying hip and proximal humerus fractures and showed equivalent performance for detecting wrist, hand and ankle fractures. More subtle fractures (such as non-displaced femoral neck fractures or scaphoid fractures) require further study because artificial intelligence models may be less accurate. At present, inadequate reference standard assignments for training and testing AI are the biggest hurdle before integration into clinical workflows.

List of publications:

Kuo RYL, Harrison C, Curran TA, Jones B, Freethy A, Cussons D, Stewart M, Collins GS, Furniss D. Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis. Radiology. 2022 Jul;304(1):50-62. doi: 10.1148/radiol.211785. Epub 2022 Mar 29. PMID: 35348381; PMCID: PMC9270679.

Guermazi A, Tannoury C, Kompel AJ, Murakami AM, Ducarouge A, Gillibert A, Li X, Tournier A, Lahoud Y, Jarraya M, Lacave E, Rahimi H, Pourchot A, Parisien RL, Merritt AC, Comeau D, Regnard NE, Hayashi D. Improving Radiographic Fracture Recognition Performance and Efficiency Using Artificial Intelligence. Radiology. 2022 Mar;302(3):627-636. doi: 10.1148/radiol.210937. Epub 2021 Dec 21. PMID: 34931859.

Langerhuizen DWG, Janssen SJ, Mallee WH, van den Bekerom MPJ, Ring D, Kerkhoffs GMMJ, Jaarsma RL, Doornberg JN. What Are the Applications and Limitations of Artificial Intelligence for Fracture Detection and Classification in Orthopaedic Trauma Imaging? A Systematic Review. Clin Orthop Relat Res. 2019 Nov;477(11):2482-2491. doi: 10.1097/CORR.0000000000000848. PMID: 31283727; PMCID: PMC6903838.

📸 image credit to DALL-E. Our search: "two meeting arms with stretched fingers xray with artificial intelligence in michelangelo style"
