
Hwang et al, 2019, Development and Validation of a Deep Learning-Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs.

by 펄서까투리 2021. 10. 20.

# Three-Line Summary #

  1. The goal was to develop a deep learning-based algorithm that classifies chest radiographs as normal or abnormal with respect to four major thoracic diseases (pulmonary malignant neoplasm, active tuberculosis, pneumonia, pneumothorax).
  2. This diagnostic study developed a deep learning-based algorithm using single-center data (chest radiographs: 54,221 normal findings; 35,613 abnormal findings) and externally validated with multi-center data (chest radiographs: 486 normal results; 529 abnormal results).
  3. The algorithm demonstrated a median AUROC (area under the receiver operating characteristic curve) of 0.979 for image-wise classification and a median AUAFROC (area under the alternative free-response ROC curve) of 0.972 for lesion-wise localization across the external validation datasets; both were significantly higher than the performance of the 15 participating physicians.

 

# Detailed Review #

1. Introduction

  • Chest radiographs (CRs) have been used as a first-line examination for the evaluation of various thoracic diseases worldwide.
    • Interpretation of CRs, however, remains a challenging task requiring both experience and expertise.
    • The interpretation is prone to errors, and the growing volume of examinations has increased radiologists' workload.
  • Thus it is not surprising that computer-aided diagnosis (CAD) for CRs has remained an attractive topic for research.
    • Recently, the deep learning technique demonstrated promising results in medical image analysis.
  • Previously, we investigated deep-learning-based automatic detection algorithms (DLADs) for the classification of CRs with malignant nodules and active pulmonary tuberculosis.
    • However, each of those algorithms targeted only a single disease, which limited its clinical utility.
    • Therefore, the purpose of our study was to develop a DLAD for major thoracic diseases on CRs and to validate its performance using independent data sets in comparison with physicians.

 

2. Methods

  • Data collection and Curation
    • Raw data collection
      • 57,481 CRs with normal results & 41,140 CRs with abnormal results
      • collected between 2016.11.01 ~ 2017.01.31 from a single institution (Institution A)
      • Abnormal 4 categories: pulmonary malignant neoplasms, active pulmonary tuberculosis, pneumonia, pneumothorax
    • Data curation
      • every CR was reviewed by 1~15 board-certified radiologists
      • Step 1. Image labeling: confirm each CR's category (normal or abnormal)
      • Step 2. Image annotation: mark the exact location of each abnormal finding on the CR
      • Finally, 54,221 normal CRs from 47,917 individuals & 35,613 abnormal CRs from 14,102 individuals remained
        • 3,260 normal CRs & 5,527 abnormal CRs were excluded by the reviewing radiologists
    • Dataset setting
      • Training dataset: 53,621 normal CRs & 34,074 abnormal CRs
      • Validation dataset (hyperparameter tuning) : 300 normal CRs & 750 abnormal CRs
      • Test dataset (in-house validation data set): 300 normal CRs & 789 abnormal CRs
  • Development of the DLAD algorithm
    • A deep CNN (convolutional neural network) with dense blocks, comprising 5 parallel classifiers:
      • 4 disease-category classifiers + 1 abnormality classifier (any of the target diseases)
    • 2 types of losses were used to train the algorithm:
      • a classification loss & a localization loss (see the sketch after this list)
  • Evaluation of DLAD Performance
    • external validation: 5 independent data sets
      • collected & curated between 2018.05.01 ~ 2018.07.31
      • 4 hospitals in Korea (institutions A ~ D) & 1 hospital in France (institution E)
    • Overall, 486 normal CRs & 529 abnormal CRs
  • Observer Performance Test
    • To compare the DLAD with physicians
    • 15 physicians with varying experience:
      • 5 thoracic radiologists (9~14 yrs of experience)
      • 5 board-certified radiologists (5~7 yrs)
      • 5 non-radiology physicians
    • Test method
      • Session 1. Observers independently assessed every CR, without the assistance of the DLAD.
      • Session 2. Observers reevaluated every CR with the assistance of the DLAD.
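
To make the architecture and two-loss design above concrete, here is a minimal PyTorch sketch: a DenseNet-style backbone (densenet121 here, chosen for illustration) feeding 5 parallel 1×1-conv heads that produce per-disease localization maps, with image-level scores obtained by global max pooling. The backbone choice, head design, mask format, and equal loss weighting are my assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torchvision

NUM_HEADS = 5  # malignancy, tuberculosis, pneumonia, pneumothorax, any-abnormal

class DLADSketch(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.densenet121(weights=None)
        self.features = backbone.features           # dense blocks -> feature map
        feat_dim = backbone.classifier.in_features  # 1024 for densenet121
        # one 1x1 conv per head yields a per-disease localization map
        self.heads = nn.ModuleList(
            [nn.Conv2d(feat_dim, 1, kernel_size=1) for _ in range(NUM_HEADS)]
        )

    def forward(self, x):
        f = self.features(x)                                        # (B, 1024, h, w)
        maps = torch.cat([head(f) for head in self.heads], dim=1)   # (B, 5, h, w)
        # image-level logit per head = global max over its localization map
        image_logits = maps.flatten(2).max(dim=2).values            # (B, 5)
        return image_logits, maps

def dlad_loss(image_logits, maps, image_labels, lesion_masks):
    """Classification loss on image-level labels plus a localization loss
    on pixel-wise lesion masks (masks assumed down-sampled to the map size)."""
    cls = nn.functional.binary_cross_entropy_with_logits(image_logits, image_labels)
    loc = nn.functional.binary_cross_entropy_with_logits(maps, lesion_masks)
    return cls + loc  # equal weighting is an assumption
```

At inference, `image_logits` give the five image-wise scores, and a sigmoid over `maps` gives heatmaps for lesion-wise localization.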

 

3. Results

  • Image-Wise Classification Performance of the DLAD
    • In-house validation performance: AUROC = 0.965 (95% CI, 0.955~0.975)
    • External validation performance: median AUROC = 0.979 (range, 0.973~1.000)
  • Lesion-Wise Localization Performance of the DLAD
    • In-house validation performance: AUAFROC = 0.916 (95% CI, 0.900~0.932)
    • External validation performance: median AUAFROC = 0.972 (range, 0.923~0.985)
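
For reference, a short sketch of how these two metrics can be computed from model outputs. The AUROC uses scikit-learn directly; the AFROC helper is a simplified figure of merit (lesion localization fraction vs. per-image false-positive fraction on normal CRs) and omits the jackknife machinery a full AFROC analysis would use, so treat it as illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def image_wise_auroc(labels, scores):
    # labels: 1 = abnormal CR, 0 = normal CR; scores: the model's abnormality score
    return roc_auc_score(labels, scores)

def simplified_auafroc(lesion_scores, normal_fp_scores):
    # lesion_scores: best score of any mark hitting each true lesion (0 if unmarked)
    # normal_fp_scores: highest false-positive mark score on each normal CR
    lesion_scores = np.asarray(lesion_scores, dtype=float)
    normal_fp_scores = np.asarray(normal_fp_scores, dtype=float)
    thresholds = np.unique(np.concatenate([lesion_scores, normal_fp_scores]))[::-1]
    llf = [(lesion_scores >= t).mean() for t in thresholds]      # lesion localization fraction
    fpf = [(normal_fp_scores >= t).mean() for t in thresholds]   # false-positive fraction
    llf = np.concatenate([[0.0], llf, [1.0]])
    fpf = np.concatenate([[0.0], fpf, [1.0]])
    return np.trapz(llf, fpf)  # trapezoidal area under the AFROC curve
```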

 

  • Comparison Between the DLAD and Physicians (values shown as AUROC / AUAFROC)

| Reader group | Session 1 (w/o DLAD assistance) | Session 2 (w/ DLAD assistance) |
| --- | --- | --- |
| Non-radiology physicians | 0.814 / 0.781 | 0.904 / 0.873 |
| Board-certified radiologists | 0.896 / 0.870 | 0.939 / 0.919 |
| Thoracic radiologists | 0.932 / 0.907 | 0.958 / 0.938 |

  • DLAD assistance improved both metrics for every reader group.
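
One common way to quantify per-group gains like these is a paired bootstrap over cases, resampling the same cases for both reading sessions; this sketch illustrates the general technique and is not necessarily the statistical analysis used in the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def paired_bootstrap_auc_gain(labels, scores_s1, scores_s2, n_boot=2000, seed=0):
    """Mean and 95% CI for the AUROC gain of session 2 over session 1,
    computed on identical bootstrap resamples of the cases."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    scores_s1, scores_s2 = np.asarray(scores_s1), np.asarray(scores_s2)
    gains = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(labels), size=len(labels))  # resample cases
        if labels[idx].min() == labels[idx].max():
            continue  # a bootstrap sample needs both classes for AUC
        gains.append(roc_auc_score(labels[idx], scores_s2[idx])
                     - roc_auc_score(labels[idx], scores_s1[idx]))
    lo, hi = np.percentile(gains, [2.5, 97.5])
    return float(np.mean(gains)), (float(lo), float(hi))
```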

 

 

* Reference: Hwang, Eui Jin, et al. "Development and validation of a deep learning–based automated detection algorithm for major thoracic diseases on chest radiographs." JAMA Network Open 2.3 (2019): e191095.

