
Zhang et al., 2019, An investigation of CNN models for differentiating malignant from benign lesions using small pathologically proven datasets

by 펄서까투리 2021. 9. 27.

# Three-Line Summary #

  1. Cancer is one of the most threatening diseases, so a major goal is to distinguish malignant from benign lesions on radiological (e.g., CT) images at an early stage. However, it is difficult to collect a large volume of images with pathological information.
  2. This paper explores two CNN models, focusing extensively on expanding the training samples drawn from two small, pathologically proven datasets (a colorectal polyp dataset and a lung nodule dataset, each with fewer than 70 subjects).
  3. For differentiating malignant from benign lesions, the models reach an average AUC of 0.86 on colorectal polyps and 0.71 on pulmonary nodules.

 

# Detailed Review #

1. Introduction

  • Cancer is the second leading cause of death globally (as reported by the WHO), so considerable effort has gone into advancing radiology and transformative tools (e.g., non-invasive computed tomographic, or CT, imaging) to detect and diagnose cancers at an early stage.
    • However, early detection suffers from a high false-positive (FP) rate.
    • Reducing the FP rate requires a large medical image dataset with pathological information.
    • In cancer imaging, however, it is very difficult to collect a large number of images from patients with pathologically proven ground truth.
  • Existing approaches for learning effectively from small medical imaging datasets with CNN models include:
    • (1) Transfer learning
      • A huge natural-image dataset (e.g., ImageNet) is used to initialize and optimize the model weights, which are then fine-tuned on medical images.
      • However, natural and medical images differ considerably, so features learned from one may not transfer well to the other.
    • (2) Multi-scale medical images
      • The original raw images are cut into patches at different scales of the image field.
      • The open problem is how to assign labels to these patches when some patches contain only a small part of a lesion.
    • Thus, learning from small datasets with CNN models remains a tough challenge.
  • The paper proposes two CNN models to classify polyps/nodules as malignant or benign using small datasets.
    • First model: multi-channel-multi-slice two-dimensional CNN model (MCMS-2D CNN)
    • Second model: voxel-level one-dimensional CNN model (V-1D CNN)

 

2. Method

  • Method pipeline
    • (1) Extract and generate the inputs
    • (2) Design the CNN models
    • (3) Validate model performance
    • (4) Statistically analyze model performance
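The summary above reports performance as AUC. As a reference point for step (4), here is a minimal sketch (not the authors' code) of a lesion-level AUC computed with the rank, or Mann-Whitney, formulation: the probability that a randomly chosen malignant lesion scores higher than a randomly chosen benign one, with ties counting one half. The function name `auc_rank` and the toy scores are hypothetical.

```python
import numpy as np

def auc_rank(labels, scores):
    """AUC = P(score of a random malignant case > score of a random benign case),
    counting ties as 1/2 (Mann-Whitney U divided by n_pos * n_neg)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]  # malignant = 1, benign = 0
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one malignant and one benign case")
    # Compare every malignant score with every benign score; cheap for ~70 lesions.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))

# Made-up lesion scores (not data from the paper).
y = [1, 1, 0, 0, 1, 0]
s = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
print(auc_rank(y, s))  # 8 of 9 malignant-benign pairs ordered correctly
```

In practice, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` returns the same value; the explicit pairwise version just makes the definition visible.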

Fig 1. The flowchart of our proposed pipeline. [Zhang et al., 2019]

 

2.1. Datasets and inputs of CNN models

  • Most publicly available cancer image datasets (e.g., LIDC-IDRI) are not pathologically proven.
    • For many nodules in the LIDC dataset, the experts disagree and assign different labels.
    • Bias is therefore inevitable, resulting in "label noise".
  • Two small, pathologically proven datasets are used.
    • Dataset 1. Colorectal Polyps
      • CT scans from the University of Wisconsin, USA
      • 59 patients in total, 63 polyp masses (31 benign, 32 malignant)
      • Polyp mass size range: 3 to 8 cm (mean = 4.2 cm)
      • CT image volume: 400 slices, each a 512 x 512 array
    • Dataset 2. Lung Nodules
      • Patients who underwent CT-guided lung nodule needle biopsy at Stony Brook University Hospital, USA
      • 66 patients in total, 67 lung nodules (18 benign, 49 malignant)
      • Lung nodule diameter range: 0.91 to 13.08 cm (mean = 3.15 cm)
      • CT image volume: 200 slices, each a 512 x 512 array
    • Both datasets consist of routine CT scans; the scans, the drawn lesion (polyp and nodule) borders, and the pathological labels are fed into the CNN models for classification.

Table 1. Two dataset information [Zhang et al., 2019]

 

2.2. Architecture of the proposed CNN models

  • Multi-channel-multi-slice-2D (MCMS-2D) CNN model
    • Basic concept:
      • If meaningful derived feature maps are provided alongside the raw image, the model can more easily learn and extract high-level features.
    • Multi-channel strategy
      • Local binary pattern (LBP) maps: add texture information
      • Histogram of oriented gradients (HOG) maps: add information about object shape and region
      • Image gradients (Grad): add edge information
    • Multi-slice strategy
      • A fixed number of slices is taken from each polyp/nodule, sampled at a regular interval from its top slice to its bottom slice.
      • Trade-off: more slices per lesion enlarge the training dataset, while fewer slices help avoid overfitting.
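A minimal NumPy sketch of how MCMS-2D inputs could be assembled, assuming the volume is a `(Z, H, W)` array and the drawn lesion border has been filled into a boolean mask of the same shape. The helper names (`pick_slices`, `lbp8`, `grad_mag`, `mcms_inputs`) are hypothetical, the LBP is the basic 8-neighbour, radius-1 variant, and the HOG channel is left out to keep the example dependency-free (scikit-image's `skimage.feature.hog(..., visualize=True)` returns a HOG image that could be stacked as one more channel). Cropping around the lesion and intensity normalization are also omitted, and the channel list can be trimmed to match a two-channel setup like the one in Table 2.

```python
import numpy as np

def pick_slices(mask, n_slices):
    """Indices of n_slices slices spaced evenly from the lesion's top slice to
    its bottom slice (indices repeat if the lesion spans fewer slices)."""
    z_idx = np.flatnonzero(mask.any(axis=(1, 2)))  # slices that contain lesion
    return np.round(np.linspace(z_idx[0], z_idx[-1], n_slices)).astype(int)

def lbp8(img):
    """Basic local binary pattern: bit k is set when neighbour k >= centre."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    centre = p[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((h, w), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        code |= (neighbour >= centre).astype(np.uint8) << bit
    return code

def grad_mag(img):
    """Gradient magnitude from finite differences (edge information)."""
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)

def mcms_inputs(volume, mask, n_slices=10):
    """Return an array of shape (n_slices, 3, H, W): raw, LBP and gradient
    channels for each selected slice."""
    samples = []
    for z in pick_slices(mask, n_slices):
        raw = volume[z].astype(float)
        channels = [raw, lbp8(raw).astype(float), grad_mag(raw)]
        samples.append(np.stack(channels, axis=0))
    return np.stack(samples, axis=0)

# Toy usage: random data standing in for a CT volume and a lesion mask.
vol = np.random.default_rng(0).normal(size=(40, 64, 64))
mask = np.zeros(vol.shape, dtype=bool)
mask[12:25, 20:40, 20:40] = True
print(mcms_inputs(vol, mask, n_slices=5).shape)  # (5, 3, 64, 64)
```

Each lesion thus contributes `n_slices` multi-channel 2D samples, which is where the slice-count trade-off above shows up.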

Fig 2. The major architecture of MCMS-2D CNN model. [Zhang et al., 2019]
Table 2. Architecture of a standard two-channel MCMS-2D CNN model for Polyp dataset with parameters. [Zhang et al., 2019]

 

  • Voxel-level-1D (V-1D) CNN model
    • Basic concept:
      • Generate training samples at the voxel level, learning meaningful features for each voxel from a relatively small region of interest (ROI) around it.
      • This can expand the number of training samples enormously.
    • Voxel-to-1D-vector strategy
      • A certain number of voxels are randomly chosen from the whole volume of each polyp/nodule (each voxel takes the lesion's label from the pathological report).
      • The number of voxels drawn from each slice depends on the size of the polyp/nodule area in that slice.
      • The 7 x 7 ROI around each selected voxel is reshaped into a 1D vector (49 values) that serves as the input.
      • These vectors are fed into the V-1D CNN model for classification.
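The voxel-to-vector step can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' implementation: it assumes a `(Z, H, W)` volume with a boolean lesion mask, reads the note about slice area as allocating samples to slices in proportion to lesion area (done here with a multinomial draw, which is our choice), takes a 7 x 7 in-plane patch around each sampled voxel, and gives every vector the lesion's pathological label. The function name `sample_voxel_vectors` is made up for this example.

```python
import numpy as np

def sample_voxel_vectors(volume, mask, label, n_voxels, roi=7, rng=None):
    """Sample n_voxels lesion voxels and flatten the roi x roi in-plane patch
    around each into a 1D vector (7 x 7 -> 49 values) with the lesion's label."""
    rng = np.random.default_rng(rng)
    half = roi // 2
    z_idx = np.flatnonzero(mask.any(axis=(1, 2)))  # slices containing lesion
    areas = mask[z_idx].sum(axis=(1, 2))           # lesion area per slice
    # Assumption: samples are split across slices in proportion to lesion area.
    per_slice = rng.multinomial(n_voxels, areas / areas.sum())
    # Edge padding so patches near the image border keep the full roi x roi size.
    padded = np.pad(volume, ((0, 0), (half, half), (half, half)), mode="edge")
    vectors = []
    for z, k in zip(z_idx, per_slice):
        ys, xs = np.nonzero(mask[z])
        # Draw without replacement unless the slice has fewer voxels than needed.
        pick = rng.choice(len(ys), size=k, replace=bool(k > len(ys)))
        for y, x in zip(ys[pick], xs[pick]):
            # Voxel (y, x) sits at (y + half, x + half) in the padded slice.
            vectors.append(padded[z, y:y + roi, x:x + roi].ravel())
    X = np.asarray(vectors, dtype=float).reshape(-1, roi * roi)
    return X, np.full(len(X), label)

# Toy usage: random data standing in for a CT volume and a lesion mask.
vol = np.random.default_rng(0).normal(size=(30, 64, 64))
mask = np.zeros(vol.shape, dtype=bool)
mask[10:16, 25:35, 25:35] = True
X, y = sample_voxel_vectors(vol, mask, label=1, n_voxels=500, rng=1)
print(X.shape)  # (500, 49)
```

This is where the expansion comes from: with, say, 500 voxels per lesion (an illustrative number), the 67 lung nodules would yield 33,500 training vectors instead of 67 lesion-level samples.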

Fig 3. The major architecture of V-1D CNN model. [Zhang et al., 2019]
Table 3. The architecture of a standard V-1D CNN model for Lung Nodule dataset with parameters. [Zhang et al., 2019]

 

  • Voting algorithm
    • Both CNN models above work at either the slice level or the voxel level.
    • A voting algorithm therefore aggregates the predicted classes of all slices/voxels belonging to a polyp/nodule into a final label for each test polyp/nodule volume.
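Fig. 4 is not reproduced here, so the following is only a plausible sketch, assuming a simple majority vote over binary slice/voxel predictions (1 = malignant, 0 = benign). The function name and the tie rule (exactly half malignant counts as benign) are hypothetical choices. The malignant fraction is returned as well, since a per-lesion score like this is what an AUC computation needs.

```python
import numpy as np

def vote_per_lesion(lesion_ids, unit_preds, threshold=0.5):
    """Aggregate slice- or voxel-level predictions into one label per lesion.

    Returns {lesion_id: (label, malignant_fraction)}, where label is 1 when
    the fraction of units predicted malignant is strictly above threshold.
    """
    lesion_ids = np.asarray(lesion_ids)
    unit_preds = np.asarray(unit_preds, dtype=float)
    result = {}
    for lid in np.unique(lesion_ids):
        frac = float(unit_preds[lesion_ids == lid].mean())
        result[lid.item()] = (int(frac > threshold), frac)
    return result

# Toy predictions for two test lesions: "p1" has 3 slices, "p2" has 2 slices.
# p1 -> malignant (2 of 3 slices), p2 -> benign (a 1-1 tie).
print(vote_per_lesion(["p1", "p1", "p1", "p2", "p2"], [1, 1, 0, 0, 1]))
```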

Fig 4. The description of voting algorithm. [Zhang et al., 2019]

 

2.3. Validation strategy

  • Strategy 1. Two-fold cross-validation:
    • The whole sample is randomly divided into two (nearly) equal parts; one part is used for training and the other for validation.
  • Strategy 2. Leave-one-out cross-validation:
    • Each polyp/nodule is held out in turn for validation, and the remaining polyps/nodules are used for training.
  • Results for Strategy 1 are shown in Section 3; results for Strategy 2 are in the paper's appendix.
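Because each lesion contributes many slices or voxels, the splits are most naturally drawn over lesions, so that no lesion has samples in both the training and the validation part. Below is a minimal sketch of both strategies at the lesion level, assuming lesions are indexed 0..n-1 and each slice/voxel sample records the index of its lesion; all names are hypothetical. In the two-fold case the two halves also swap roles, the usual convention, and with an odd count such as 63 polyps the halves differ by one.

```python
import numpy as np

def two_fold_splits(n_lesions, rng=None):
    """Randomly halve the lesion indices; each half serves once as validation."""
    perm = np.random.default_rng(rng).permutation(n_lesions)
    a, b = perm[: n_lesions // 2], perm[n_lesions // 2 :]
    return [(a, b), (b, a)]  # (train lesions, validation lesions)

def leave_one_out_splits(n_lesions):
    """Hold out each lesion once; train on all the others."""
    idx = np.arange(n_lesions)
    for i in idx:
        yield np.delete(idx, i), idx[i : i + 1]

def unit_indices(unit_lesion_ids, lesions):
    """Indices of the slice/voxel samples that belong to the given lesions."""
    return np.flatnonzero(np.isin(unit_lesion_ids, lesions))

# Usage: 63 polyps -> 31 training and 32 validation lesions in the first fold.
train_l, val_l = two_fold_splits(63, rng=0)[0]
print(len(train_l), len(val_l))  # 31 32
```

`unit_indices` then maps the lesion-level split onto the slice- or voxel-level training samples.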

 

3. Result

3.1. Malignant-benign classification performance on the polyp dataset

  • Two-fold cross-validation with the MCMS-2D CNN model

Table 4. Classification performance of the MCMS-2D CNN model on the polyp dataset (Mean ± SD). [Zhang et al., 2019]
Fig 5. Classification performance from different models on the polyp dataset. [Zhang et al., 2019]

 

3.2. Malignant-benign classification performance on the lung nodule dataset

  • Two-fold cross-validation with the MCMS-2D CNN model

Table 5. Classification performance of the MCMS-2D CNN model on lung nodule dataset (Mean ± SD). [Zhang et al., 2019]

 

  • Two-fold cross-validation with the V-1D CNN model

Table 6. Classification performance of the V-1D CNN model on lung nodule dataset (Mean ± SD). [Zhang et al., 2019]
Fig 6. Classification performance from different models on lung nodule dataset. [Zhang et al., 2019]

 

* Reference: Zhang, Shu, et al. "An investigation of CNN models for differentiating malignant from benign lesions using small pathologically proven datasets." Computerized Medical Imaging and Graphics 77 (2019): 101645
