
[DenseNet] Huang et al., 2017, Densely Connected Convolutional Networks

by 펄서까투리 2022. 3. 13.

# Three-Line Summary #

  • We introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.
  • For each layer, the feature-maps of all preceding layers are used as inputs, and their own feature-maps are used as inputs into all subsequent layers.
  • DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.

# Detailed Review #

1. Introduction

  • As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and “wash out” by the time it reaches the end of the network.
  • We refer to our approach as Dense Convolutional Network (DenseNet), because of its dense connectivity pattern.
    • each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers;
    • an L-layer network therefore has L(L+1)/2 direct connections, instead of just L as in a traditional architecture (see the quick check after this list).

  • A possibly counter-intuitive effect of this dense connectivity pattern is that it requires fewer parameters than traditional convolutional networks,
    • because the DenseNet layers are very narrow (e.g., 12 filters per layer), adding only a small set of feature-maps to the “collective knowledge” of the network and keeping the remaining feature-maps unchanged.
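As a quick check of the L(L+1)/2 count (a small illustrative snippet of my own, not from the paper): the l-th layer receives l incoming connections, one from the input and one from each of the l-1 preceding layers, so an L-layer network has 1 + 2 + ... + L = L(L+1)/2 direct connections.

```python
# Direct connections in an L-layer densely connected network:
# layer l (1-indexed) receives l inputs (the network input plus the l-1 preceding layers).
def num_connections(L: int) -> int:
    return sum(l for l in range(1, L + 1))  # equals L * (L + 1) // 2

print(num_connections(5))    # 15
print(num_connections(100))  # 5050 (e.g., a network with L = 100)
```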

2. DenseNets

  • Dense connectivity:
    • We introduce direct connections from any layer to all subsequent layers 
    • Consequently, the l-th layer receives the feature-maps of all preceding layers, x_0, ..., x_{l-1}, as input: x_l = H_l([x_0, x_1, ..., x_{l-1}]), where H_l(·) is the layer's composite function (BN, ReLU, convolution);
    • [x_0, x_1, ..., x_{l-1}] refers to the concatenation of the feature-maps produced in layers 0, ..., l-1 (a minimal sketch of such a dense block is given after this list).
  • Transition layers:
    • To facilitate down-sampling in our architecture, we divide the network into multiple densely connected dense blocks;
    • we refer to the layers between blocks as transition layers, which do convolution and pooling (see the transition-layer sketch after this list).
  • Growth rate:
    • (1) An important difference between DenseNet and existing network architectures is that DenseNet can have very narrow layers, e.g., k = 12, and we refer to the hyper-parameter k as the growth rate of the network.
    • (2) Although DenseNet uses a relatively small growth rate, it obtains state-of-the-art results, because each layer contributes its k feature-maps to the network's "collective knowledge."
      • (* each layer has access to all the preceding feature-maps in its block, so the feature-maps can be viewed as the global state of the network).
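To make the dense connectivity equation and the growth rate concrete, below is a minimal sketch assuming a PyTorch-style implementation; the class names (`DenseLayer`, `DenseBlock`) and the simplified composite function H_l = BN → ReLU → 3×3 Conv are my own illustrative choices (DenseNet-BC additionally uses a 1×1 bottleneck convolution, omitted here), not the authors' official code.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """H_l: BN -> ReLU -> 3x3 Conv, producing k (= growth rate) new feature-maps."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.norm(x)))

class DenseBlock(nn.Module):
    """Each layer receives the concatenation [x_0, ..., x_{l-1}] of all preceding feature-maps."""
    def __init__(self, num_layers: int, in_channels: int, growth_rate: int):
        super().__init__()
        # The l-th layer sees in_channels + (l-1) * growth_rate input channels.
        self.layers = nn.ModuleList([
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_features = layer(torch.cat(features, dim=1))  # x_l = H_l([x_0, ..., x_{l-1}])
            features.append(new_features)                     # add k new maps to the "collective knowledge"
        return torch.cat(features, dim=1)
```

For example, a `DenseBlock(num_layers=12, in_channels=16, growth_rate=12)` outputs 16 + 12 × 12 = 160 channels: each layer adds only k = 12 feature-maps, which is why the layers can stay so narrow.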
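The transition layers between dense blocks can be sketched in the same style; the paper specifies them as batch normalization followed by a 1×1 convolution and 2×2 average pooling, and choosing `out_channels < in_channels` corresponds to the compression factor θ of DenseNet-BC (the class name `TransitionLayer` is again my own).

```python
import torch.nn as nn

class TransitionLayer(nn.Module):
    """Between dense blocks: BN -> 1x1 Conv (optionally compressing channels) -> 2x2 average pooling."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.norm(x)))
```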

 

3. Experiments

  • We empirically demonstrate DenseNet's effectiveness on several benchmark datasets (CIFAR, SVHN, ImageNet) and compare it with state-of-the-art architectures (e.g., ResNet).
  • Classification Results on CIFAR and SVHN: (1) DenseNet-BC with L = 190 and k = 40 outperforms the existing state-of-the-art consistently on all the CIFAR datasets; (2) on SVHN, with dropout, the DenseNet with L = 100 and k = 24 also surpasses the current best result achieved by wide ResNet.

  • Classification Results on ImageNet: comparing the top-1 error rates of DenseNets and ResNets on the ImageNet validation set, the reported results reveal that DenseNets perform on par with the state-of-the-art ResNets while requiring significantly fewer parameters and less computation.

 

* Reference: Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

