# Three-Line Summary #
- We introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.
- For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.
- DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
# Detailed Review #
1. Introduction
- As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and “wash out” by the time it reaches the end of the network.
- We refer to our approach as Dense Convolutional Network (DenseNet), because of its dense connectivity pattern.
- Each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.
- An L-layer network therefore has L(L+1)/2 direct connections, instead of just L in a traditional architecture (a short derivation follows after this list).
- A possibly counter-intuitive effect of this dense connectivity pattern is that it requires fewer parameters than traditional convolutional networks,
- because the DenseNet layers are very narrow (e.g., 12 filters per layer), adding only a small set of feature-maps to the “collective knowledge” of the network and keeping the remaining feature-maps unchanged.
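The L(L+1)/2 count follows directly from the connectivity pattern: layer l receives one connection from each of the l earlier feature-maps (the input x_0 and the outputs of layers 1, ..., l-1), so summing over all L layers gives

$$\sum_{\ell=1}^{L} \ell \;=\; 1 + 2 + \cdots + L \;=\; \frac{L(L+1)}{2}.$$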
2. DenseNets
- Dense connectivity:
- We introduce direct connections from any layer to all subsequent layers
- Consequently, the l-th layer receives the feature-maps of all preceding layers, x_0, ..., x_{l-1}, as input: x_l = H_l([x_0, x_1, ..., x_{l-1}]),
- where [x_0, x_1, ..., x_{l-1}] refers to the concatenation of the feature-maps produced in layers 0, ..., l-1, and H_l(·) is the layer's composite function (BN, ReLU, and a 3×3 convolution in the paper); a minimal code sketch follows below.
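A minimal PyTorch sketch of this dense connectivity, assuming the paper's BN → ReLU → 3×3 Conv composite function H_l and omitting the bottleneck/compression variants of DenseNet-BC; it illustrates the connectivity pattern rather than reproducing the authors' reference implementation:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """H_l: maps the concatenation of all earlier feature-maps to k new feature-maps."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        # Composite function H_l = BN -> ReLU -> 3x3 Conv (as defined in the paper)
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Dense connectivity: layer l sees [x_0, x_1, ..., x_{l-1}]."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        # Layer l receives in_channels + (l - 1) * growth_rate input channels
        self.layers = nn.ModuleList(
            [DenseLayer(in_channels + i * growth_rate, growth_rate)
             for i in range(num_layers)]
        )

    def forward(self, x):
        features = [x]  # x_0: the block's input
        for layer in self.layers:
            # x_l = H_l([x_0, x_1, ..., x_{l-1}])
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # all feature-maps leave the block together
```

For example, DenseBlock(num_layers=6, in_channels=24, growth_rate=12) turns a 24-channel input into a 24 + 6 × 12 = 96-channel output. Concatenation (rather than ResNet-style summation) is what lets every later layer read the raw feature-maps of every earlier one.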
- Transition layers:
- To facilitate down-sampling in our architecture we divide the network into multiple densely connected dense blocks
- we refer to layers between blocks as transition layers, which do convolution and pooling.
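A matching sketch of a transition layer, assuming the paper's BN → 1×1 Conv → 2×2 average-pooling design (DenseNet-BC additionally compresses the channel count here, which this sketch leaves to the caller via out_channels):

```python
import torch.nn as nn

class TransitionLayer(nn.Module):
    """Between dense blocks: adjust channels with a 1x1 conv, halve resolution with 2x2 avg-pool."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.bn(x)))
```

Pooling only happens in these transition layers, so concatenation (which requires matching spatial sizes) only ever occurs within a dense block.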
- Growth rate:
- (1) An important difference between DenseNet and existing network architectures is that DenseNet can have very narrow layers, e.g., k = 12, and we refer to the hyper-parameter k as the growth rate of the network.
- (2) Although DenseNet uses a relatively small growth rate, it obtains state-of-the-art results; the authors attribute this to the network's "collective knowledge"
- (* each layer has access to all the preceding feature-maps in its block, so the feature-maps can be viewed as the global state of the network; the resulting channel counts are illustrated below).
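Because every layer adds exactly k feature-maps, the channel bookkeeping inside a block is simple: layer l has k_0 + k × (l − 1) input feature-maps, where k_0 is the number of channels entering the block. A tiny illustration (the k_0 = 24, k = 12, 6-layer configuration is hypothetical, chosen only for the printout):

```python
# Channel bookkeeping inside one dense block.
k0, k, num_layers = 24, 12, 6          # hypothetical block configuration
for l in range(1, num_layers + 1):
    in_ch = k0 + k * (l - 1)           # feature-maps visible to layer l
    print(f"layer {l}: {in_ch} -> {k} new feature-maps")
print(f"block output: {k0 + k * num_layers} feature-maps")
```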
3. Experiments
- We empirically demonstrate DenseNet's effectiveness on several benchmark datasets (CIFAR, SVHN, ImageNet) and compare with state-of-the-art architectures (e.g., ResNets).
- Classification Results on CIFAR and SVHN: (1) DenseNet-BC with L = 190 and k = 40 outperforms the existing state-of-the-art consistently on all the CIFAR datasets; (2) on SVHN, with dropout, the DenseNet with L = 100 and k = 24 also surpasses the current best result achieved by wide ResNets.
- Classification Results on ImageNet: Comparing the top-1 error rates of DenseNets and ResNets on the ImageNet validation set, the reported results show that DenseNets perform on par with the state-of-the-art ResNets.
* Reference: Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.