
[ResNet] He et al., 2015, Deep Residual Learning for Image Recognition

by 펄서까투리 2021. 10. 17.

# Three-Line Summary #

  1. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
  2. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth (evaluate residual nets with a depth of up to 152 layers). 
  3. This result won the 1st place on the ILSVRC 2015 classification task (3.57% error on the ImageNet test set).

 

# Detailed Review #

1. Introduction

  • Is learning better networks as easy as stacking more layers?
    • An obstacle to answering this question was the notorious problem of vanishing/exploding gradients.
    • When deeper networks are able to start converging, a degradation problem has been exposed.
      • With the network depth increasing, accuracy gets saturated and then degrades rapidly.
      • Adding more layers to a suitably deep model leads to higher training error.

Fig 1. Training error (left) and test error (right) on CIFAR-10 with 20-layer and 56-layer "plain" networks.

 

  • We address the degradation problem by introducing a deep residual learning framework.
    • There exists a solution by construction to the deeper model: the added layers are identity mappings, and the other layers are copied from the learned shallower model.
    • Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping.
    • The formulation of F(x) + x can be realized by feedforward neural networks with "shortcut connections" (see the sketch below).

Fig 2. Residual learning: a building block.
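
To make the building block in Fig 2 concrete, here is a minimal PyTorch-style sketch. The class name BasicBlock is mine, and the batch normalization after each convolution follows the paper's training setup; the remaining details are illustrative assumptions, not code from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Residual building block (Fig 2): y = F(x) + x,
    where F is two 3x3 conv layers (each followed by BN)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # weight layer -> BN -> ReLU
        out = self.bn2(self.conv2(out))        # weight layer -> BN
        out = out + x                          # shortcut connection adds the identity
        return F.relu(out)                     # ReLU after the addition
```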

 

  • We present comprehensive experiments on ImageNet to show the degradation problem and evaluate our method.
    • 1) Our extremely deep residual nets are easy to optimize.
    • 2) Our deep residual nets can easily enjoy accuracy gains from greatly increased depth. 
    • Our ensemble has 3.57% top-5 error on the ImageNet test set, and won the 1st place in the ILSVRC 2015 classification competition.

 

2. Deep Residual Learning

  • Residual Learning
    • H(x): An underlying mapping to be fit by a few stacked layers
    • x: The inputs to the first of these layers.
    • H(x) - x: the hypothesis is that if stacked nonlinear layers can asymptotically approximate H(x), they can equivalently approximate the residual function H(x) - x.
      • (assuming that the input and output are of the same dimensions)
    • F(x) = H(x) - x: we explicitly let these layers approximate this residual function.
    • F(x) + x: The original mapping.
  • Identity Mapping by Shortcuts
    • We adopt residual learning to every few stacked layers.
    • A building block: 
      • y = F(x, {W_i}) + x   (Eq. 1)
    • If the dimensions of x and F are not equal, we can perform a linear projection W_s on the shortcut connection to match the dimensions (see the sketch below).
      • y = F(x, {W_i}) + W_s x   (Eq. 2)
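
A sketch of how Eq. 1 and Eq. 2 are commonly realized. The identity shortcut adds no extra parameters; when the dimensions change, W_s is implemented here as a strided 1x1 convolution, a common choice for the paper's projection option. The class name and exact layer details are my assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """y = F(x, {W_i}) + x      (Eq. 1, identity shortcut)
       y = F(x, {W_i}) + W_s x  (Eq. 2, projection shortcut when dimensions differ)"""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        if stride != 1 or in_channels != out_channels:
            # Eq. 2: linear projection W_s, here a strided 1x1 convolution
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()      # Eq. 1: no extra parameters

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # add, then ReLU
```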

Fig 5. A deeper residual function F for ImageNet. Left: a building block (ResNet-34), Right: a 'bottleneck' building block (ResNet-50/101/152).
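
A corresponding sketch of the "bottleneck" block on the right of Fig 5: a 1x1 convolution reduces the channel dimension, a 3x3 convolution operates on the reduced representation, and another 1x1 convolution restores it. The expansion factor of 4 matches the ResNet-50/101/152 designs; the class name and remaining details are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """Bottleneck residual block (Fig 5, right): 1x1 reduce -> 3x3 -> 1x1 restore."""

    expansion = 4  # output channels = 4 * bottleneck width (e.g. 64 -> 256)

    def __init__(self, in_channels, width, stride=1):
        super().__init__()
        out_channels = width * self.expansion
        self.conv1 = nn.Conv2d(in_channels, width, 1, bias=False)   # 1x1, reduce
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, out_channels, 1, bias=False)  # 1x1, restore
        self.bn3 = nn.BatchNorm2d(out_channels)

        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(     # projection shortcut (Eq. 2)
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return F.relu(out + self.shortcut(x))
```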

  • Network Architectures

Fig 3. Example network architectures for ImageNet. Left: VGG-19 model (19.6 billion FLOPs), Middle: a plain network with 34 layers (3.6 billion FLOPs), Right: a residual network with 34 layers (3.6 billion FLOPs).
Table 1. Architectures for ImageNet. Down-sampling is performed by conv3_1, conv4_1, and conv5_1 with a stride of 2.
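
The stage configurations in Table 1 also account for the depths in the model names. Below is a quick check of the layer count for the [3, 4, 6, 3] configuration, plus a hypothetical make_stage helper (assuming a block class with an (in_channels, out_channels, stride) signature, like the ResidualBlock sketched above) that applies the stride-2 down-sampling at conv3_1, conv4_1, and conv5_1 noted in the Table 1 caption.

```python
import torch.nn as nn

# Layer counting from the Table 1 stage configuration [3, 4, 6, 3]:
# one 7x7 conv stem + the conv layers inside the residual blocks + one fc layer.
blocks_per_stage = [3, 4, 6, 3]
resnet34_depth = 1 + 2 * sum(blocks_per_stage) + 1  # basic blocks: two 3x3 convs each
resnet50_depth = 1 + 3 * sum(blocks_per_stage) + 1  # bottleneck blocks: three convs each
print(resnet34_depth, resnet50_depth)               # 34 50

def make_stage(block, in_channels, out_channels, num_blocks, stride):
    """Build one conv2_x..conv5_x stage; the first block down-samples with the
    given stride (stride 2 for conv3_1, conv4_1, conv5_1), the rest keep the size."""
    layers = [block(in_channels, out_channels, stride=stride)]
    layers += [block(out_channels, out_channels, stride=1) for _ in range(num_blocks - 1)]
    return nn.Sequential(*layers)

# e.g. conv3_x of ResNet-34: make_stage(ResidualBlock, 64, 128, num_blocks=4, stride=2)
```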

 

3. Experiments

  • ImageNet classification.

Fig 4. Training on ImageNet. Thin curves denote training error, and bold curves denote validation error of the center crops. Left: plain networks of 18 and 34 layers. Right: ResNets of 18 and 34 layers.
Table 2. Top-1 error (%, 10-crop testing) on ImageNet validation.
Table 5. Error rates (%) of ensembles. The top-5 error is on the test set of ImageNet and reported by the test server.

 

* Reference: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
