
[ResNet] He et al., 2015, Deep Residual Learning for Image Recognition

by 펄서까투리 2021. 10. 17.

# Three-Line Summary #

  1. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
  2. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth (evaluate residual nets with a depth of up to 152 layers). 
  3. This result won the 1st place on the ILSVRC 2015 classification task (3.57% error on the ImageNet test set).

 

# Detailed Review #

1. Introduction

  • Is learning better networks as easy as stacking more layers?
    • An obstacle to answering this question was the notorious problem of vanishing/exploding gradients.
    • When deeper networks are able to start converging, a degradation problem has been exposed.
      • With the network depth increasing, accuracy gets saturated and then degrades rapidly.
      • Adding more layers to a suitably deep model leads to higher training error.

Fig 1. Training error (left) and test error (right) on CIFAR-10 with 20-layer and 56-layer "plain" networks.

 

  • We address the degradation problem by introducing a deep residual learning framework.
    • There exists a solution by construction to the deeper model: the added layers are identity mappings, and the other layers are copied from the learned shallower model.
    • Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping.
    • The formulation of F(x) + x can be realized by feedforward neural networks with "shortcut connections" (see the sketch below).

Fig 2. Residual learning: a building block.
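
To make the building block in Fig 2 concrete, here is a minimal PyTorch-style sketch. The class name BasicBlock is mine, and the batch normalization after each convolution follows the paper's training setup; the remaining details are illustrative assumptions, not code from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Residual building block (Fig 2): y = F(x) + x,
    where F is two 3x3 conv layers (each followed by BN)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # weight layer -> BN -> ReLU
        out = self.bn2(self.conv2(out))        # weight layer -> BN
        out = out + x                          # shortcut connection adds the identity
        return F.relu(out)                     # ReLU after the addition
```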

 

  • We present comprehensive experiments on ImageNet to show the degradation problem and evaluate our method.
    • 1) Our extremely deep residual nets are easy to optimize.
    • 2) Our deep residual nets can easily enjoy accuracy gains from greatly increased depth. 
    • Our ensemble has 3.57% top-5 error on the ImageNet test set, and won the 1st place in the ILSVRC 2015 classification competition.

 

2. Deep Residual Learning

  • Residual Learning
    • H(x): An underlying mapping to be fit by a few stacked layers
    • x: The inputs to the first of these layers.
    • H(x) - x: the hypothesis is that if stacked nonlinear layers can asymptotically approximate H(x), they can equivalently approximate the residual function H(x) - x.
      • (assuming that the input and output are of the same dimensions)
    • F(x) = H(x) - x: we explicitly let these layers approximate this residual function.
    • F(x) + x: The original mapping.
  • Identity Mapping by Shortcuts
    • We adopt residual learning to every few stacked layers.
    • A building block: 
      • y = F(x, {W_i}) + x   (Eq. 1)
    • If the dimensions of x and F are not equal, we can perform a linear projection W_s on the shortcut connection to match the dimensions (see the sketch below).
      • y = F(x, {W_i}) + W_s x   (Eq. 2)
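
A sketch of how Eq. 1 and Eq. 2 are commonly realized. The identity shortcut adds no extra parameters; when the dimensions change, W_s is implemented here as a strided 1x1 convolution, a common choice for the paper's projection option. The class name and exact layer details are my assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """y = F(x, {W_i}) + x      (Eq. 1, identity shortcut)
       y = F(x, {W_i}) + W_s x  (Eq. 2, projection shortcut when dimensions differ)"""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        if stride != 1 or in_channels != out_channels:
            # Eq. 2: linear projection W_s, here a strided 1x1 convolution
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()      # Eq. 1: no extra parameters

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # add, then ReLU
```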

Fig 5. A deeper residual function F for ImageNet. Left: a building block (ResNet-34), Right: a 'bottleneck' building block (ResNet-50/101/152).
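
A corresponding sketch of the "bottleneck" block on the right of Fig 5: a 1x1 convolution reduces the channel dimension, a 3x3 convolution operates on the reduced representation, and another 1x1 convolution restores it. The expansion factor of 4 matches the ResNet-50/101/152 designs; the class name and remaining details are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """Bottleneck residual block (Fig 5, right): 1x1 reduce -> 3x3 -> 1x1 restore."""

    expansion = 4  # output channels = 4 * bottleneck width (e.g. 64 -> 256)

    def __init__(self, in_channels, width, stride=1):
        super().__init__()
        out_channels = width * self.expansion
        self.conv1 = nn.Conv2d(in_channels, width, 1, bias=False)   # 1x1, reduce
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, out_channels, 1, bias=False)  # 1x1, restore
        self.bn3 = nn.BatchNorm2d(out_channels)

        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(     # projection shortcut (Eq. 2)
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return F.relu(out + self.shortcut(x))
```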

  • Network Architectures

Fig 3. Example network architectures for ImageNet. Left: VGG-19 model (19.6 billion FLOPs), Middle: a plain network with 34 layers (3.6 billion FLOPs), Right: a residual network with 34 layers (3.6 billion FLOPs).
Table 1. Architectures for ImageNet. Down-sampling is performed by conv3_1, conv4_1, and conv5_1 with a stride of 2.
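
The stage configurations in Table 1 also account for the depths in the model names. Below is a quick check of the layer count for the [3, 4, 6, 3] configuration, plus a hypothetical make_stage helper (assuming a block class with an (in_channels, out_channels, stride) signature, like the ResidualBlock sketched above) that applies the stride-2 down-sampling at conv3_1, conv4_1, and conv5_1 noted in the Table 1 caption.

```python
import torch.nn as nn

# Layer counting from the Table 1 stage configuration [3, 4, 6, 3]:
# one 7x7 conv stem + the conv layers inside the residual blocks + one fc layer.
blocks_per_stage = [3, 4, 6, 3]
resnet34_depth = 1 + 2 * sum(blocks_per_stage) + 1  # basic blocks: two 3x3 convs each
resnet50_depth = 1 + 3 * sum(blocks_per_stage) + 1  # bottleneck blocks: three convs each
print(resnet34_depth, resnet50_depth)               # 34 50

def make_stage(block, in_channels, out_channels, num_blocks, stride):
    """Build one conv2_x..conv5_x stage; the first block down-samples with the
    given stride (stride 2 for conv3_1, conv4_1, conv5_1), the rest keep the size."""
    layers = [block(in_channels, out_channels, stride=stride)]
    layers += [block(out_channels, out_channels, stride=1) for _ in range(num_blocks - 1)]
    return nn.Sequential(*layers)

# e.g. conv3_x of ResNet-34: make_stage(ResidualBlock, 64, 128, num_blocks=4, stride=2)
```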

 

3. Experiments

  • ImageNet classification.

Fig 4. Training on ImageNet. Thin curves denote training error, and bold curves denote validation error of the center crops. Left: plain networks of 18 and 34 layers. Right: ResNets of 18 and 34 layers.
Table 2. Top-1 error (%, 10-crop testing) on ImageNet validation.
Table 5. Error rates (%) of ensembles. The top-5 error is on the test set of ImageNet and reported by the test server.

 

* Reference: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
