# Three-Line Summary #
- MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks.
- We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy.
- We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification.
# Detailed Review #
1. Introduction
- The general trend has been to make deeper and more complicated networks in order to achieve higher accuracy.
- However, in many real-world applications such as robotics, self-driving cars, and augmented reality, the recognition tasks need to be carried out in a timely fashion on a computationally limited platform.
- This paper proposes a class of network architectures that allows a model developer to specifically choose a small network that matches the resource restrictions (latency, size) of their application.
- Many papers on small networks focus only on size but do not consider speed.
2. MobileNet Architecture
- For MobileNets,
- the depthwise convolution applies a single filter to each input channel, and
- the pointwise convolution then applies a 1 x 1 convolution to combine the outputs of the depthwise convolution.
- By expressing convolution as a two-step process of filtering and combining, we get a reduction in computation of:
- (Dk^2 * M * Df^2 + M * N * Df^2)/(Dk^2 * M * N * Df^2) = 1/N + 1/Dk^2
- Standard convolutions computation cost: Dk^2 * M * N * Df^2
- Depthwise convolution computation cost: Dk^2 * M * Df^2
- Pointwise convolution computation cost: M * N * Df^2
- (* Df: spatial dimension of the input feature map, M: number of input channels, N: number of output channels, Dk: spatial dimension of the kernel)
- this factorization has the effect of drastically reducing computation and model size (see the code sketch below).
- Reference: https://pulsar-kkaturi.tistory.com/entry/Depthwise-Separable-convolution%EC%9D%B4-%EA%B8%B0%EC%A1%B4%EC%9D%98-convolution-%EB%B3%B4%EB%8B%A4-%EC%97%B0%EC%82%B0%EB%9F%89%EC%9D%B4-%EC%A0%81%EC%9D%80-%EC%9D%B4%EC%9C%A0?category=1137385
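To make the two-step factorization concrete, here is a minimal PyTorch sketch of one depthwise separable block (3x3 depthwise conv, then 1x1 pointwise conv, each followed by BatchNorm and ReLU, as in the paper); the module name and the layer sizes in the mult-add check are illustrative assumptions, not values from the paper:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """One MobileNet-style block: 3x3 depthwise conv + 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: groups=in_channels applies a single 3x3 filter per input channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise: 1x1 conv combines the depthwise outputs across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# Mult-add check against the formulas above (hypothetical layer sizes):
Dk, M, N, Df = 3, 64, 128, 56
standard  = Dk**2 * M * N * Df**2               # full convolution
separable = Dk**2 * M * Df**2 + M * N * Df**2   # depthwise + pointwise
print(separable / standard)  # ~0.119, i.e. 1/N + 1/Dk^2 = 1/128 + 1/9
```

With the 3x3 kernels MobileNet uses, the 1/Dk^2 term dominates, so the block costs roughly 8 to 9 times fewer mult-adds than a standard convolution.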
- Although the base MobileNet architecture is already small and low latency, a specific use case or application may often require an even smaller and faster model. Two hyper-parameters are introduced for this (see the sketch after this list):
- the width multiplier (α): thins the number of input and output channels at each layer
- the resolution multiplier (ρ): reduces the resolution of the input image and, in turn, of every internal feature map
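Below is a small sketch of how the two multipliers act on the cost formula; the helper name and the rounding policy are my assumptions (the paper only specifies that α thins channels and ρ shrinks resolution):

```python
def scale_channels(channels: int, alpha: float) -> int:
    """Width multiplier: thin a layer's channels, M -> alpha*M, N -> alpha*N.
    (Hypothetical helper; the rounding policy is an assumption.)"""
    return max(1, int(round(channels * alpha)))

alpha = 0.75       # width multiplier, typically 1.0, 0.75, 0.5, or 0.25
rho = 160 / 224    # resolution multiplier, set implicitly by the input size

Dk, M, N, Df = 3, 64, 128, 56  # hypothetical layer sizes
scaled = (Dk**2 * scale_channels(M, alpha) * (rho * Df)**2
          + scale_channels(M, alpha) * scale_channels(N, alpha) * (rho * Df)**2)
base = Dk**2 * M * Df**2 + M * N * Df**2
print(scaled / base)  # ~0.29; the dominant pointwise term scales as alpha^2 * rho^2
```

Because the pointwise term dominates the cost, total computation falls roughly by α²ρ², which is what lets accuracy be traded off smoothly against latency and model size.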
3. Experiments
- We first investigate the effects of depthwise separable convolutions, and then the trade-offs of shrinking the network via the two hyper-parameters (width multiplier & resolution multiplier).
- MobileNet built with depthwise separable convolutions loses only 1% accuracy on ImageNet compared to a full-convolution version, while saving tremendously on mult-adds and parameters.
- MobileNet is nearly as accurate as VGG16 & GoogleNet while being smaller and requiring less computation:
- VGG16: 32x smaller & 27x less computation
- GoogleNet: 1.6x smaller & 2.5x less computation
* Reference: Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).