
[MobileNet] Howard et al., 2017, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

by 펄서까투리 2022. 3. 21.

# Three-Line Summary #

  • MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build light weight deep neural networks.
  • We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy.
  • We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification.

 

# Detailed Review #

1. Introduction

  • The general trend has been to make deeper and more complicated networks in order to achieve higher accuracy.
  • However, in many real-world applications such as robotics, self-driving cars, and augmented reality, the recognition tasks need to be carried out in a timely fashion on a computationally limited platform.
  • This paper proposes a class of network architectures that allows a model developer to specifically choose a small network that matches the resource restrictions (latency, size) for their application.
    • Many papers on small networks focus only on size but do not consider speed.

 

2. MobileNet Architecture

  • For MobileNets,
    • the depthwise convolution applies a single filter to each input channel, and
    • the pointwise convolution then applies a 1 x 1 convolution to combine the outputs of the depthwise convolution.
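This factorization can be sketched in a few lines of NumPy (a minimal illustration of my own, assuming stride 1 and no padding; the function and variable names are not from the paper):

```python
import numpy as np

def depthwise_separable_conv(x, dw_filters, pw_filters):
    """Sketch of a depthwise separable convolution (stride 1, valid padding).

    x          : input feature map, shape (H, W, M)
    dw_filters : one K x K filter per input channel, shape (K, K, M)
    pw_filters : 1 x 1 filters combining the channels, shape (M, N)
    """
    H, W, M = x.shape
    K = dw_filters.shape[0]
    Ho, Wo = H - K + 1, W - K + 1

    # Depthwise step: a single K x K filter is applied to each input channel.
    dw_out = np.zeros((Ho, Wo, M))
    for c in range(M):
        for i in range(Ho):
            for j in range(Wo):
                dw_out[i, j, c] = np.sum(x[i:i+K, j:j+K, c] * dw_filters[:, :, c])

    # Pointwise step: a 1 x 1 convolution mixes the M channels into N outputs.
    return dw_out @ pw_filters  # (Ho, Wo, M) @ (M, N) -> (Ho, Wo, N)

x = np.random.rand(8, 8, 3)  # 8 x 8 feature map with M = 3 channels
out = depthwise_separable_conv(x, np.random.rand(3, 3, 3), np.random.rand(3, 16))
print(out.shape)  # (6, 6, 16): valid 3 x 3 conv output with N = 16 channels
```

Note that the filtering (depthwise) and combining (pointwise) stages are separated, which is exactly where the savings over a standard convolution come from.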

 

 

  • Although the base MobileNet architecture is already small and low latency, a specific use case or application may require the model to be even smaller and faster. Two simple hyper-parameters serve this purpose:
    • the width multiplier α: thins the number of input and output channels at each layer
    • the resolution multiplier ρ: scales down the input image and, with it, the internal feature maps of every layer
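The effect of the two multipliers can be checked numerically with the paper's mult-add cost model (the helper function below is my own sketch; K is the kernel size D_K, F the feature-map width D_F, M and N the input/output channel counts):

```python
def mobilenet_layer_cost(K, M, N, F, alpha=1.0, rho=1.0):
    """Mult-adds for one layer under MobileNet's cost model.

    alpha (width multiplier) thins the channels; rho (resolution
    multiplier) shrinks the feature map. Returns (standard conv cost,
    depthwise separable cost).
    """
    M_, N_, F_ = alpha * M, alpha * N, rho * F
    standard = K * K * M_ * N_ * F_ * F_
    separable = K * K * M_ * F_ * F_ + M_ * N_ * F_ * F_
    return standard, separable

# A typical interior layer: 3x3 kernels, 512 -> 512 channels, 14x14 maps.
std, sep = mobilenet_layer_cost(K=3, M=512, N=512, F=14)
print(f"separable / standard = {sep / std:.3f}")  # 1/N + 1/K^2 ~= 0.113
```

This reproduces the paper's reduction factor of 1/N + 1/D_K²: with 3 x 3 kernels, depthwise separable convolution uses roughly 8 to 9 times fewer mult-adds, and α and ρ then scale the remaining cost further (quadratically in α and ρ).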

 

3. Experiments

  • We first investigate the effects of depthwise separable convolutions and the trade-offs of reducing the network based on the two hyper-parameters (width multiplier & resolution multiplier).
  • With only depthwise separable convolutions in place of full convolutions, MobileNet loses just 1% accuracy on ImageNet while saving tremendously on mult-adds and parameters.

  • MobileNet is nearly as accurate as VGG16 & GoogLeNet while being smaller and requiring less computation:
    • vs. VGG16: 32x smaller & 27x less computation
    • vs. GoogLeNet: 1.6x smaller & 2.5x less computation

* Reference: Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).

 
