EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[EfficientNet paper]
** The content below consists of figures from the paper above and my own reinterpretation of it.
[Code]
github.com/tensorflow/tpu/tree/master/models/official/efficientnet
Example)
Top 4 Pre-Trained Models for Image Classification with Python Code
Abstract
Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient.
We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet.
1. Introduction
In this paper, we want to study and rethink the process of scaling up ConvNets (convolutional networks).
Our (Google Research, Brain Team) empirical study shows that it is critical to balance all dimensions of network width/depth/resolution, and surprisingly such balance can be achieved by simply scaling each of them with a constant ratio.
To the best of our knowledge, we are the first to empirically quantify the relationship among all three dimensions of network width, depth, and resolution.
Figure 1 summarizes the ImageNet performance, where our EfficientNets significantly outperform other ConvNets.
2. Related Work
ConvNet Accuracy:
Although higher accuracy is critical for many applications, we have already hit the hardware memory limit, and thus further accuracy gain needs better efficiency.
ConvNet Efficiency:
In this paper, we aim to study model efficiency for super large ConvNets that surpass state-of-the-art accuracy. To achieve this goal, we resort to model scaling.
Model Scaling:
Our work systematically and empirically studies ConvNet scaling for all three dimensions of network width, depth, and resolution.
3. Compound Model Scaling
3.1. Problem Formulation
ConvNet layers are often (though not in every ConvNet) partitioned into multiple stages, and all layers in each stage share the same architecture, for example in ResNet.
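Therefore, we can define a ConvNet as:
$$\mathcal{N} = \bigodot_{i=1 \ldots s} F_{i}^{L_{i}}\left(X_{\langle H_{i}, W_{i}, C_{i} \rangle}\right)$$
where $F_{i}^{L_{i}}$ denotes layer $F_{i}$ repeated $L_{i}$ times in stage $i$, and $\langle H_{i}, W_{i}, C_{i} \rangle$ is the shape of the input tensor $X$ of stage $i$. Hatted symbols denote the values predefined in the baseline network:
$\hat{F}_{i}$ : layer architecture of stage $i$ in the baseline network
$\hat{L}_{i}$ : network length (depth), $\hat{C}_{i}$ : network width (number of channels)
$\hat{H}_{i} , \hat{W}_{i}$ : input resolution (height, width)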
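The definition above treats a ConvNet as a chain of stages, where each stage applies one layer architecture several times in a row. A minimal sketch of that composition follows. It uses toy functions on plain numbers in place of real layers, and the helper names `repeat` and `compose` are made up for illustration.

```python
from functools import reduce


def repeat(layer, times):
    """Build a stage that applies the same layer `times` times (F_i repeated L_i times)."""
    def stage(x):
        for _ in range(times):
            x = layer(x)
        return x
    return stage


def compose(stages):
    """Chain stages in order, feeding each stage's output into the next one."""
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)


# Toy "layers" on numbers, stand-ins for real convolutional layers.
def add_one(x):
    return x + 1


def double(x):
    return x * 2


# Stage 1 repeats add_one three times, stage 2 repeats double twice.
network = compose([repeat(add_one, 3), repeat(double, 2)])
result = network(0)  # ((0 + 1 + 1 + 1) * 2) * 2
print(result)
```

In this picture, scaling depth means increasing `times`, while the layer function itself is left untouched.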
Unlike regular ConvNet designs that mostly focus on finding the best layer architecture $\hat{F}_{i}$, model scaling tries to expand the network length, width, and/or resolution without changing the $\hat{F}_{i}$ predefined in the baseline network.
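Our target is to maximize the model accuracy for any given resource constraints, which can be formulated as an optimization problem:
$$\begin{aligned}
\max_{d, w, r} \quad & Accuracy\big(\mathcal{N}(d, w, r)\big) \\
\text{s.t.} \quad & \mathcal{N}(d, w, r) = \bigodot_{i=1 \ldots s} \hat{F}_{i}^{\,d \cdot \hat{L}_{i}}\left(X_{\langle r \cdot \hat{H}_{i},\; r \cdot \hat{W}_{i},\; w \cdot \hat{C}_{i} \rangle}\right) \\
& Memory(\mathcal{N}) \le target\_memory \\
& FLOPS(\mathcal{N}) \le target\_flops
\end{aligned}$$
where $w, d, r$ are coefficients for scaling network width, depth, and resolution; $\hat{F}_{i}, \hat{L}_{i}, \hat{H}_{i}, \hat{W}_{i}, \hat{C}_{i}$ are predefined parameters in the baseline network (see Table 1 as an example).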
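As a rough sketch of what the three coefficients do to a baseline, the snippet below scales the number of layers by d, the channels by w, and the input height and width by r, without touching the layer type. The `Stage` class, the `scale_stage` helper, the stage values, and the rounding rules are all assumptions for illustration. They are not taken from Table 1 or from the official code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    layers: int    # L_i: how many times the layer is repeated
    height: int    # H_i: input height
    width: int     # W_i: input width (spatial, not network width)
    channels: int  # C_i: number of channels (network width)


def scale_stage(stage, d, w, r):
    """Apply depth (d), width (w), and resolution (r) coefficients to one stage."""
    return Stage(
        layers=math.ceil(d * stage.layers),    # rounding up is an assumption
        height=round(r * stage.height),
        width=round(r * stage.width),
        channels=round(w * stage.channels),
    )


# Made-up baseline stages, for illustration only.
baseline = [Stage(1, 112, 112, 16), Stage(2, 112, 112, 24)]
scaled = [scale_stage(s, d=2.0, w=1.5, r=1.25) for s in baseline]
for before, after in zip(baseline, scaled):
    print(before, "->", after)
```

The optimization problem then amounts to choosing d, w, r so that the scaled network is as accurate as possible while staying within the memory and FLOPS budgets.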
3.2. Scaling Dimensions
Depth (d): Scaling network depth is the most common way used by many ConvNets.
The intuition is that a deeper ConvNet can capture richer and more complex features and generalize well on new tasks. However, deeper networks are also more difficult to train due to the vanishing gradient problem. Although several techniques, such as skip connections and batch normalization, alleviate the training problem, the accuracy gain of very deep networks diminishes.
Width (w): Scaling network width is commonly used for small size models.
Wider networks tend to be able to capture more fine-grained features and are easier to train. However, extremely wide but shallow networks tend to have difficulties in capturing higher level features.
Resolution (r): With higher resolution input images, ConvNets can potentially capture more fine-grained patterns.
The paper's results for scaling network resolution show that higher resolutions do improve accuracy, but the accuracy gain diminishes for very high resolutions (r = 1.0 denotes resolution 224x224 and r = 2.5 denotes resolution 560x560).
Observation 1 – Scaling up any dimension of network width, depth, or resolution improves accuracy, but the accuracy gain diminishes for bigger models.
3.3. Compound Scaling
We empirically observe that different scaling dimensions are not independent.
Observation 2 – In order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling.
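In this paper, we propose a new compound scaling method, which uses a compound coefficient φ to uniformly scale network width, depth, and resolution in a principled way:
$$\begin{aligned}
\text{depth:} \quad & d = \alpha^{\phi} \\
\text{width:} \quad & w = \beta^{\phi} \\
\text{resolution:} \quad & r = \gamma^{\phi} \\
\text{s.t.} \quad & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}$$
where α, β, γ are constants that can be determined by a small grid search.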
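To make the role of φ concrete, here is a small sketch of the compound scaling rule: d = α^φ, w = β^φ, r = γ^φ. The paper notes that the FLOPS of a regular convolution grow roughly in proportion to d, w², and r², so under the constraint α · β² · γ² ≈ 2 the total FLOPS grow by about 2^φ. The function names are made up for illustration, and the α, β, γ values are the ones quoted for EfficientNet-B0 later in this post.

```python
def compound_scale(phi, alpha, beta, gamma):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


def flops_multiplier(d, w, r):
    """Approximate FLOPS growth: linear in depth, quadratic in width and resolution."""
    return d * w ** 2 * r ** 2


# Values quoted for EfficientNet-B0 later in this post.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

for phi in range(4):
    d, w, r = compound_scale(phi, ALPHA, BETA, GAMMA)
    print(
        f"phi={phi}: depth x{d:.3f}, width x{w:.3f}, "
        f"resolution x{r:.3f}, FLOPS x{flops_multiplier(d, w, r):.3f}"
    )
```

Because α · β² · γ² is close to 2, each increment of φ roughly doubles the FLOPS budget, which is what makes φ a convenient single knob for available resources.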
4. EfficientNet Architecture
Since model scaling does not change the layer operators $\hat{F}_{i}$ in the baseline network, having a good baseline network is also critical.
We will evaluate our scaling method using existing ConvNets, but in order to better demonstrate the effectiveness of our scaling method, we have also developed a new mobile-size baseline, called EfficientNet.
Our search produces an efficient network, which we name EfficientNet-B0. Since we use the same search space as (Tan et al., 2019), the architecture is similar to MnasNet, except our EfficientNet-B0 is slightly bigger due to the larger FLOPS target (our FLOPS target is 400M).
** MnasNet: Platform-Aware Neural Architecture Search for Mobile
https://arxiv.org/pdf/1807.11626.pdf
Starting from the baseline EfficientNet-B0, we apply our compound scaling method to scale it up with two steps:
- STEP 1: we first fix φ = 1, assuming twice more resources available, and do a small grid search of α, β, γ based on Equation 2 and 3. In particular, we find the best values for EfficientNet-B0 are α = 1.2, β = 1.1, γ = 1.15, under the constraint of α · β² · γ² ≈ 2.
- STEP 2: we then fix α, β, γ as constants and scale up baseline network with different φ using Equation 3, to obtain EfficientNet-B1 to B7 (Details in Table 2).
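STEP 1 can be sketched as enumerating grid points that satisfy the constraint and then keeping the one whose scaled model scores best. The snippet below only illustrates the shape of that search. The grid step, upper bound, and tolerance are assumptions, and `evaluate` is a hypothetical callable standing in for "train the scaled network and measure its accuracy", which is not implemented here.

```python
import itertools


def candidate_coefficients(step=0.05, upper=2.0, target=2.0, tol=0.1):
    """List (alpha, beta, gamma), each >= 1, with alpha * beta^2 * gamma^2 within tol of target."""
    count = int(round((upper - 1.0) / step)) + 1
    # Round each grid point to avoid floating-point drift such as 1.1500000000000001.
    grid = [round(1.0 + i * step, 10) for i in range(count)]
    return [
        (a, b, g)
        for a, b, g in itertools.product(grid, repeat=3)
        if abs(a * b ** 2 * g ** 2 - target) <= tol
    ]


def pick_best(candidates, evaluate):
    """Return the candidate with the highest score from the user-supplied `evaluate`."""
    return max(candidates, key=evaluate)


candidates = candidate_coefficients()
print(len(candidates), "candidates satisfy the constraint")
print((1.2, 1.1, 1.15) in candidates)  # the values reported for EfficientNet-B0
```

STEP 2 then reuses the chosen α, β, γ and only varies φ, so no further search is run on the larger models.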
Searching for α, β, γ directly around a larger model could give even better results, but the search cost becomes prohibitively expensive on larger models. Our method solves this issue by doing the search (a small grid search of α, β, γ) only once on the small baseline network (step 1), and then using the same scaling coefficients for all other models, varying only the compound coefficient φ (step 2).
5. Experiments
** See the paper.
6. Discussion
In general, all scaling methods improve accuracy with the cost of more FLOPS, but our compound scaling method can further improve accuracy, by up to 2.5%, than other single-dimension scaling methods, suggesting the importance of our proposed compound scaling.
As shown in the figure, the model with compound scaling tends to focus on more relevant regions with more object details.
7. Conclusion
In this paper, we systematically study ConvNet scaling and identify that carefully balancing network width, depth, and resolution is an important but missing piece, preventing us from better accuracy and efficiency.
Powered by this compound scaling method, we demonstrate that a mobile-size EfficientNet model can be scaled up very effectively, surpassing state-of-the-art accuracy with an order of magnitude fewer parameters and FLOPS, on both ImageNet and five commonly used transfer learning datasets.
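My Conclusion
- Pros: Being able to formulate compound scaling (width, depth, resolution) empirically is a big step forward. I expect it to help models recognize and classify objects more accurately in classification tasks.
- Cons: I have doubts about how well it generalizes. Compound scaling probably needs to be tested on a wider range of ConvNets (the formulation is limited to ConvNets whose layers share the same architecture within each stage).
** Comments and advice are welcome!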