# SuooL's Blog

## Introduction

Another way to implement mixup is to interpolate only the features to produce new samples, leaving the targets untouched; instead of interpolating the targets, the loss is modified to mix the two per-target losses:
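The loss-mixing variant described above can be sketched as follows. This is a minimal NumPy illustration (a deep-learning framework would normally be used); the helper names `mixup_batch` and `mixup_loss` are hypothetical, and `criterion` stands in for any per-sample loss such as cross-entropy:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(x, y, alpha=0.2):
    """Interpolate only the inputs; return both label sets and lam
    so the loss can be mixed instead of the targets."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))          # pair each sample with another in the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    return x_mix, y, y[perm], lam

def mixup_loss(criterion, pred, y_a, y_b, lam):
    """Mix the two per-target losses instead of interpolating the labels."""
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```

Because the mixed loss is a convex combination of the two component losses, it always lies between them, so no soft-label machinery is needed in the loss function itself.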

mixup's data augmentation operates at the batch level: samples can be mixed within a single batch or across different batches.

## Experiments

### 预实验

The effect of the hyper-parameter $\alpha$ on the Beta distribution $\mathrm{Beta}(\alpha, \alpha)$ from which the mixing weight $\lambda$ is sampled:

The mixup hyper-parameter $\alpha$ controls the strength of interpolation between feature-target pairs, recovering the ERM principle as $\alpha \to 0$.

The mixup vicinal distribution can be understood as a form of data augmentation that encourages the model $f$ to behave linearly in-between training examples. We argue that this linear behaviour reduces the amount of undesirable oscillations when predicting outside the training examples.

### 图像分类表现

Findings from the classification experiments on ImageNet-2012:

we find that α ∈ [0.1, 0.4] leads to improved performance over ERM, whereas for large α, mixup leads to underfitting. We also find that models with higher capacities and/or longer training runs are the ones to benefit the most from mixup.

### SPEECH DATA 实验表现

For speech data, it is reasonable to apply mixup both at the waveform and spectrogram levels. Here, we apply mixup at the spectrogram level just before feeding the data to the network.

### MEMORIZATION OF CORRUPTED LABELS 实验

Generate three CIFAR-10 training sets in which 20%, 50%, or 80% of the labels are replaced by random noise, respectively. All the test labels are kept intact for evaluation.
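The label-corruption setup above can be sketched as below. This is a hypothetical helper, not the paper's code; note that "random noise" here means a uniformly random class, so a corrupted label can coincide with the true one by chance:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_labels(y, noise_rate, num_classes=10):
    """Replace a fraction of the labels with uniformly random classes,
    returning the corrupted copy and the corrupted indices for later
    evaluation of memorization (e.g. training error on real vs. corrupted labels)."""
    y_corrupt = y.copy()
    n = len(y)
    idx = rng.choice(n, size=int(noise_rate * n), replace=False)
    y_corrupt[idx] = rng.integers(0, num_classes, size=len(idx))
    return y_corrupt, idx
```

Keeping the corrupted indices makes it straightforward to compute the two training errors separately at the last epoch.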

To quantify the amount of memorization, we also evaluate the training errors at the last epoch on real labels and corrupted labels.

### ROBUSTNESS TO ADVERSARIAL EXAMPLES 实验

Experiments on how mixup affects model robustness; the adversarial examples are generated as follows:

Adversarial examples are obtained by adding tiny (visually imperceptible) perturbations to legitimate examples in order to deteriorate the performance of the model. The adversarial noise is generated by ascending the gradient of the loss surface with respect to the legitimate example.

for each of the two models, we use the model itself to generate adversarial examples, either using the Fast Gradient Sign Method (FGSM) or the Iterative FGSM (I-FGSM) methods (Goodfellow et al., 2015), allowing a maximum perturbation of $\epsilon = 4$ for every pixel. For I-FGSM, we use 10 iterations with equal step size
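FGSM and I-FGSM can be illustrated on a toy logistic model with an analytic input gradient (NumPy stand-in for the paper's networks; the helpers `fgsm` and `i_fgsm` are hypothetical). Both ascend the sign of the loss gradient with respect to the input, with the per-pixel perturbation capped at ε on the 0–255 scale:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, eps):
    """One-step FGSM on a logistic model p = sigmoid(w @ x)."""
    p = sigmoid(w @ x)
    grad_x = (p - y) * w                    # d(BCE)/dx for this model
    return np.clip(x + eps * np.sign(grad_x), 0, 255)

def i_fgsm(x, y, w, eps, steps=10):
    """Iterative FGSM: equal step sizes, total perturbation capped at eps per pixel."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_adv)
        grad_x = (p - y) * w
        x_adv = x_adv + (eps / steps) * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay inside the eps ball
    return np.clip(x_adv, 0, 255)
```

For a real network the input gradient would come from backpropagation; the structure of the attack (sign of the gradient, clipping to the ε-ball and to valid pixel range) is the same.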

we use the first ERM model to produce adversarial examples using FGSM and I-FGSM. Then, we test the robustness of the second ERM model and the mixup model to these examples

### STABILIZATION OF GANs 实验

We argue that mixup should stabilize GAN training because it acts as a regularizer on the gradients of the discriminator.
Then, the smoothness of the discriminator guarantees a stable source of gradient information to the generator.
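One way to realize this regularization is to train the discriminator on convex combinations of real and fake batches, using the mixing weight λ as a soft target. The sketch below is a NumPy illustration of that loss only (the helper name and the callable `d` are assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_disc_loss(d, x_real, x_fake, alpha=1.0):
    """Discriminator loss on a mixed batch lam*real + (1-lam)*fake,
    with binary cross-entropy toward the soft label lam."""
    lam = rng.beta(alpha, alpha)
    x_mix = lam * x_real + (1 - lam) * x_fake
    p = d(x_mix)                                  # D's estimate of P(real)
    return float(np.mean(-(lam * np.log(p) + (1 - lam) * np.log(1 - p))))
```

Because the target varies smoothly with λ, the discriminator is discouraged from making abrupt transitions between the real and fake regions, which is the smoothness the argument above relies on.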

### ABLATION STUDIES 实验

mixup is a data augmentation method that directly interpolates samples (both inputs and targets). When augmenting the inputs, however, one could instead interpolate the network's latent representations (i.e. feature maps), interpolate only between nearest neighbours, or interpolate only between inputs of the same class. When the inputs being interpolated come from two different classes, one can also choose to assign a single label to the synthetic input, e.g. the label of the input with the larger weight in the convex combination. The ablation studies compare these design choices.

• mixup performs best, clearly outperforming the runner-up, mix input + label smoothing

• The effect of regularization is shown in the table above: ERM works best with a larger weight decay, whereas mixup prefers a smaller one, indicating that mixup itself regularizes the model and improves its generalization and robustness

• When interpolating higher-level feature representations, a larger weight decay is beneficial, indicating that it adds a needed regularization effect

• Comparison of interpolation methods: among all the input interpolation methods, mixing random pairs from all classes (AC + RP) has the strongest regularization effect. Label smoothing and adding Gaussian noise have a relatively small regularization effect.

• Effect of SMOTE: finally, we note that the SMOTE algorithm does not lead to a noticeable gain in performance.

## Conclusion

With increasingly large $α$, the training error on real data increases, while the generalization gap decreases. This sustains our hypothesis that mixup implicitly controls model complexity. However, we do not yet have a good theory for understanding the ‘sweet spot’ of this bias-variance trade-off.