Normalization

Normalize weights:

  1. weight normalization [1]: reparameterize each weight vector as $\mathbf{w}=\frac{g}{\|\mathbf{v}\|} \mathbf{v}$, where the scalar $g$ and the vector $\mathbf{v}$ are trained instead of $\mathbf{w}$. Weight normalization can be viewed as a cheaper and less noisy approximation to batch normalization (see the sketch below).
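
A minimal NumPy sketch of this reparameterization for a fully connected layer; the names `weight_norm_linear`, `v`, and `g` are illustrative, not taken from the paper's reference implementation.

```python
import numpy as np

def weight_norm_linear(x, v, g):
    """Linear layer with weight normalization: w = (g / ||v||) * v per output unit.

    x: (batch, in_features), v: (out_features, in_features), g: (out_features,).
    v and g are the trainable parameters; w is never stored directly.
    """
    w = g[:, None] * v / np.linalg.norm(v, axis=1, keepdims=True)
    return x @ w.T

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))               # batch of 4 inputs with 8 features
v = rng.normal(size=(3, 8))               # direction parameters for 3 output units
g = np.ones(3)                            # per-unit norms (trainable scalars)
print(weight_norm_linear(x, v, g).shape)  # (4, 3)
```

In practice this is usually applied through a framework wrapper (e.g. PyTorch's `torch.nn.utils.weight_norm`) rather than written by hand.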

Normalize outputs:

  1. batch normalization [2]: normalize each channel to zero mean and unit variance over the (N, H, W) axes, then apply a learned per-channel scale and shift

  2. layer normalization [3]: normalize each sample over the (C, H, W) axes

  3. instance normalization [4]: normalize each sample and channel over the (H, W) axes

  4. group normalization [5]: split the channels into groups and normalize each sample over the (H, W) axes within each group

Notation: N is the batch axis, C is the channel axis, and (H, W) are the spatial axes. The methods above differ only in which of these axes they normalize over, as in the sketch below.
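
A minimal NumPy sketch that applies each method to an (N, C, H, W) tensor by reducing over the corresponding axes; the learned per-channel scale and shift used by the actual layers are omitted, and the helper name `normalize` is illustrative.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the standard deviation over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
N, C, H, W = 2, 8, 4, 4
x = rng.normal(size=(N, C, H, W))

bn = normalize(x, axes=(0, 2, 3))    # batch norm: per channel, over (N, H, W)
ln = normalize(x, axes=(1, 2, 3))    # layer norm: per sample, over (C, H, W)
inorm = normalize(x, axes=(2, 3))    # instance norm: per sample and channel, over (H, W)

G = 4                                # group norm: split C into G groups and
xg = x.reshape(N, G, C // G, H, W)   # normalize over (C/G, H, W) within each group
gn = normalize(xg, axes=(2, 3, 4)).reshape(N, C, H, W)
```

With G = 1 the group-norm reduction coincides with layer normalization, and with G = C it coincides with instance normalization.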

[1] Salimans T, Kingma D P. Weight normalization: A simple reparameterization to accelerate training of deep neural networks[C]//Advances in Neural Information Processing Systems. 2016: 901-909.

[2] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

[3] Ba J L, Kiros J R, Hinton G E. Layer normalization[J]. arXiv preprint arXiv:1607.06450, 2016.

[4] Ulyanov D, Vedaldi A, Lempitsky V. Instance normalization: The missing ingredient for fast stylization[J]. arXiv preprint arXiv:1607.08022, 2016.

[5] Wu Y, He K. Group normalization[J]. arXiv preprint arXiv:1803.08494, 2018.