In this first post, we take things slowly and set the stage for the rest of the series: first for the simple case of linear networks, and then for more practical architectures, including ReLU layers, convolutional layers, and a certain ResNet variant, which later posts will cover.
In this second post of the series, we go through the one-line proof of a certain training invariance in linear neural networks and its consequences in terms of alignment, weight matrix rank, and margin.
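For concreteness, here is a minimal numerical sketch of the kind of training invariance the second post is about, under the assumption that it refers to the well-known balancedness conservation law for linear networks under gradient flow: the difference $W_{l+1}^\top W_{l+1} - W_l W_l^\top$ between consecutive layers stays constant throughout training (exactly under gradient flow, approximately under gradient descent with a small step size). All names, dimensions, and hyperparameters below are illustrative, not taken from the posts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: a two-layer linear network f(x) = W2 @ W1 @ x
# trained by plain gradient descent on the squared loss
# L = 0.5 * ||W2 W1 X - Y||_F^2.
d_in, d_hidden, d_out, n = 5, 4, 3, 50
X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))
W1 = 0.1 * rng.standard_normal((d_hidden, d_in))
W2 = 0.1 * rng.standard_normal((d_out, d_hidden))

def balancedness_gap(W1, W2):
    # Candidate conserved quantity: D = W2^T W2 - W1 W1^T.
    return W2.T @ W2 - W1 @ W1.T

D0 = balancedness_gap(W1, W2)
lr = 1e-3  # small step, so gradient descent tracks gradient flow closely
for step in range(2000):
    R = W2 @ W1 @ X - Y       # residual
    gW2 = R @ X.T @ W1.T      # dL/dW2
    gW1 = W2.T @ R @ X.T      # dL/dW1
    W2 -= lr * gW2
    W1 -= lr * gW1

drift = np.linalg.norm(balancedness_gap(W1, W2) - D0)
# Small for small learning rates; exactly zero under gradient flow.
print(f"invariant drift after training: {drift:.2e}")
```

The one-line proof behind this check: under gradient flow, $\frac{d}{dt}(W_2^\top W_2) = -(\nabla_{W_2} L)^\top W_2 - W_2^\top \nabla_{W_2} L$ and $\frac{d}{dt}(W_1 W_1^\top)$ expand to the same expression, so their difference has zero time derivative.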