Previous research has shown that fully-connected neural networks trained with gradient-based methods from
small initialization exhibit a phenomenon known as condensation [T. Luo et al., J. Mach. Learn. Res., 22(1), 2021],
in which the weight vectors of the network concentrate on isolated orientations during training. Condensation is a
feature of the non-linear learning process that contributes to the better generalization ability of neural networks. However, how
neural network architecture affects this phenomenon remains an open question.
In this study, we focus on convolutional neural networks (CNNs) to investigate how their structural characteristics, in contrast to those of fully-connected networks,
influence the condensation phenomenon. We first prove that,
under gradient descent with small initialization, the convolutional kernels
of a two-layer CNN condense towards a specific direction determined by the training
samples within a finite time.
Subsequently, we conduct systematic empirical investigations to substantiate our theory, and these experiments further show that
condensation persists under broader conditions than those assumed in our
analysis. Together, these insights advance our understanding of the
non-linear training dynamics of CNNs.
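As a minimal illustration of the condensation measurement described above (not the paper's actual setup; the architecture, data, and hyperparameters below are illustrative assumptions), the following sketch trains a tiny two-layer 1D CNN with small initialization and tracks the mean pairwise cosine similarity of its kernels, which approaches 1 when the kernels condense onto a common orientation.

```python
# Hypothetical sketch: two-layer CNN with small initialization; we monitor
# whether its convolutional kernels condense onto a single direction.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic training samples: 64 one-channel sequences of length 16.
X = torch.randn(64, 1, 16)
y = torch.randn(64, 1)

class TwoLayerCNN(nn.Module):
    def __init__(self, num_kernels=20, kernel_size=5, init_scale=1e-3):
        super().__init__()
        self.conv = nn.Conv1d(1, num_kernels, kernel_size, bias=False)
        self.head = nn.Linear(num_kernels * (16 - kernel_size + 1), 1, bias=False)
        # Small initialization: shrink the default weights by init_scale.
        with torch.no_grad():
            self.conv.weight.mul_(init_scale)
            self.head.weight.mul_(init_scale)

    def forward(self, x):
        h = torch.tanh(self.conv(x))
        return self.head(h.flatten(1))

def mean_kernel_cosine(conv):
    # Average absolute cosine similarity between distinct kernels;
    # values near 1 indicate condensation onto one orientation.
    w = conv.weight.detach().flatten(1)
    w = w / w.norm(dim=1, keepdim=True).clamp_min(1e-12)
    sim = (w @ w.T).abs()
    n = sim.shape[0]
    return (sim.sum() - n) / (n * (n - 1))

model = TwoLayerCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.MSELoss()

for step in range(2001):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:5d}  loss {loss.item():.4e}  "
              f"mean |cos| between kernels {mean_kernel_cosine(model.conv):.3f}")
```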