Mean field analysis of neural networks: typical behavior and fluctuations
Machine learning, and in particular neural network models, has revolutionized fields such as image, text, and speech recognition. Today, many important real-world applications in these areas are driven by neural networks. There are also growing applications in finance, engineering, robotics, and medicine. Despite their immense success in practice, there is limited mathematical understanding of neural networks. Our work shows how neural networks can be studied via stochastic analysis, and develops approaches for addressing some of the technical challenges which arise. We analyze both multi-layer and single-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. In the case of single-layer neural networks, we rigorously prove that the empirical distribution of the neural network parameters converges to the solution of a nonlinear partial differential equation. In addition, we rigorously prove a central limit theorem, which describes the neural network's fluctuations around its mean-field limit. The fluctuations have a Gaussian distribution and satisfy a stochastic partial differential equation. For multi-layer neural networks, we rigorously derive the limiting behavior of the neural network's output. We also prove convergence to the global minimum under appropriate conditions. We illustrate the theoretical results by studying the evolution of parameters on the well-known MNIST and CIFAR10 data sets.
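To make the phrase "empirical distribution of the neural network parameters" concrete, the following is a minimal sketch rather than the construction analyzed in this work. It assumes a single-hidden-layer network with scalar input, a 1/N output normalization, a tanh activation, a toy target sin(x), and N·T stochastic gradient descent steps to reach time T; all of these choices, and the function names, are illustrative assumptions not stated in the abstract.

```python
import math
import random


def network_output(params, x):
    """Output (1/N) * sum_i c_i * tanh(w_i * x) of an assumed 1/N-normalized network."""
    return sum(c * math.tanh(w * x) for c, w in params) / len(params)


def empirical_mean(params, f):
    """Integrate f(c, w) against the empirical measure (1/N) * sum_i delta_{(c_i, w_i)}."""
    return sum(f(c, w) for c, w in params) / len(params)


def train(n_units, horizon, alpha=1.0, seed=0):
    """Run floor(n_units * horizon) SGD steps on the loss 0.5 * (y - g(x))^2.

    Assumption: an O(1) learning rate alpha; the 1/N factor in each update
    comes from differentiating the 1/N-normalized output.
    """
    rng = random.Random(seed)
    # Hypothetical i.i.d. initialization of the parameter pairs (c_i, w_i).
    params = [(rng.uniform(-1.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_units)]
    step = alpha / n_units
    for _ in range(int(n_units * horizon)):
        x = rng.gauss(0.0, 1.0)
        y = math.sin(x)  # toy regression target (assumption)
        err = y - network_output(params, x)
        # Simultaneous gradient update of every (c_i, w_i) using the old values.
        params = [
            (
                c + step * err * math.tanh(w * x),
                w + step * err * c * (1.0 - math.tanh(w * x) ** 2) * x,
            )
            for c, w in params
        ]
    return params
```

Under these assumptions, the network output at a point x is exactly the integral of c·tanh(w·x) against the empirical measure of the trained parameters, which is why a limit for that measure as N grows also describes the limiting network. Comparing `empirical_mean(train(N, 1.0), f)` for a fixed test function `f` across increasing `N` gives a rough numerical feel for this kind of convergence.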