Variable screening for sparse encoding of deep neural networks.
Sparsity-inducing penalization in artificial neural networks (ANNs) helps counter over-fitting, especially when the size of the training data is small relative to the number of features. Traditionally, one estimates the magnitude of the penalty using cross-validation. However, this approach is undesirable when (1) the computational cost of fitting the model is large or (2) the size of the dataset is small. This talk presents a theoretical framework for an alternative to cross-validation that addresses these criticisms; it is based on the desideratum that the penalized estimate be zero with large probability under the null hypothesis of no signal. We show that, under this zero-signal assumption, the distribution of the penalty size is related to the distribution of the sup-norm of the gradient of the loss function at zero. This approach generalizes the universal threshold of Donoho and Johnstone (1994) to nonlinear ANN learning.
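To make the calibration idea concrete, here is a minimal sketch in the simplest special case: a linear model with squared-error loss, where the gradient at zero weights is X^T y / n. Under the null, responses are drawn as pure noise and the penalty level is set to an upper quantile of the simulated sup-norms. The function name, the Gaussian noise model, and the 1 − α calibration level are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def null_calibrated_penalty(X, n_sim=1000, alpha=0.05):
    """Monte Carlo calibration of the penalty level (illustrative sketch).

    Under the null of no signal, draw pure-noise responses and record the
    sup-norm of the gradient of the loss (1/2n)||y - Xw||^2 at w = 0,
    which equals ||X^T y||_inf / n.  Setting the penalty to the (1 - alpha)
    quantile makes the penalized estimate exactly zero with probability
    about 1 - alpha when there is no signal.
    """
    n, _ = X.shape
    sup_norms = np.empty(n_sim)
    for s in range(n_sim):
        y = rng.standard_normal(n)       # null model: response is pure noise
        grad0 = X.T @ y / n              # gradient of the loss at w = 0
        sup_norms[s] = np.abs(grad0).max()
    return np.quantile(sup_norms, 1 - alpha)

n, p = 200, 50
X = rng.standard_normal((n, p))
lam = null_calibrated_penalty(X)
```

For standardized Gaussian designs this quantile scales roughly like sqrt(2 log p / n), recovering the flavor of the universal threshold; the talk's contribution is extending this calibration to nonlinear ANN losses, where the gradient at zero no longer has a closed form.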
Preliminary simulation results show that the method has good practical performance for variable screening: with this penalty size, one recovers the important features with high probability in suitable regimes of sample size, number of features, and model complexity.