Statistical analysis of overparameterized neural networks

Tuesday 14.01.2025, 15:45 | SR 2.059, Building 20.30 | Selina Drews | TU Darmstadt

For many years, classical learning theory suggested that neural networks with a large number of parameters would overfit their training data and thus generalize poorly to new, unseen data. Contrary to this long-held belief, the empirical success of such networks has been remarkable. However, from a mathematical perspective, the reasons behind their performance are not fully understood.

In this talk, we consider overparameterized neural networks learned by gradient descent in a statistical setting. We show that an estimator based on an overparameterized neural network, trained with a suitable step size and for an appropriate number of gradient descent steps, can be universally consistent. Furthermore, under suitable smoothness assumptions on the regression function, we derive rates of convergence for this estimator. These results provide new insights into why overparameterized neural networks can generalize effectively despite their high complexity.
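
For orientation, the following is a standard formulation of universal consistency in nonparametric regression; the precise setting and assumptions of the talk may differ. Given i.i.d. data $(X_1, Y_1), \dots, (X_n, Y_n)$ distributed as a generic pair $(X, Y)$ with $\mathbb{E}[Y^2] < \infty$, the regression function is $m(x) = \mathbb{E}[Y \mid X = x]$. An estimator $m_n$ (here, a trained overparameterized network) is universally consistent if

  $\mathbb{E} \int |m_n(x) - m(x)|^2 \, \mathbb{P}_X(dx) \longrightarrow 0 \quad (n \to \infty)$

for every distribution of $(X, Y)$ with $\mathbb{E}[Y^2] < \infty$. Under smoothness assumptions on $m$, one additionally seeks rates at which this $L_2$ error decays in $n$.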