Speaker: Francesco Orabona
Speaker Affiliation: Boston University
Date: 2021-4-13
Time: 3:00 pm
Location: Online streaming on YouTube
Abstract
We consider the problem of finding the minimizer of a differentiable function F: R^d -> R using access only to noisy gradients of the function. This is a fundamental problem in stochastic optimization and machine learning, and a plethora of algorithms have been proposed to solve it. However, the choice of the algorithm and of its parameters crucially depends on the (unknown) characteristics of the function F. On the other hand, for convex Lipschitz functions it is possible to design parameter-free optimization algorithms that guarantee optimal performance without any hyperparameter to tune. Unfortunately, they do not seem to work on non-convex functions. In an effort to go beyond convex functions, we focus on variationally coherent functions: they are defined by the property that, at any point x, the vector pointing from x towards the optimal solution x^* and the negative gradient form an acute angle. This class contains convex, quasi-convex, tau-star-convex, and pseudo-convex functions. We propose a new algorithm based on the Follow The Regularized Leader framework, with the added twist of using rescaled gradients and time-varying linearithmic regularizers. We prove almost sure convergence to the global minimizer x^* for variationally coherent functions. Additionally, on convex functions the very same algorithm with the same hyperparameters guarantees, after T iterations, an expected suboptimality gap bounded by O(||x^* - x_0|| T^{-1/2+eps}) for any eps > 0, up to polylog terms. This is the first algorithm to achieve both of these properties at the same time. Moreover, the rate on convex functions essentially matches the performance of parameter-free algorithms.
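For readers less familiar with the terminology, here is a minimal sketch of the two central objects in the abstract's notation; this is a generic, textbook-style formulation for illustration, not the exact construction presented in the talk.

Variational coherence (roughly): \langle \nabla F(x), x - x^* \rangle >= 0 for all x, with equality only at x = x^*; equivalently, the negative gradient -\nabla F(x) and the vector x^* - x pointing towards the minimizer form an acute angle.

Generic Follow The Regularized Leader (FTRL) update: x_{t+1} = argmin_x [ \langle \sum_{s=1}^{t} g_s, x \rangle + psi_t(x) ], where g_s denotes the (possibly rescaled) stochastic gradients observed so far and psi_t is a time-varying regularizer.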
Bio
Francesco Orabona is an Assistant Professor at Boston University. He received his PhD in Electronic Engineering from the University of Genoa, Italy, in 2007. Dr. Orabona has published more than 70 papers in scientific journals and conferences on the topics of online learning, optimization, and statistical learning theory. His latest work is on "parameter-free" machine learning and optimization algorithms, that is, algorithms that achieve the best performance without the need to tune parameters such as regularization, learning rate, and momentum.