
MaLGa Seminar Series

We are involved in the organization of the MaLGa Seminar Series, in particular the seminars on Statistical Learning and Optimization. The MaLGa seminars are divided into four main threads: Statistical Learning and Optimization, Analysis and Learning, Machine Learning and Vision, and Machine Learning for Data Science.

An up-to-date list of ongoing seminars is available on the MaLGa webpage.

Seminars will be streamed on our YouTube channel.

Low Complexity Regularization of Inverse Problems: from Sensitivity Analysis to Algorithms

Speaker: Jalal Fadili
Speaker Affiliation: École nationale supérieure d'ingénieurs de Caen
Host: Lorenzo Rosasco and Guillaume Garrigos
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2017-03-21
Time: 3:00 pm
Location: Conference Room 363, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
Inverse problems and regularization theory is a central theme in imaging sciences, statistics and machine learning. The goal is to reconstruct an unknown vector from partial, indirect, and possibly noisy measurements of it. A now standard method for recovering the unknown vector is to solve a convex optimization problem that enforces some prior knowledge about its structure. This talk delivers some results in the field where the regularization prior promotes solutions conforming to some notion of simplicity/low complexity. These priors encompass, as popular examples, sparsity and group sparsity, total variation and low rank. Our aim is to provide a unified treatment of all these regularizations under a single umbrella, namely the theory of partial smoothness. This framework is very general and accommodates all the low-complexity regularizers just mentioned, as well as many others. Partial smoothness turns out to be the canonical way to encode low-dimensional models that can be linear spaces or more general smooth manifolds. This review is intended to serve as a one-stop shop toward the understanding of the theoretical properties of the so-regularized solutions. It covers a large spectrum including: (i) recovery guarantees and stability to noise, both in terms of Lipschitz stability and model (manifold) identification; (ii) sensitivity analysis to perturbations of the parameters involved (in particular the observations); (iii) convergence properties of forward-backward-type proximal splitting, which is particularly well suited to solve the corresponding large-scale regularized optimization problem.
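As a concrete illustration of forward-backward proximal splitting on a low-complexity prior, here is a minimal numpy sketch (not from the talk; the problem sizes and the weight lam are illustrative choices) that solves an l1-regularized least-squares problem and shows the iterates identifying the sparse model:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
n, d, s = 50, 100, 5                      # measurements, dimension, sparsity
A = rng.standard_normal((n, d)) / np.sqrt(n)
x_true = np.zeros(d); x_true[:s] = rng.standard_normal(s)
y = A @ x_true + 0.01 * rng.standard_normal(n)

lam = 0.05                                # illustrative regularization weight
step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L with L = ||A||^2
x = np.zeros(d)
for k in range(2000):
    # Forward (gradient) step on the smooth term, backward (prox) step on lam*||.||_1
    x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)

# Model (manifold) identification: the iterates eventually land on the
# low-dimensional model, here a sparse support.
print("recovered support:", np.flatnonzero(np.abs(x) > 1e-8))
```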

Optimistic exploration in non-stochastic bandit problems

Speaker: Gergely Neu
Speaker Affiliation: AI group, DTIC, Universitat Pompeu Fabra
Host: Lorenzo Rosasco and Enrico Cecini
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2017-03-14
Time: 3:00 pm
Location: Conference Room 363, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
In stochastic bandit problems, the principle of 'optimism in the face of uncertainty' has proven to be an essential tool for designing efficient exploration policies. Despite this success, the notion of optimism has been relatively unexplored in the world of non-stochastic (or adversarial) bandits. In this talk, I describe an optimistic exploration technique called 'implicit exploration' for non-stochastic multi-armed bandits that leads to a family of learning algorithms with improved empirical performance and theoretical guarantees. For the first time, these results suggest that a certain degree of optimism can be very useful even in adversarial domains.
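For readers who want to see the mechanics, the following is a hedged sketch of the 'implicit exploration' idea, in the spirit of the Exp3-IX algorithm; the parameters eta and gamma and the toy loss sequence are illustrative assumptions:

```python
import numpy as np

def exp3_ix(losses, eta=0.1, gamma=0.05, rng=None):
    """Minimal Exp3-IX sketch: 'implicit exploration' biases the importance-
    weighted loss estimates by adding gamma to the denominator, which keeps
    their variance under control. losses: (T, K) array of adversarial losses."""
    rng = rng or np.random.default_rng(0)
    T, K = losses.shape
    L_hat = np.zeros(K)                      # cumulative loss estimates
    total = 0.0
    for t in range(T):
        p = np.exp(-eta * (L_hat - L_hat.min()))
        p /= p.sum()
        arm = rng.choice(K, p=p)
        total += losses[t, arm]
        # Implicit exploration: optimistic (downward-biased) loss estimate
        L_hat[arm] += losses[t, arm] / (p[arm] + gamma)
    return total

T, K = 5000, 10
rng = np.random.default_rng(1)
losses = rng.random((T, K)); losses[:, 3] *= 0.5   # arm 3 is best on average
print("regret vs arm 3:", exp3_ix(losses) - losses[:, 3].sum())
```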

From error bounds to the complexity of first-order descent methods for convex functions

Speaker: Jérôme Bolte
Speaker Affiliation: Toulouse School of Economics
Host: Lorenzo Rosasco and Saverio Salzo
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2017-03-02
Time: 3:00 pm
Location: Room 706, DIMA Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
We will show that error bounds can be turned into effective tools for deriving complexity results for first-order methods in convex minimization. This led us to study the interplay between error bounds and the KL inequality. We showed the equivalence between the two concepts for functions having a profile moderately flat near the minimizer set (such as functions with Hölderian growth). A counterexample shows the relevance of our approach, since the equivalence is no longer true for extremely flat functions. In a second stage, we show how KL inequalities can in turn be used to compute new complexity bounds for a wealth of descent methods. Our method is completely original and makes use of a one-dimensional worst-case proximal sequence in the spirit of the famous majorant method of Kantorovich. Our result applies to a very simple abstract scheme that covers a very wide class of descent methods. Our approach inaugurates a simple methodology: derive an error bound, compute the desingularizing function whenever possible, identify essential constants in the descent method, and finally compute the complexity using the one-dimensional worst-case proximal sequence. Our method is illustrated through the famous iterative thresholding algorithm, also known as ISTA, for which we show that the complexity bound is of the form O(q^{2k}), where the constituents of the bound only depend on error bound constants obtained for the usual objective. This talk is based on joint work with T. P. Nguyen, J. Peypouquet and B. Suter.
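To make the flavour of such complexity statements concrete, here is a hedged numpy experiment (not from the talk; all problem data are synthetic) that runs ISTA on a small lasso problem and estimates the empirical geometric rate q from the decay of the objective gaps:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 60
A = rng.standard_normal((n, d)) / np.sqrt(n)
y = rng.standard_normal(n)
lam, step = 0.1, 1.0 / np.linalg.norm(A, 2) ** 2

def F(x):  # lasso objective
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

x, vals = np.zeros(d), []
for k in range(3000):
    g = A.T @ (A @ x - y)
    x = np.sign(x - step * g) * np.maximum(np.abs(x - step * g) - step * lam, 0)
    vals.append(F(x))

gaps = np.array(vals) - vals[-1]
# Under an error bound the objective gaps decay geometrically, like q**k;
# estimate q from the tail of the run, before numerical precision kicks in.
tail = gaps[gaps > 1e-10][-200:]
q = np.exp(np.mean(np.diff(np.log(tail))))
print("empirical contraction factor q ~", round(q, 4))
```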

Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression

Speaker: Aymeric Dieuleveut
Speaker Affiliation: École Normale Supérieure
Host: Lorenzo Rosasco and Enrico Cecini
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2017-02-28
Time: 3:00 pm
Location: Room 706, DIMA Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean, finite-variance random error. We present the first algorithm that achieves jointly the optimal prediction error rates for least-squares regression, both in terms of forgetting of initial conditions in O(1/n^2), and in terms of dependence on the noise and dimension d of the problem, as O(d/n). Our new algorithm is based on averaged accelerated regularized gradient descent, and may also be analyzed through finer assumptions on initial conditions and the Hessian matrix, leading to dimension-free quantities that may still be small while the 'optimal' terms above are large. In order to characterize the tightness of these new bounds, we consider an application to non-parametric regression and use the known lower bounds on the statistical performance (without computational limits), which happen to match our bounds obtained from a single pass on the data and thus show optimality of our algorithm in a wide variety of particular trade-offs between bias and variance.
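The talk's algorithm is averaged accelerated regularized gradient descent; the sketch below shows only the simpler averaging ingredient (Polyak-Ruppert averaging over a single pass of constant-step SGD), with synthetic data and an illustrative step size:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 10000
theta_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + 0.5 * rng.standard_normal(n)    # noisy linear model

# Single pass, constant step size, Polyak-Ruppert averaging.
theta = np.zeros(d)
theta_bar = np.zeros(d)
step = 1.0 / (2 * d)          # illustrative constant step; tuning matters
for i in range(n):
    grad = (X[i] @ theta - y[i]) * X[i]          # stochastic gradient, one sample
    theta -= step * grad
    theta_bar += (theta - theta_bar) / (i + 1)   # running average of iterates

print("error of last iterate :", np.linalg.norm(theta - theta_star))
print("error of averaged one :", np.linalg.norm(theta_bar - theta_star))
```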

Improved Classification Rates for Localized Algorithms under Margin Conditions

Speaker: Ingrid Blaschzyk
Speaker Affiliation: Institut für Stochastik und Anwendungen, Universität Stuttgart
Host: Lorenzo Rosasco
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2017-02-21
Time: 3:00 pm
Location: Conference Room 363, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
Handling large-scale datasets is challenging for kernel-based learning methods. To address this problem, many methods have been presented in recent years, for example the Nyström method, random Fourier features, random chunking, and localized support vector machines (SVMs). In this talk we present an oracle inequality and learning rates for localized SVMs using the hinge loss and show that the resulting rates match those known for global SVMs. Furthermore, we present a simple partitioning-based technique to refine the statistical analysis of classification algorithms. The core of our idea is to divide the input space into two parts, such that the first part contains a suitable vicinity around the decision boundary, while the second part is sufficiently far away from the decision boundary. Using a set of margin conditions we are then able to control the classification error on the two parts separately. By balancing out these two error terms we obtain a refined error analysis in a final step. We apply this general idea to the histogram rule and present learning rates. Even for this simple method it turns out that, under certain assumptions, we obtain better rates than those known for global SVMs, for certain plug-in classifiers, and for a recently analysed tree-based adaptive-partitioning ansatz. Finally, we discuss work in progress: we use the technique described above to refine the approximation error for localized SVMs.
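A hedged sketch of the localized-SVM idea (not the analysis from the talk): partition the input space, train one SVM per cell, and predict with the local model. The toy data, the k-means partition, and all parameters are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0).astype(int)     # toy decision boundary

# Partition the input space into cells and train one SVM per cell.
J = 8
cells = KMeans(n_clusters=J, n_init=10, random_state=0).fit(X)
local_svms = {}
for j in range(J):
    mask = cells.labels_ == j
    if len(np.unique(y[mask])) == 1:             # one-class cell: constant rule
        local_svms[j] = int(y[mask][0])
    else:
        local_svms[j] = SVC(kernel="rbf", C=10.0).fit(X[mask], y[mask])

def predict(X_new):
    labels = cells.predict(X_new)
    out = np.empty(len(X_new), dtype=int)
    for j in range(J):
        m = labels == j
        if not m.any():
            continue
        clf = local_svms[j]
        out[m] = clf if isinstance(clf, int) else clf.predict(X_new[m])
    return out

X_test = rng.uniform(-1, 1, size=(1000, 2))
y_test = (X_test[:, 0] ** 2 + X_test[:, 1] > 0).astype(int)
print("test accuracy:", (predict(X_test) == y_test).mean())
```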

Data driven estimation of the Laplace-Beltrami operator

Speaker: Ilaria Giulini
Speaker Affiliation: INRIA
Host: Lorenzo Rosasco
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2017-02-15
Time: 3:00 pm
Location: DIMA, Università di Genova. Via Dodecaneso 35, Genova, IT.

Abstract
Approximations of the Laplace-Beltrami operator on manifolds through graph Laplacians have become popular tools in data analysis and machine learning. These discretized operators usually depend on bandwidth parameters whose tuning remains a theoretical and practical problem. We address this problem for the unnormalized graph Laplacian by establishing an oracle inequality that opens the door to a well-founded data-driven procedure for bandwidth selection. Our approach relies on recent results by Lacour, Massart and Rivoirard (2016) on the so-called Lepski method.
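To fix ideas about the object being tuned, here is a minimal numpy sketch (illustrative only, not the talk's selection procedure) of the unnormalized graph Laplacian on manifold data, showing how strongly the bandwidth h shapes its spectrum:

```python
import numpy as np

def unnormalized_graph_laplacian(X, h):
    """Unnormalized graph Laplacian L = D - W with a Gaussian kernel of
    bandwidth h; the talk is about selecting h in a data-driven way."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * h ** 2))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

# Points on a circle (a 1-D manifold embedded in R^2).
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.c_[np.cos(t), np.sin(t)]

for h in [0.05, 0.2, 1.0]:           # the bandwidth strongly shapes the spectrum
    L = unnormalized_graph_laplacian(X, h)
    eigvals = np.sort(np.linalg.eigvalsh(L))[:4]
    print(f"h={h}: smallest eigenvalues {np.round(eigvals, 4)}")
```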

Machine Learning and data-based Physics

Speaker: Marco Zanetti
Speaker Affiliation: Università di Padova
Host: Lorenzo Rosasco
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2017-02-13
Time: 3:00 pm
Location: Hack conference room, IIT, Via morego 30, Genova, IT.

Abstract
After decades of thorough experimental exploration of more and more sophisticated phenomena, Physics research has entered an era where the low-hanging fruit is gone. Mathematical models have been developed on the basis of a few guiding principles (e.g. symmetries); their accuracy in describing the experimental data is often astonishing. Still, several phenomena out there are currently not framed within those models, such as the so-called Dark Matter or the force of Gravity itself. This is also due to the fact that what experiments can probe is nowadays 'standard', whereas 'new' phenomena are expected to be extremely rare. As a consequence, most modern physics experiments collect gigantic datasets: as an example, the Large Hadron Collider at CERN steadily produces 40 million proton-proton collisions per second, each corresponding to about 2 MB of information. The simulated datasets used to compare the collision data with the expectations are typically much larger, and datasets from astronomical surveys are comparable in size. The analysis of those data is extremely challenging: so far the approach has been 'model-driven', i.e. a variety of theoretical models get tested against the data. Advanced Machine Learning techniques are, however, playing an increasingly crucial role, both to enhance the sensitivity to interesting phenomena and with the ultimate goal of exploiting the data themselves to indicate the most appropriate Physics model. In this seminar I'll describe the issues related to the analysis of physics datasets and indicate possible mutual benefits between the Physics and Machine Learning research fields.

From Monge-Kantorovich to Gromov-Wasserstein: Numerical Optimal Transport Between Several Metric Spaces

Speaker: Gabriel Peyré
Speaker Affiliation: Département de Mathématiques et Applications, École Normale Supérieure; Mokaplan INRIA/CNRS/Paris-Dauphine research group
Host: Lorenzo Rosasco
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2017-01-17
Time: 3:00 pm
Location: Conference Room 363, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
Optimal transport (OT) has become a fundamental mathematical tool at the interface between calculus of variations, partial differential equations and probability. It took much longer, however, for this notion to become mainstream in numerical applications, in large part because of the high computational cost of the underlying optimization problems. There is nonetheless a recent wave of activity on the use of OT-related methods in fields as diverse as computer vision, computer graphics, statistical inference, machine learning and image processing. In this talk, I will review an emerging class of numerical approaches for the approximate resolution of OT-based optimization problems. These methods make use of an entropic regularization of the functionals to be minimized, in order to unleash the power of optimization algorithms based on Bregman-divergence geometry. This results in fast, simple and highly parallelizable algorithms, in sharp contrast with traditional solvers based on the geometry of linear programming. For instance, they allow for the first time the computation of barycenters (according to OT distances) of probability distributions discretized on computational 2-D and 3-D grids with millions of points. This offers a new perspective for the application of OT in machine learning (to perform clustering or classification of bag-of-features data representations) and imaging sciences (to perform color transfer or shape and texture morphing). These algorithms also enable the computation of gradient flows for the OT metric, and can thus, for instance, be applied to simulate crowd motions with congestion constraints. We will also discuss various extensions of classical OT, such as handling unbalanced transportation between arbitrary positive measures (the so-called Hellinger-Kantorovich/Wasserstein-Fisher-Rao problem), and the computation of OT between different metric spaces (the so-called Gromov-Wasserstein problem). This is joint work with M. Cuturi and J. Solomon.
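The entropic regularization at the heart of these methods reduces to alternating Bregman projections onto the two marginal constraints, i.e. Sinkhorn iterations. A minimal numpy sketch, with an illustrative grid, histograms and regularization eps:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic OT via Sinkhorn iterations: alternate Bregman projections
    onto the two marginal constraints. a, b: histograms; C: cost matrix."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return np.sum(P * C), P

# Two histograms on a 1-D grid with squared-distance cost.
x = np.linspace(0, 1, 100)
a = np.exp(-((x - 0.2) ** 2) / 0.005); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01);  b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2

cost, P = sinkhorn(a, b, C)
print("regularized OT cost:", cost)   # ~ |0.2 - 0.7|^2 = 0.25 for small eps
```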

Graphons, mergeons, and so on!

Speaker: Justin Eldridge
Speaker Affiliation: Ohio State University
Host: Lorenzo Rosasco
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2016-12-13
Time: 3:00 pm
Location: Conference Room 363, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
A fundamental problem in the theory of graph clustering is that of defining a cluster. Given a graph, what is the correct clustering? There is no single answer to this seemingly simple question. However, in a statistical setting in which the graphs to be clustered come from some underlying probability distribution, it is natural to define the correct clusters in terms of the distribution itself. The goal of clustering, then, is to identify the appropriate cluster structure of the distribution, and to recover that structure from a finite sample. In this talk, I will develop a statistical theory of graph clustering for the setting in which graphs are drawn from a so-called 'graphon', a very general and powerful random graph model of much recent interest. First, I will define the clusters of a graphon. The natural definition results in a graphon having a hierarchy of clusters, which we call the 'graphon cluster tree'. Next, I will develop a notion of statistical consistency for estimators of the graphon cluster tree, in which a clustering method is said to be consistent if its output converges to the graphon cluster tree in a suitable metric. Finally, I will identify a specific, practical algorithm which consistently recovers the cluster tree.
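For readers unfamiliar with graphons, here is a hedged numpy sketch of the underlying random graph model (the two-block graphon W is an invented example): sample latent uniforms and connect vertices independently with probability W(u_i, u_j).

```python
import numpy as np

def sample_from_graphon(W, n, rng):
    """Draw an n-vertex random graph from graphon W: latent positions
    u_i ~ Uniform[0,1], then edge ij ~ Bernoulli(W(u_i, u_j))."""
    u = rng.uniform(size=n)
    P = W(u[:, None], u[None, :])
    A = (rng.uniform(size=(n, n)) < P).astype(int)
    A = np.triu(A, 1)                     # undirected, no self-loops
    return A + A.T, u

# A two-block graphon: dense within blocks, sparse across (two clusters).
W = lambda x, y: np.where((x < 0.5) == (y < 0.5), 0.6, 0.05)

A, u = sample_from_graphon(W, 400, np.random.default_rng(0))
within = A[np.ix_(u < 0.5, u < 0.5)].mean()
across = A[np.ix_(u < 0.5, u >= 0.5)].mean()
print(f"edge density within ~{within:.2f}, across ~{across:.2f}")
```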

Convergence Analysis of a Stochastic Majorize-Minimize Subspace Algorithm

Speaker: Jean-Christophe Pesquet
Speaker Affiliation: Université Paris-Est, Labex Bezout and LIGM (Laboratoire d'Informatique Gaspard Monge)
Host: Lorenzo Rosasco and Enrico Cecini
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2016-05-12
Time: 3:00 pm
Location: Conference Room 363, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
Stochastic approximation techniques play an important role in solving many problems encountered in machine learning or adaptive signal processing. In these contexts, the statistics of the data are often unknown a priori, or their direct computation is too intensive, and they thus have to be estimated online from the observed signals. For batch optimization of an objective function that is the sum of a data fidelity term and a penalization (e.g. a sparsity-promoting function), Majorize-Minimize (MM) methods have recently attracted much interest since they are fast, highly flexible, and effective in ensuring convergence. The goal of this talk is to show how these methods can be successfully extended to the case when the data fidelity term corresponds to a least-squares criterion and the cost function is replaced by a sequence of stochastic approximations of it. An online version of the MM subspace algorithm, along with its convergence properties, will be presented. In particular, convergence rates are derived, highlighting the influence of the choice of the subspace. Simulation results illustrate the good practical performance of the proposed algorithm associated with a memory gradient subspace, when applied to both non-adaptive and adaptive filter identification problems. (Joint work with Emilie Chouzenoux.)
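The MM principle itself is easy to demonstrate. Below is a hedged numpy sketch of a plain (batch, non-subspace) MM scheme for least squares plus a smoothed sparsity-promoting penalty, using a half-quadratic majorant so that each step is a weighted ridge solve; all data and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 60, 40
A = rng.standard_normal((n, d)); y = rng.standard_normal(n)
lam, delta = 0.5, 1e-6

def F(x):   # least squares + smoothed-l1 penalty
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.sqrt(x ** 2 + delta))

# Majorize-Minimize with a half-quadratic majorant of the penalty:
# sqrt(u) <= sqrt(u0) + (u - u0) / (2 sqrt(u0)), tangent at the current iterate,
# so each MM step solves a weighted ridge problem in closed form.
x = np.zeros(d)
for k in range(100):
    w = lam / np.sqrt(x ** 2 + delta)          # curvature of the majorant
    x = np.linalg.solve(A.T @ A + np.diag(w), A.T @ y)
    # F decreases monotonically by construction of the majorant

print("objective after MM:", F(x))
```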

Regularized Wasserstein Distances and Applications

Speaker: Marco Cuturi
Speaker Affiliation: Yamamoto Cuturi Lab, Graduate School of Informatics, Kyoto University
Host: Lorenzo Rosasco and Maximilian Nickel
Host Affiliation: Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2015-12-04
Time: 2:30 pm
Location: MIT 46-3189 (BCS, McGovern Seminar Room)

Abstract
Optimal transport distances (a.k.a. Wasserstein distances or Earth Mover's Distances, EMD) define a geometry for empirical measures supported on a metric space. After reviewing the basics of the optimal transport problem, I will show how an adequate regularization of that problem can result in substantially faster computations. I will then show how this regularization can enable several applications of optimal transport in machine learning, including some in parameter inference within the framework of minimum Wasserstein distance estimators.
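As a toy illustration of the last application mentioned, here is a hedged sketch of a minimum Wasserstein distance estimator for a one-dimensional location parameter (using scipy's exact 1-D W1 distance rather than the regularized solver discussed in the talk; all data are synthetic):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=2000)   # observed sample

# Minimum Wasserstein distance estimation of the location parameter:
# pick the mu whose model sample is closest to the data in W1 distance.
mus = np.linspace(0, 4, 81)
model = rng.normal(size=2000)                      # fixed reference draws
dists = [wasserstein_distance(data, mu + model) for mu in mus]
print("estimated mu:", mus[int(np.argmin(dists))])   # should be near 2.0
```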

Bio
Marco Cuturi received his Ph.D. in applied maths from the Ecole des Mines de Paris under the supervision of Jean-Philippe Vert. After working at the ORFE department of Princeton University between 02/2009 and 08/2010 as a lecturer, he joined the Graduate School of Informatics in 09/2010 as a G30 associate professor. He has been associate professor in the Yamamoto-Cuturi lab since 11/2013.

Speech production features for deep neural network acoustic modeling

Speaker: Leonardo Badino
Speaker Affiliation: Robotics Brain and Cognitive Science - Istituto Italiano di Tecnologia
Host: Lorenzo Rosasco, Georgios Evangelopoulos
Host Affiliation: Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2015-11-10
Time: 1:00 pm
Location: 32-G449 (Stata Center-Kiva Conference Room), MIT

Abstract
In the last few years, DNNs have become the dominant technique for acoustic modeling in automatic speech recognition (ASR). The diverse set of approaches proposed to further improve ASR performance includes DNN-based acoustic modeling that uses speech production knowledge (SPK), i.e., information about how the vocal tract produces speech sounds. While standard acoustic modeling already relies on some binary phonological SPK features (e.g., fricative) to model phonetic context and define the DNN targets, more explicit uses of SPK for DNN acoustic model training can be explored. In this talk I will present two SPK-based approaches. The first approach relies on measurements of vocal tract movements to extract new acoustic features that are appended to the DNN input vector. The second approach extracts continuous-valued SPK features from binary phonological features, which are then used to build a structured output for the DNN. The two approaches, tested on the mngu0 and TIMIT datasets, show a consistent phone recognition error reduction over a baseline that does not use SPK.
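In the spirit of the first approach, here is a hedged toy sketch of the input-level feature combination; the feature dimensions, network sizes and random weights are all invented, and a real acoustic model would be trained with backpropagation on phone-state targets.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 300                                   # frames in an utterance
acoustic = rng.standard_normal((T, 40))   # e.g. filterbank features per frame
spk = rng.standard_normal((T, 12))        # hypothetical vocal-tract features

# Append speech-production features to the acoustic input vector.
dnn_input = np.concatenate([acoustic, spk], axis=1)   # (T, 52)

# A toy one-hidden-layer network over the augmented input (random weights).
W1 = rng.standard_normal((52, 128)) * 0.1
W2 = rng.standard_normal((128, 48)) * 0.1             # say, 48 phone classes
hidden = np.maximum(dnn_input @ W1, 0.0)              # ReLU
logits = hidden @ W2
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print("per-frame posteriors:", posteriors.shape)
```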

Bio
Leonardo Badino is a postdoctoral researcher at the Robotics Brain and Cognitive Sciences Department of the Italian Institute of Technology (IIT). He received a PhD in Computer Science from the University of Edinburgh (2006-2010). Before moving to Edinburgh he worked as a software engineer, researcher and project manager at Loquendo, a speech technology company (2001-2006). He received a 5-year degree (BEng + MEng) in Electronic Engineering from the Università di Genova (1994-2000). His research interests include text-to-speech (TTS) synthesis, automatic speech recognition (ASR), machine learning for speech and language processing, and the analysis of non-verbal communication.

New Approaches to Learn with Probability Measures using Fast Optimal Transport

Speaker: Marco Cuturi
Speaker Affiliation: Yamamoto Cuturi Lab, Graduate School of Informatics, Kyoto University
Host: Lorenzo Rosasco and Alessandro Rudi
Host Affiliation: Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2015-07-30
Time: 3:30 pm
Location: DIBRIS - Conference Hall, III floor, via Dodecaneso 35, Genova, IT.

Abstract
Optimal transport distances (a.k.a. Wasserstein distances or Earth Mover's Distances, EMD) define a geometry for empirical measures supported on a metric space. After reviewing the basics of the optimal transport problem, I will show how an adequate regularization of that problem can result in substantially faster computations. I will then show how this regularization can enable several applications of optimal transport to learn from probability measures, from the computation of barycenters to that of dictionaries, PCA, or parameter estimation, all carried out using the Wasserstein geometry.

Bio
Marco Cuturi received his Ph.D. in applied maths from the Ecole des Mines de Paris under the supervision of Jean-Philippe Vert. After working at the ORFE department of Princeton University between 02/2009 and 08/2010 as a lecturer, he joined the Graduate School of Informatics in 09/2010 as a G30 associate professor. He has been associate professor in the Yamamoto-Cuturi lab since 11/2013.

Geometric Methods for the Approximation of High-dimensional Dynamical Systems

Speaker: Mauro Maggioni
Speaker Affiliation: Department of Mathematics, Computer Science, and Electrical and Computer Engineering, Duke University
Host: Lorenzo Rosasco
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2015-07-17
Time: 11:30 am
Location: Conference Room 363, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
We discuss a geometry-based statistical learning framework for performing model reduction and modeling of stochastic high-dimensional dynamical systems. We consider two complementary settings. In the first one, we are given long trajectories of a system, e.g. from molecular dynamics, and we discuss new techniques for estimating, in a robust fashion, an effective number of degrees of freedom of the system, which may vary across the state space of the system, and a local scale where the dynamics is well approximated by a reduced dynamics with a small number of degrees of freedom. We then use these ideas to produce an approximation to the generator of the system and obtain, via eigenfunctions, reaction coordinates for the system that capture the large-time behavior of the dynamics. We present various examples from molecular dynamics illustrating these ideas. In the second setting we only have access to a (large number of expensive) simulators that can return short simulations of the high-dimensional stochastic system, and we introduce a novel statistical learning framework for automatically learning a family of local approximations to the system that can be (automatically) pieced together to form a fast global reduced model for the system, called ATLAS. ATLAS is guaranteed to be accurate (in the sense of producing stochastic paths whose distribution is close to that of paths generated by the original system) not only at small time scales, but also at large time scales, under suitable assumptions on the dynamics. We discuss applications to the homogenization of rough diffusions in low and high dimensions, as well as relatively simple systems with separations of time scales, and deterministic chaotic systems in high dimensions that are well approximated by stochastic differential equations.
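A hedged toy sketch of the first ingredient, estimating a local number of degrees of freedom by local PCA at a given scale; the curve, noise level and 95% energy threshold are invented for illustration.

```python
import numpy as np

def local_dimension(X, x0, radius):
    """Estimate the degrees of freedom near x0 by local PCA: count the
    singular values needed to explain most of the local variance."""
    nbrs = X[np.linalg.norm(X - x0, axis=1) < radius]
    s = np.linalg.svd(nbrs - nbrs.mean(axis=0), compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(energy, 0.95)) + 1

# Data near a 1-D curve embedded in R^10, plus small ambient noise.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 3000)
curve = np.stack([np.cos(t), np.sin(t)] + [0.5 * np.cos(2 * t)] * 8, axis=1)
X = curve + 0.001 * rng.standard_normal(curve.shape)

# At a good local scale the estimated dimension is 1; at too-large scales
# curvature inflates it -- the scale selection is part of the method.
for r in [0.1, 0.5, 3.0]:
    print(f"radius {r}: estimated local dimension {local_dimension(X, X[0], r)}")
```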

Bio
Mauro Maggioni is Professor of Mathematics at Department of Mathematics, Electrical and Computer Engineering, Computer Science, Duke University. His research interests focus on applied harmonic analysis; diffusion processes and heat kernels; theory and algorithms for machine learning; spectral graph theory. His recent work has focused on the construction of multi-resolution structures on discrete data and graphs, connecting aspects of classical harmonic analysis, global analysis on manifolds, spectral graph theory and classical multiscale analysis.

Learning to Shape Human-Robot Interactions: Models and Algorithms

Speaker: Subramanian Ramamoorthy
Speaker Affiliation: School of Informatics, The University of Edinburgh
Host: Lorenzo Rosasco, Armando Tacchella
Host Affiliation: DIBRIS, Università degli Studi di Genova.

Date: 2015-06-25
Time: 14:30
Location: Conference Room 363bis, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
We are motivated by the problem of building interactively intelligent robots. One attribute of such an autonomous system is the ability to make predictions about the actions and intentions of other agents in a dynamic environment, and to adapt its own decisions accordingly. This kind of ability is especially important when robots are called upon to work closely together with human users and operators. I will begin my talk by briefly describing some robotic systems we have built that exhibit this ability. This includes mobile robots that can navigate in crowded spaces and humanoid robots that can cooperate with human co-workers. Underpinning such systems are a variety of algorithmic tools for behaviour prediction, categorization and decision-making. I will present three recent results from my group’s work in this area. Firstly, we will look at the problem of adaptation of an interface to a diverse population of users with varying levels of skill and other personal traits. I will outline a latent variable model and a Bayesian algorithm for selecting action sets that constitute a best response to the agent’s belief about the user profile; a toy version of this mechanism is sketched after this abstract. I will report on experiments with this model involving both simulated and human users, showing that our adaptive solution outperforms alternate static solutions and adaptive baselines such as EXP3. Next, I will outline a model for ad hoc multi-agent interaction without prior coordination, which extends the above insights to an explicitly strategic setting. By conceptualizing the interaction as a stochastic Bayesian game, the choice problem is formulated in terms of types in an incomplete information game, allowing for a learning algorithm that combines the benefits of Harsanyi’s notion of types and Bellman’s notion of optimality in sequential decisions. These theoretical arguments will be supported by some preliminary results from experiments involving human-machine interaction, such as in prisoner’s dilemma, where we show a better rate of coordination than alternate multi-agent learning algorithms. Where do these behavioural types come from? One explanation is that decision processes admit categorization in terms of behavioural equivalence. I will conclude by discussing our current work on categorizing decision processes in terms of their behavioural equivalence, in the form of an algorithm for clustering Markov Decision Processes with a view to enabling transfer and policy reuse. This is a step towards answering the question of why we expect there to be compact libraries of types that are exploited by techniques such as those mentioned above.
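A hedged toy sketch of the mechanism referenced above: maintain a Bayesian belief over latent user types and play a best response to that belief. The types, likelihoods and payoffs here are invented for illustration, not taken from the talk.

```python
import numpy as np

types = ["novice", "expert"]
prior = np.array([0.5, 0.5])
# P(observed user action | type), for 3 possible user actions
likelihood = np.array([[0.7, 0.2, 0.1],    # novice
                       [0.1, 0.3, 0.6]])   # expert
# Agent's expected payoff for each of its 2 actions against each type
payoff = np.array([[1.0, 0.2],             # action 0: good vs novice
                   [0.3, 1.0]])            # action 1: good vs expert

belief = prior.copy()
for observed_action in [0, 0, 2, 0]:       # stream of observed user actions
    belief *= likelihood[:, observed_action]
    belief /= belief.sum()                 # Bayes update over types
    best = int(np.argmax(payoff @ belief)) # best response to current belief
    print(f"belief {np.round(belief, 2)} -> agent action {best}")
```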

Bio
Dr. Subramanian Ramamoorthy is a Reader (Associate Professor) in Robotics at the School of Informatics, University of Edinburgh, where he has been since 2007. He is the Coordinator of the EPSRC Robotarium Research Facility, and Executive Committee Member for the Centre for Doctoral Training in Robotics and Autonomous Systems. Previously, he received a PhD in Electrical and Computer Engineering from The University of Texas at Austin. He is an elected Member of the Young Academy of Scotland at the Royal Society of Edinburgh. His current research is focussed on problems of autonomous learning and decision-making under uncertainty, by long-lived agents and agent teams interacting within dynamic environments. This work is motivated by applications in autonomous robotics, human-robot interaction, intelligent interfaces and other autonomous agents in mixed human-machine environments. These problems are solved using a combination of methods involving layered representations based on geometric/topological abstractions, game theoretic and behavioural models of inter-dependent decision making, and machine learning with emphasis on issues of transfer, online and reinforcement learning. His work has been recognised by nominations for Best Paper Awards at major international conferences - ICRA 2008, IROS 2010, ICDL 2012 and EACL 2014. He serves in editorial and programme committee roles for conferences and journals in the areas of AI and Robotics. He leads Team Edinferno, the first UK entry in the Standard Platform League at the RoboCup International Competition. This work has received media coverage, including by BBC News and The Telegraph, and has resulted in many public engagement activities, such as at the Royal Society Summer Science Exhibition, Edinburgh International Science festival and Edinburgh Festival Fringe. Before joining the School of Informatics, he was a Staff Engineer with National Instruments Corp., where he contributed to five products in the areas of motion control, computer vision and dynamic simulation. This work resulted in seven US patents and numerous industry awards for product innovation.

From Bandits to Experts: A Tale of Domination and Independence

Speaker: Nicolò Cesa-Bianchi
Speaker Affiliation: Dipartimento di Informatica, Università degli Studi di Milano
Host: Lorenzo Rosasco, Carlo Ciliberto, Giulia Pasquale
Host Affiliation: DIBRIS, Università degli Studi di Genova.

Date: 2015-06-24
Time: 9:00
Location: Room 506, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
Prediction with expert advice is a general framework for studying sequential prediction problems formulated as repeated games between a player and an adversary. An instance of this framework is the nonstochastic multiarmed bandit, an abstract model for many scenarios typically found in the management of online services (such as recommender systems). Algorithms for experts and bandits are evaluated based on their regret, a notion of sequential risk analogous to the statistical risk studied in machine learning. In this talk I will introduce the setting in which the relationships between actions (i.e., the items to recommend to the user) define a graph. The best possible performance (minimax regret) will then be characterized in terms of natural combinatorial properties of this feedback graph.
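A hedged sketch of exponential weights with graph feedback, in the spirit of Exp3-style algorithms on feedback graphs (the graph, losses and learning rate are invented): playing an arm reveals the losses of all its neighbors, and loss estimates are importance-weighted by each arm's probability of being observed.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, eta = 5, 3000, 0.05
# Feedback graph: playing arm i reveals the loss of every j with G[i, j] = True.
G = np.eye(K, dtype=bool)
G[0, 1] = G[1, 0] = G[2, 3] = G[3, 2] = True   # some extra observability

losses = rng.random((T, K)); losses[:, 4] *= 0.6   # arm 4 best on average
L_hat = np.zeros(K)
for t in range(T):
    p = np.exp(-eta * (L_hat - L_hat.min())); p /= p.sum()
    arm = rng.choice(K, p=p)
    observed = G[arm]                          # all arms revealed by this play
    # P(arm j observed) = total probability of playing one of its in-neighbors
    obs_prob = G.T.astype(float) @ p
    L_hat[observed] += losses[t, observed] / obs_prob[observed]

print("algorithm's preferred arm:", int(np.argmin(L_hat)))
```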

Nonnegative matrix factorization and applications in audio signal processing

Speaker: Cédric Févotte
Speaker Affiliation: Laboratoire Lagrange - CNRS, Observatoire de la Côte d'Azur & Université de Nice Sophia Antipolis
Host: Lorenzo Rosasco
Host Affiliation: DIBRIS, Università degli Studi di Genova.

Date: 2015-06-24
Time: 9:50
Location: Room 506, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
Over the last 15 years, nonnegative matrix factorization (NMF) has become a popular unsupervised dictionary learning/adaptive data decomposition technique with applications in many fields. In particular, much research on this topic has been driven by applications in audio, where NMF has been applied with success to automatic music transcription and single-channel source separation. In this setting the nonnegative data is formed by the magnitude or power spectrogram of the sound signal, and is decomposed as the product of a dictionary matrix, containing elementary spectra representative of the data, and an activation matrix, which contains the expansion coefficients of the data frames in the dictionary. The talk will provide a general overview of NMF with a focus on majorization-minimization (MM) algorithms and will present a number of audio applications.
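For reference, the classic multiplicative updates for NMF with the Kullback-Leibler divergence are themselves MM steps, so each update decreases the divergence. A minimal numpy sketch, with a random matrix standing in for a spectrogram:

```python
import numpy as np

rng = np.random.default_rng(0)
F_, N, K = 64, 200, 4             # frequency bins, time frames, components
V = rng.random((F_, N)) + 1e-3    # stands in for a magnitude/power spectrogram

W = rng.random((F_, K)) + 0.1     # dictionary of elementary spectra
H = rng.random((K, N)) + 0.1      # activations over time

def kl_div(V, Vh):
    return np.sum(V * np.log(V / Vh) - V + Vh)

# Lee-Seung multiplicative updates for the KL divergence; each one is an
# MM step, so the divergence decreases monotonically.
for _ in range(200):
    Vh = W @ H
    W *= ((V / Vh) @ H.T) / H.sum(axis=1)
    Vh = W @ H
    H *= (W.T @ (V / Vh)) / W.sum(axis=0)[:, None]

print("KL divergence after 200 updates:", kl_div(V, W @ H))
```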

The Invariance Hypothesis Implies Domain-Specific Modules in Visual Cortex

Speaker: Joel Z. Leibo
Speaker Affiliation: Google Deep Mind
Host: Lorenzo Rosasco, Carlo Ciliberto, Giulia Pasquale
Host Affiliation: DIBRIS, Università degli Studi di Genova.

Date: 2015-06-24
Time: 11:50
Location: Room 506, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
Is visual cortex made up of general-purpose information processing machinery, or does it consist of a collection of specialized modules? If prior knowledge, acquired from learning a set of objects is only transferable to new objects that share properties with the old, then the recognition system's optimal organization must be one containing specialized modules for different object classes. Our analysis starts from a premise we call the invariance hypothesis: that the computational goal of the ventral stream is to compute an invariant-to-transformations and discriminative signature for recognition. The key condition enabling approximate transfer of invariance without sacrificing discriminability turns out to be that the learned and novel objects transform similarly. This implies that the optimal recognition system must contain subsystems trained only with data from similarly-transforming objects and suggests a novel interpretation of domain-specific regions like the fusiform face area (FFA). Furthermore, we can define an index of transformation-compatibility, computable from videos, that can be combined with information about the statistics of natural vision to yield predictions for which object categories ought to have domain-specific regions in agreement with the available data. The result is a unifying account linking the large literature on view-based recognition with the wealth of experimental evidence concerning domain-specific regions.

Maximum Likelihood Estimation for Linear Gaussian Covariance Models

Speaker: Piotr W. Zwiernik
Speaker Affiliation: Department of Mathematics, Università degli Studi di Genova.
Host: Lorenzo Rosasco
Host Affiliation: DIBRIS, Università di Genova; Laboratory for Computational and Statistical Learning, MIT-IIT

Date: 2015-05-14
Time: 2:00 pm
Location: Conference Room 363, DIBRIS Valletta Puggia. Via Dodecaneso 35, Genova, IT.

Abstract
We study parameter estimation in linear Gaussian covariance models, which are p-dimensional Gaussian models with linear constraints on the covariance matrix. Maximum likelihood estimation for this class of models leads to a non-convex optimization problem which typically has many local optima. We prove that the log-likelihood function is concave over a large region of the cone of positive definite matrices. Using recent results on the asymptotic distribution of extreme eigenvalues of the Wishart distribution, we provide sufficient conditions for any hill climbing method to converge to the global optimum. Remarkably, our numerical simulations indicate that our results remain valid for p as small as 2. An important consequence of this analysis is that for sample sizes n>14p, maximum likelihood estimation for linear Gaussian covariance models behaves as if it were a convex optimization problem.
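A hedged numpy sketch of the setting, using a two-parameter linear covariance model with an invented basis and plain gradient ascent as the hill-climbing method; the talk gives the conditions under which any such climb reaches the global optimum.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 200
# Linear covariance model: Sigma(theta) = theta_0 * I + theta_1 * B
B = np.zeros((p, p)); B[0, 1] = B[1, 0] = 1.0
G = [np.eye(p), B]                       # basis of the linear model
theta_true = np.array([1.0, 0.4])
Sigma_true = theta_true[0] * G[0] + theta_true[1] * G[1]

X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=n)
S = X.T @ X / n                          # sample covariance

# Hill climbing on l(theta) = -log det Sigma - tr(Sigma^-1 S), whose
# gradient in theta_i is tr(Sigma^-1 (S - Sigma) Sigma^-1 G_i).
theta = np.array([np.trace(S) / p, 0.0]) # feasible (positive definite) start
for _ in range(500):
    Sigma = sum(t * g for t, g in zip(theta, G))
    Si = np.linalg.inv(Sigma)
    grad = np.array([np.trace(Si @ (S - Sigma) @ Si @ g) for g in G])
    theta += 0.05 * grad                 # small fixed ascent step

print("theta_hat:", np.round(theta, 3), " true:", theta_true)
```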

Date | Speaker | Title | Location
Mar 21, 2017 | Jalal Fadili | Low Complexity Regularization of Inverse Problems: from Sensitivity Analysis to Algorithms | Genova
Mar 14, 2017 | Gergely Neu | Optimistic exploration in non-stochastic bandit problems | Genova
Mar 2, 2017 | Jérôme Bolte | From error bounds to the complexity of first-order descent methods for convex functions | Genova
Feb 28, 2017 | Aymeric Dieuleveut | Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression | Genova
Feb 21, 2017 | Ingrid Blaschzyk | Improved Classification Rates for Localized Algorithms under Margin Conditions | Genova
Feb 15, 2017 | Ilaria Giulini | Data driven estimation of the Laplace-Beltrami operator | Genova
Feb 13, 2017 | Marco Zanetti | Machine Learning and data-based Physics | Genova
Jan 17, 2017 | Gabriel Peyré | From Monge-Kantorovich to Gromov-Wasserstein: Numerical Optimal Transport Between Several Metric Spaces | Genova
Dec 13, 2016 | Justin Eldridge | Graphons, mergeons, and so on! | Genova
May 12, 2016 | Jean-Christophe Pesquet | Convergence Analysis of a Stochastic Majorize-Minimize Subspace Algorithm | Genova
Dec 4, 2015 | Marco Cuturi | Regularized Wasserstein Distances and Applications | Cambridge - MIT
Nov 10, 2015 | Leonardo Badino | Speech Production Features for Deep Neural Network Acoustic Modeling | Cambridge - MIT
Jul 30, 2015 | Marco Cuturi | New Approaches to Learn with Probability Measures using Fast Optimal Transport | Genova
Jul 17, 2015 | Mauro Maggioni | Geometric Methods for the Approximation of High-dimensional Dynamical Systems | Genova
Jun 25, 2015 | Subramanian Ramamoorthy | Learning to Shape Human-Robot Interactions: Models and Algorithms | Genova
Jun 24, 2015 | Nicolò Cesa-Bianchi | From Bandits to Experts: A Tale of Domination and Independence | Genova
Jun 24, 2015 | Cédric Févotte | Nonnegative matrix factorization and applications in audio signal processing | Genova
Jun 24, 2015 | Giorgio Metta | TBA | Genova
Jun 24, 2015 | Joel Z. Leibo | The Invariance Hypothesis Implies Domain-Specific Modules in Visual Cortex | Genova
May 14, 2015 | Piotr W. Zwiernik | Maximum Likelihood Estimation for Linear Gaussian Covariance Models | Genova
