-
Adenine: A HPC-oriented tool for biological data exploration
Fiorini S., Tomasi F., Squillario M., Barla A.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract: adenine is a machine learning framework designed for biological data exploration and visualization. Its goal is to help bioinformaticians achieve a first, quick overview of the main structures underlying their data. This software tool encompasses state-of-the-art techniques for missing value imputation, data preprocessing, dimensionality reduction and clustering. adenine has a scalable architecture that works seamlessly on single workstations as well as on high-performance computing facilities. adenine is capable of generating publication-ready plots along with quantitative descriptions of the results. In this paper we provide an example of exploratory analysis on a publicly available gene expression data set of colorectal cancer samples. The software and its documentation are available at https://github.com/slipguru/adenine under the FreeBSD license.
-
Cancer mutational signatures identification with sparse dictionary learning
Tozzo V., Barla A.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract: Somatic DNA mutations are a characteristic of cancerous cells and are often key to the origin and development of cancer. In recent years, somatic mutations have been studied in order to understand which processes or conditions may generate them, with the purpose of developing prevention and treatment strategies. In this work we propose a novel sparse regularised method that aims at extracting mutational signatures from somatic mutations. We developed a pipeline that builds the dataset from raw data and performs the analysis, returning the signatures and their relative usage frequencies. A thorough comparison between our method and the state-of-the-art procedure reveals that our pipeline can be used as an alternative without loss of information, possibly gaining interpretability and precision.
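The signature-extraction idea can be illustrated with a plain non-negative matrix factorisation of a synthetic mutation-count matrix. This is a minimal NumPy sketch under assumed toy data (50 samples, 96 mutation channels, 3 hidden signatures), not the authors' sparse regularised pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic catalogue: 50 samples x 96 trinucleotide mutation channels,
# generated from 3 hidden "signatures" with non-negative exposures.
true_sigs = rng.dirichlet(np.ones(96) * 0.1, size=3)   # (3, 96) signatures
exposures = rng.gamma(2.0, 50.0, size=(50, 3))         # (50, 3) usages
V = rng.poisson(exposures @ true_sigs).astype(float)   # (50, 96) counts

def nmf(V, k, n_iter=500, eps=1e-9):
    """Multiplicative-update NMF: V ~ W @ H with W, H >= 0."""
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Normalise each signature (row of H) to sum to 1, moving mass into W,
    # so W holds the relative usage of each signature per sample.
    scale = H.sum(axis=1, keepdims=True)
    return W * scale.T, H / scale

W, H = nmf(V, k=3)
print("relative reconstruction error:",
      np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

Here `W` plays the role of signature usages and `H` of the signatures themselves; the paper's method additionally enforces sparsity through regularisation, which this unpenalised sketch omits.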
-
Different features of tumor-associated NK cells in patients with low-grade or high-grade peritoneal carcinomatosis
Pesce S., Belgrano V., Greppi M., Carlomagno S., Squillario M., Barla A., Della Chiesa M., Di Domenico S., Mavilio D., Moretta L., Candiani S., Sivori S., De Cian F., Marcenaro E.
Frontiers in Immunology
Abstract: Peritoneal carcinomatosis (PC) is a rare disease defined as diffused implantation of neoplastic cells in the peritoneal cavity. This clinical picture occurs during the evolution of peritoneal tumors, and it is the main cause of morbidity and mortality of patients affected by these pathologies, though cytoreductive surgery with heated intra-peritoneal chemotherapy (CRS/HIPEC) is yielding promising results. In the present study, we evaluated whether the tumor microenvironment of low-grade and high-grade PC could affect the phenotypic and functional features, and thus the anti-tumor potential, of NK cells. We show that while in the peritoneal fluid (PF) of low-grade PC most CD56dim NK cells show a relatively immature phenotype (NKG2A+KIR–CD57–CD16dim), in the PF of high-grade PC NK cells are, for the large majority, mature (CD56dimKIR+CD57+CD16bright). Furthermore, in low-grade PC, PF-NK cells are characterized by a sharp down-regulation of some activating receptors, primarily NKp30 and DNAM-1, while, in high-grade PC, PF-NK cells display a higher expression of the PD-1 inhibitory checkpoint. The compromised phenotype observed in low-grade PC patients corresponds to a functional impairment. On the other hand, in high-grade PC patients, PF-NK cells show far more severe defects that only partially reflect the compromised phenotype detected. These data suggest that the PC microenvironment may contribute to tumor escape from immune surveillance by inducing different NK cell impaired features leading to altered anti-tumor activity. Notably, after CRS/HIPEC treatment, the altered NK cell phenotype of a patient with a low-grade disease and favorable prognosis was reverted to a normal one. Our present data offer a clue for the development of new immunotherapeutic strategies capable of restoring the NK-mediated anti-tumor responses in association with the CRS/HIPEC treatment to increase the effectiveness of the current therapy.
-
Discrete Changes in Glucose Metabolism Define Aging
Ravera S., Podestà M., Sabatini F., Dagnino M., Cilloni D., Fiorini S., Barla A., Frassoni F.
Scientific Reports
Abstract: Aging is a physiological process in which multifactorial processes determine a progressive decline. Several alterations contribute to the aging process, including telomere shortening, oxidative stress, deregulated autophagy and epigenetic modifications. In some cases, these alterations are so linked with the aging process that it is possible to predict the age of a person on the basis of the modification of one specific pathway, as proposed by Horvath and his aging clock based on DNA methylation. Because changes in energy metabolism are involved in the aging process, in this work we propose a new aging clock based on the modifications of glucose catabolism. The biochemical analyses were performed on mononuclear cells isolated from peripheral blood, obtained from a healthy population aged between 5 and 106 years. In particular, we evaluated the oxidative phosphorylation function and efficiency, the ATP/AMP ratio, the lactate dehydrogenase activity and the malondialdehyde content. Further, based on these biochemical markers, we developed a machine learning-based mathematical model able to predict the age of an individual with a mean absolute error of approximately 9.7 years. This mathematical model represents a new non-invasive tool to evaluate and define the age of individuals and could be used to evaluate the effects of drugs or other treatments on early aging or rejuvenation.
-
Icing: Large-scale inference of immunoglobulin clonotypes
Tomasi F., Squillario M., Verri A., Bagnara D., Barla A.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract: Immunoglobulin (IG) clonotype identification is a fundamental open question in modern immunology. An accurate description of the IG repertoire is crucial to understand the variety within the immune system of an individual, potentially shedding light on the pathogenetic process. Intrinsic IG heterogeneity makes clonotype inference an extremely challenging task, both from a computational and a biological point of view. Here we present icing, a framework for reconstructing clonal families even in the case of highly mutated sequences. icing has a modular structure, and it is designed to be used with large next generation sequencing (NGS) datasets, a technology that allows the characterisation of large-scale IG repertoires. We extensively validated the framework with clustering performance metrics on a simulated case. icing is implemented in Python, and it is publicly available under the FreeBSD licence at https://github.com/slipguru/icing.
-
Uncovering the genetic hereditable component of late onset Alzheimer’s disease through the analysis of GWA data
M Squillario, F Tomasi, V Tozzo, A Barla, D Uberti
-
Next Generation Sequencing and Microrna Assay in a Cohort of Patients Affected By Myelodysplastic Syndromes. an Analysis of Clinical and Genotypic Features
D Avenoso, M Squillario, A Barla, A Verri, E Carminati, C Nurra, E Todisco, F Gigli, G Gregato, C Poletti, F Bertolini, G J Mufti, M Shah, M Miglino
Abstract: The myelodysplastic syndromes (MDS) are a group of clonal haematopoietic stem cell disorders, and they can be a consequence of genomic/chromosomal instability. The World Health Organization (WHO) 2008 classification divides MDS into four types: refractory anaemia with excess of blasts (RAEB), refractory anaemia (RA), refractory cytopenia with multilineage dysplasia (RCMD) and chronic myelomonocytic leukaemia (CMML). Various mechanisms contribute to the pathogenesis and prognosis of the disease, and currently next generation sequencing (NGS) detects pathogenic gene mutations that can allocate patients to different risk classes. MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate gene expression via post-transcriptional mechanisms; they may have oncogenic properties or act as tumour suppressors, and have an active role in the onset of myeloid …
-
Development of a smart post-hospitalization facility for older people by using domotics, robotics, and automated tele-monitoring
C Patrone, A Cella, C Martini, S Pericu, R Femia, A Barla, C Porfirione, M Puntoni, N Veronese, F Odone, N Casiddu, G A Rollandi, A Verri, A Pilotto
Abstract: Recent studies showed that about 8% of beds are occupied by patients who experience a delayed hospital discharge (DHD). This is attributed to a delay in the arrangement of home-care assistance or in admission to long-term care facilities. Recently, many technologies have been developed to improve the care and monitoring of older people. The aim of this study is to design, implement and test a prototype of a technology-based post-hospitalization facility for older people at risk of DHD by using domotics, robotics and wearable sensors for tele-monitoring. A sensorised post-hospitalization facility has been built inside the hospital. Thirty-five healthy volunteers aged from 20 to 82 years were recruited. Clinical and functional assessment, i.e. the motility index (MI), and human-robot interaction satisfaction were measured. A significant correlation was observed between the automatic MI and the Gait Speed, the sit-to-stand time, and the Timed Up and Go test. Domotics, robotics and technology-based tele-monitoring may represent a new way to assess a patient's autonomy and functional and clinical conditions in an ecological way, reproducing real life at home as closely as possible.
-
Predicting diabetes second-line therapy initiation in the Australian population via time span-guided neural attention network
Fiorini S., Hajati F., Barla A., Girosi F.
PLoS ONE
Abstract: Introduction: The first line of treatment for people with Diabetes mellitus is metformin. However, over the course of the disease metformin may fail to achieve appropriate glycemic control, and a second-line therapy may become necessary. In this paper we introduce Tangle, a time span-guided neural attention model that can accurately and promptly predict the upcoming need for a second-line diabetes therapy from administrative data in the Australian adult population. The method is suitable for designing automatic therapy review recommendations for patients and their providers without the need to collect clinical measures. Data: We analyzed seven years of de-identified records (2008-2014) of the 10% publicly available linked sample of the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) electronic databases of Australia. Methods: By design, Tangle inherits the representational power of pre-trained word embeddings, such as GloVe, to encode sequences of claims with the related MBS codes. Moreover, the proposed attention mechanism natively exploits the information hidden in the time span between two successive claims (measured in number of days). We compared the proposed method against state-of-the-art sequence classification methods. Results: Tangle outperforms state-of-the-art recurrent neural networks, including attention-based models. In particular, when the proposed time span-guided attention strategy is coupled with pre-trained embedding methods, the model performance reaches an Area Under the ROC Curve of 90%, an improvement of almost 10 percentage points over an attentionless recurrent architecture. Implementation: Tangle is implemented in Python using Keras and is hosted on GitHub at https://github.com/samuelefiorini/tangle.
-
Secondary somatic mutations in g-protein-related pathways and mutation signatures in Uveal melanoma
Piaggio F., Tozzo V., Bernardi C., Croce M., Puzone R., Viaggi S., Patrone S., Barla A., Coviello D., Jager M.J., Van Der Velden P.A., Zeschnigk M., Cangelosi D., Eva A., Pfeffer U., Amaro A.
Cancers
Abstract: Background: Uveal melanoma (UM), a rare cancer of the eye, is characterized by initiating mutations in the genes G-protein subunit alpha Q (GNAQ), G-protein subunit alpha 11 (GNA11), cysteinyl leukotriene receptor 2 (CYSLTR2), and phospholipase C beta 4 (PLCB4) and by metastasis-promoting mutations in the genes splicing factor 3B1 (SF3B1), serine and arginine rich splicing factor 2 (SRSF2), and BRCA1-associated protein 1 (BAP1). Here, we tested the hypothesis that additional mutations, though occurring in only a few cases (“secondary drivers”), might influence tumor development. Methods: We analyzed all the 4125 mutations detected in exome sequencing datasets, comprising a total of 139 UMs, and tested the enrichment of secondary drivers in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways that also contained the initiating mutations. We searched for additional mutations in the putative secondary driver gene protein tyrosine kinase 2 beta (PTK2B) and we developed new mutational signatures that explain the mutational pattern observed in UM. Results: Secondary drivers were significantly enriched in KEGG pathways that also contained GNAQ and GNA11, such as the calcium-signaling pathway. Many of the secondary drivers were known cancer driver genes and were strongly associated with metastasis and survival. We identified additional mutations in PTK2B. Sparse dictionary learning allowed for the identification of mutational signatures specific for UM. Conclusions: A considerable part of rare mutations that occur in addition to known driver mutations are likely to affect tumor development and progression.
-
Visual computing methods for assessing the well-being of older people
Martini C., Odone F., Noceti N., Chessa M., Barla A., Verri A., Cella A., Pilotto A., Rollandi G.A.
Communications in Computer and Information Science
Abstract: With the increasing share of the elderly population worldwide, the necessity of assistive technologies to support clinicians in monitoring their health conditions is becoming more and more relevant. Recent medical literature has proposed the notion of the frail elderly, which rapidly became a key element of clinical practices for the estimation of well-being in the aging population. The evaluation of frailty is commonly based on self-reported outcomes and occasional physician evaluations, leading to possibly biased results. In this work we propose a data-driven method to automatically evaluate two of the main aspects contributing to frailty estimation, i.e. the subject's motility and cognitive status. The first is evaluated using visual computing tools, while the second relies on a virtual reality-based system. We provide an extensive experimental assessment performed on two sets of data acquired in a sensorised protected discharge facility located in a local hospital. Our results are in good agreement with the assessment manually performed by physicians, showing the potential of our approach to complement current evaluation protocols.
-
Convergence rates of Forward–Douglas–Rachford splitting method
Cesare Molinari, Jingwei Liang, Jalal Fadili
Journal of optimization theory and applications
Abstract: Over the past decades, operator splitting methods have become ubiquitous for non-smooth optimization owing to their simplicity and efficiency. In this paper, we consider the Forward–Douglas–Rachford splitting method and study both global and local convergence rates of this method. For the global rate, we establish a sublinear convergence rate in terms of a Bregman divergence suitably designed for the objective function. Moreover, when specializing to the Forward–Backward splitting, we prove a stronger convergence rate result for the objective function value. Then, locally, under the assumption that the non-smooth part of the optimization problem is partly smooth, we establish local linear convergence of the method. More precisely, we show that the sequence generated by Forward–Douglas–Rachford first (i) identifies a smooth manifold in a finite number of iterations and then (ii) enters a local linear convergence regime, which is characterized, for instance, in terms of the structure of the underlying active smooth manifold. To exemplify the usefulness of the obtained results, we consider several concrete numerical experiments arising from applicative fields including signal/image processing, inverse problems and machine learning.
-
An improper estimator with optimal excess risk in misspecified density estimation and logistic regression
Jaouad Mourtada, Stéphane Gaïffas
arXiv preprint
Abstract: We introduce a procedure for predictive conditional density estimation under logarithmic loss, which we call SMP (Sample Minmax Predictor). This estimator minimizes a new general excess risk bound for supervised statistical learning. On standard examples, this bound scales as d/n, with d the model dimension and n the sample size, and critically remains valid under model misspecification. Being an improper (out-of-model) procedure, SMP improves over within-model estimators such as the maximum likelihood estimator, whose excess risk degrades under misspecification. Compared to approaches reducing to the sequential problem, our bounds remove suboptimal log n factors, addressing an open problem from Grünwald and Kotlowski for the considered models, and can handle unbounded classes. For the Gaussian linear model, the predictions and risk bound of SMP are governed by leverage scores of covariates, nearly matching the optimal risk in the well-specified case without conditions on the noise variance or approximation error of the linear model. For logistic regression, SMP provides a non-Bayesian approach to calibration of probabilistic predictions relying on virtual samples, and can be computed by solving two logistic regressions. It achieves a non-asymptotic excess risk of O((d + B²R²)/n), where R bounds the norm of features and B that of the comparison parameter; by contrast, no within-model estimator can achieve a better rate than min(BR/√n, d·e^{BR}/n) in general. This provides a computationally more efficient alternative to Bayesian approaches, which require approximate posterior sampling, thereby partly answering a question by Foster et al. (2018).
-
Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices
Jaouad Mourtada
arXiv preprint
Abstract: The first part of this paper is devoted to the decision-theoretic analysis of random-design linear prediction. It is known that, under boundedness constraints on the response (and thus on regression coefficients), the minimax excess risk scales, up to constants, as σ²d/n in dimension d with n samples and noise variance σ². Here, we study the expected excess risk with respect to the full linear class. We show that the ordinary least squares estimator is exactly minimax optimal in the well-specified case for every distribution of covariates. Further, we express the minimax risk in terms of the distribution of statistical leverage scores of individual samples. We deduce a precise minimax lower bound of σ²d/(n−d+1) for general covariate distributions, which nearly matches the risk for Gaussian design. We then obtain nonasymptotic upper bounds on the minimax risk for covariates that satisfy a "small ball"-type regularity condition, which scale as (1+o(1))σ²d/n as d = o(n), both in the well-specified and misspecified cases. Our main technical contribution is the study of the lower tail of the smallest singular value of empirical covariance matrices around 0. We establish a lower bound on this lower tail, valid for any distribution in dimension d ≥ 2, together with a matching upper bound under a necessary regularity condition. Our proof relies on the PAC-Bayesian technique for controlling empirical processes, and extends an analysis of Oliveira devoted to a different part of the lower tail. Equivalently, our upper bound shows that the operator norm of the inverse sample covariance matrix has bounded L^q norm up to q ≍ n, and our lower bound implies that this exponent is unimprovable. Finally, we show that the regularity condition naturally holds for independent coordinates.
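The statistical leverage scores that appear in this analysis are straightforward to compute: the score of sample i is the i-th diagonal entry of the hat matrix. A small NumPy illustration on assumed synthetic Gaussian design (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))   # random design matrix

# Leverage score of sample i: h_i = x_i^T (X^T X)^{-1} x_i,
# the i-th diagonal entry of the hat matrix X (X^T X)^{-1} X^T.
G_inv = np.linalg.inv(X.T @ X)
leverage = np.einsum("ij,jk,ik->i", X, G_inv, X)

# The hat matrix is a rank-d orthogonal projection, so the scores
# lie in (0, 1] and sum exactly to d.
print("sum of leverage scores:", leverage.sum())   # = d = 5
```

The `einsum` call evaluates only the diagonal of the hat matrix, avoiding the full n × n projection.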
-
Accelerated iterative regularization via dual diagonal descent
L Calatroni, G Garrigos, L Rosasco, S Villa
arXiv preprint
Abstract: We propose and analyze an accelerated iterative dual diagonal descent algorithm for the solution of linear inverse problems with general regularization and data-fit functions. In particular, we develop an inertial approach, for which we analyze both convergence and stability. Using tools from inexact proximal calculus, we prove early stopping results with optimal convergence rates for additive data-fit terms, as well as for more general cases, such as the Kullback-Leibler divergence, for which different types of proximal point approximations hold.
-
A Weakly Supervised Strategy for Learning Object Detection on a Humanoid Robot
E Maiettini, G Pasquale, V Tikhanoff, L Rosasco, L Natale
2019 IEEE-RAS 19th International Conference on Humanoid Robots
Abstract: Research in Computer Vision and Deep Learning has recently proposed numerous effective techniques for detecting objects in an image. In general, these employ deep Convolutional Neural Networks trained end-to-end on large datasets annotated with object labels and 2D bounding boxes. These methods provide remarkable performance, but are particularly expensive in terms of training data and supervision. Hence, modern object detection algorithms are difficult to deploy in robotic applications that require on-line learning. In this paper, we propose a weakly supervised strategy for training an object detector in this scenario. The main idea is to let the robot iteratively grow a training set by combining autonomously annotated examples with others that are requested for human supervision. We evaluate our method on two experiments with data acquired from the iCub and R1 humanoid platforms, showing ...
-
On-line object detection: a robotics challenge
Maiettini, Elisa and Pasquale, Giulia and Rosasco, Lorenzo and Natale, Lorenzo
Autonomous Robots (2019)
Abstract: Object detection is a fundamental ability for robots interacting within an environment. While stunningly effective, state-of-the-art deep learning methods require huge amounts of labeled images and hours of training, which does not favour such scenarios. This work presents a novel pipeline resulting from integrating (Maiettini et al. in 2017 IEEE-RAS 17th international conference on humanoid robotics (Humanoids), 2017) and (Maiettini et al. in 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), 2018), which naturally trains a robot to detect novel objects in a few seconds. Moreover, we report on an extended empirical evaluation of the learning method, justifying that the proposed hybrid architecture is key in leveraging powerful deep representations while maintaining the fast training time of large-scale Kernel methods. We validate our approach on the Pascal VOC benchmark (Everingham et al. in Int J Comput Vis 88(2): 303--338, 2010), and on a challenging robotic scenario (iCubWorld Transformations (Pasquale et al. in Rob Auton Syst 112:260--281, 2019)). We address real-world use-cases and show how to tune the method for different speed/accuracy trade-offs. Lastly, we discuss limitations and directions for future development.
@article{maiettini2019,
title={On-line object detection: a robotics challenge},
author={Maiettini, Elisa and Pasquale, Giulia and Rosasco, Lorenzo and Natale, Lorenzo},
year={2019},
journal={Autonomous Robots}}
-
Exact sampling of determinantal point processes with sublinear time preprocessing
Derezinski, Michal and Calandriello, Daniele and Valko, Michal
Advances in Neural Information Processing Systems
Abstract: We study the complexity of sampling from a distribution over all index subsets of the set {1, ..., n}, with the probability of a subset S proportional to the determinant of the submatrix L_S of some n x n positive semidefinite matrix L, where L_S corresponds to the entries of L indexed by S. Known as a determinantal point process (DPP), this distribution is used in machine learning to induce diversity in subset selection. When sampling from DPPs, we often wish to sample multiple subsets S with small expected size k = E[|S|] << n from a very large matrix L, so it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). For this purpose we provide DPP-VFX, a new algorithm which, given access only to L, samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is n poly(k), i.e., sublinear in the size of L, and (2) its sampling cost is poly(k), i.e., independent of the size of L. Prior to our results, state-of-the-art exact samplers required O(n^3) preprocessing time and sampling time linear in n or dependent on the spectral properties of L. We furthermore give a reduction which allows using our algorithm for exact sampling from cardinality-constrained determinantal point processes with n poly(k) preprocessing time. Our implementation of DPP-VFX is provided at https://github.com/guilgautier/DPPy/.
@inproceedings{calandriello2019dpp,
title={Exact sampling of determinantal point processes with sublinear time preprocessing},
author={Derezinski, Michal and Calandriello, Daniele and Valko, Michal},
year={2019},
booktitle={Advances in Neural Information Processing Systems}}
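For context, the classical exact sampler that DPP-VFX improves on — the spectral (HKPV) algorithm, whose eigendecomposition step costs O(n³) — can be sketched in a few lines of NumPy. This illustrates what exact L-ensemble sampling means; it is not the DPP-VFX algorithm itself, and the kernel below is an assumed toy example:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dpp(L, rng=rng):
    """Draw one exact sample from the L-ensemble DPP via the spectral
    (HKPV) algorithm: O(n^3) preprocessing, unlike DPP-VFX."""
    lam, U = np.linalg.eigh(L)
    # Phase 1: keep eigenvector j independently with prob lambda_j/(1+lambda_j).
    keep = rng.random(len(lam)) < lam / (1.0 + lam)
    V = U[:, keep]                       # orthonormal basis, n x k
    sample = []
    while V.shape[1] > 0:
        # Phase 2: pick item i with prob ||row_i(V)||^2 / k.
        p = (V ** 2).sum(axis=1)
        p /= p.sum()
        i = rng.choice(len(p), p=p)
        sample.append(int(i))
        # Restrict span(V) to the subspace whose i-th coordinate is zero:
        # rows 1..k-1 of Vt span the complement of V[i, :] in R^k.
        _, _, Vt = np.linalg.svd(V[i : i + 1, :])
        V = V @ Vt[1:, :].T              # k -> k-1 orthonormal columns
    return sorted(sample)

# Toy PSD kernel with a strong diagonal.
n = 8
A = rng.standard_normal((n, n))
L = A @ A.T / n + np.eye(n)
S = sample_dpp(L)
print("sampled subset:", S)
```

Each loop iteration removes one dimension from the sampled subspace, so the subset size equals the number of eigenvectors kept in phase 1.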
-
Convergence of Stochastic Proximal Gradient Algorithm
Lorenzo Rosasco, Silvia Villa, Bằng Công Vũ
Applied Mathematics and Optimization (2019)
Abstract: We study the extension of the proximal gradient algorithm where only a stochastic gradient estimate is available and a relaxation step is allowed. We establish convergence rates for function values in the convex case, as well as almost sure convergence and convergence rates for the iterates under further convexity assumptions. Our analysis avoids averaging the iterates and error summability assumptions, which might not be satisfied in applications, e.g. in machine learning. Our proof technique extends classical ideas from the analysis of deterministic proximal gradient algorithms.
@Article{Rosasco2019,
title={Convergence of Stochastic Proximal Gradient Algorithm},
author={Rosasco, Lorenzo and Villa, Silvia and Vu, Bang Cong},
journal={Applied Mathematics and Optimization},
year={2019},
month={Oct},
day={15},
issn={1432-0606},
doi={10.1007/s00245-019-09617-7},
url={https://doi.org/10.1007/s00245-019-09617-7}}
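A minimal instance of the algorithm class studied here is stochastic proximal gradient applied to a toy lasso problem, with one randomly drawn sample per step and soft-thresholding as the proximal step. This is an illustrative sketch under assumed synthetic data, not the paper's exact scheme or assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy lasso: min_w (1/2n)||Xw - y||^2 + lam * ||w||_1
n, d = 500, 20
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.0, 0.5]            # sparse ground truth
y = X @ w_true + 0.1 * rng.standard_normal(n)
lam = 0.01

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w = np.zeros(d)
for k in range(1, 20001):
    i = rng.integers(n)
    grad = (X[i] @ w - y[i]) * X[i]      # stochastic gradient estimate
    step = 1.0 / (0.1 * k + 10.0)        # diminishing step size
    w = soft_threshold(w - step * grad, step * lam)

print("estimation error:", np.linalg.norm(w - w_true))
```

Note that, in line with the abstract, no iterate averaging is used; the last iterate itself converges under the diminishing step-size schedule.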
-
Learning to sequence multiple tasks with competing constraints
Duan, Anqing and Camoriano, Raffaello and Ferigo, Diego and Huang, Yanlong and Calandriello, Daniele and Rosasco, Lorenzo and Pucci, Daniele
2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Abstract: Imitation learning offers a general framework where robots can efficiently acquire novel motor skills from demonstrations of a human teacher. While many promising achievements have been shown, the majority of them focus only on single-stroke movements, without taking into account the problem of multi-task sequencing. Conceivably, sequencing different atomic tasks can further augment the robot's capabilities as well as avoid repetitive demonstrations. In this paper, we propose to address the issue of multi-task sequencing with emphasis on handling the so-called competing constraints, which emerge due to the existence of concurrent constraints from Cartesian and joint trajectories. Specifically, we explore the null space of the robot from an information-theoretic perspective in order to maintain imitation fidelity during transitions between consecutive tasks. The effectiveness of the proposed method is validated through simulated and real experiments on the iCub humanoid robot.
@inproceedings{duan2019,
title={Learning to sequence multiple tasks with competing constraints},
author={Duan, Anqing and Camoriano, Raffaello and Ferigo, Diego and Huang, Yanlong and Calandriello, Daniele and Rosasco, Lorenzo and Pucci, Daniele},
booktitle={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2019},
organization={IEEE}}
-
Derivative-free online learning of inverse dynamics models
Romeres, Diego and Zorzi, Mattia and Camoriano, Raffaello and Traversaro, Silvio and Chiuso, Alessandro
IEEE Transactions on Control Systems Technology
Abstract: This paper discusses online algorithms for inverse dynamics modelling in robotics. Several model classes, including rigid body dynamics (RBD) models, data-driven models and semiparametric models (which are a combination of the previous two classes), are placed in a common framework. While model classes used in the literature typically exploit joint velocities and accelerations, which need to be approximated resorting to numerical differentiation schemes, in this paper a new 'derivative-free' framework is proposed that does not require this preprocessing step. An extensive experimental study with real data from the right arm of the iCub robot is presented, comparing different model classes and estimation procedures, showing that the proposed 'derivative-free' methods outperform existing methodologies.
@article{romeres2019,
title={Derivative-free online learning of inverse dynamics models},
author={Romeres, Diego and Zorzi, Mattia and Camoriano, Raffaello and Traversaro, Silvio and Chiuso, Alessandro},
journal={IEEE Transactions on Control Systems Technology},
year={2019},
publisher={IEEE}}
-
Genuine Personality Recognition from Highly Constrained Face Images
Fabio Anselmi, Nicoletta Noceti, Lorenzo Rosasco, Robert Ward
International Conference on Image Analysis and Processing (ICIAP 2019)
Abstract: People are able to accurately estimate personality traits merely on the basis of “passport”-style neutral faces and, thus, cues must exist that allow for such estimation. However, to date, there has been little progress in identifying the form and location of these cues.
@INPROCEEDINGS{anselmi2019,
title = {Genuine Personality Recognition from Highly Constrained Face Images},
booktitle = {Image Analysis and Processing -- ICIAP 2019},
year = {2019},
publisher = {Springer International Publishing},
url = {https://link.springer.com/chapter/10.1007/978-3-030-30642-7_38},
author = {Fabio Anselmi and Nicoletta Noceti and Lorenzo Rosasco and Robert Ward}}
-
Parallel Random Block-Coordinate Forward-Backward Algorithm: A Unified Convergence Analysis
Saverio Salzo, Silvia Villa
arXiv preprint
Abstract: We study the block-coordinate forward-backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to fully exploit the smoothness properties of the objective function. In the convex case and in an infinite dimensional setting, we establish almost sure weak convergence of the iterates and the asymptotic rate o(1/n) for the mean of the function values. We derive linear rates under strong convexity and error bound conditions. Our analysis is based on an abstract convergence principle for stochastic descent algorithms which allows us to extend and simplify existing results.
@INPROCEEDINGS{salzo2019,
title = {Parallel Random Block-Coordinate Forward-Backward Algorithm: A Unified Convergence Analysis},
journal = {arXiv preprint},
year = {2019},
url = {https://arxiv.org/abs/1906.07392},
author = {Saverio Salzo and Silvia Villa}}
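To make the iteration concrete, here is a minimal sketch of a randomized block-coordinate forward-backward step on an l1-regularized least-squares toy problem, with a separate stepsize per block taken from the block-wise Lipschitz constant. This is an illustration under simple assumptions (sequential random block selection, soft-thresholding prox), not the paper's algorithm or analysis; all data and parameters are illustrative.

```python
import numpy as np

def block_forward_backward(A, b, lam, blocks, n_sweeps=500, p=0.5, seed=0):
    """Random block-coordinate forward-backward for
    min_x 0.5*||A x - b||^2 + lam*||x||_1.
    Each block is updated with probability p, using its own stepsize
    1/||A_I||^2 to exploit block-wise smoothness."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    steps = [1.0 / np.linalg.norm(A[:, I], 2) ** 2 for I in blocks]
    for _ in range(n_sweeps):
        for I, step in zip(blocks, steps):
            if rng.random() < p:                 # random block selection
                g = A[:, I].T @ (A @ x - b)      # forward (gradient) step on block I
                z = x[I] - step * g
                # backward (prox) step: soft-thresholding on block I
                x[I] = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 20))
x_true = np.zeros(20)
x_true[[0, 5, 12]] = [1.5, -2.0, 0.7]
b = A @ x_true                                   # noiseless observations
blocks = [np.arange(i, i + 5) for i in range(0, 20, 5)]  # four blocks of 5 coordinates
x_hat = block_forward_backward(A, b, lam=0.05, blocks=blocks)
```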
-
Fast approximation of orthogonal matrices and application to PCA
Cristian Rusu, Lorenzo Rosasco
arXiv preprint
Abstract: We study the problem of approximating orthogonal matrices so that their application is numerically fast and yet accurate. We find an approximation by solving an optimization problem over a set of structured matrices, that we call Givens transformations, including Givens rotations as a special case. We propose an efficient greedy algorithm to solve such a problem and show that it strikes a balance between approximation accuracy and speed of computation. The proposed approach is relevant in spectral methods and we illustrate its application to PCA.
@INPROCEEDINGS{rusu2019,
title = {Fast approximation of orthogonal matrices and application to PCA},
journal = {arXiv preprint},
year = {2019},
url = {https://arxiv.org/abs/1907.08697},
author = {Cristian Rusu and Lorenzo Rosasco}}
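The computational point is easy to illustrate: a product of g Givens rotations applies to a vector in O(g) operations, versus O(d^2) for a dense orthogonal matrix. The sketch below shows only this building block, not the paper's greedy algorithm or its generalized Givens transformations; the helper names are illustrative.

```python
import numpy as np

def givens(d, i, j, theta):
    """Dense d x d Givens rotation acting on the (i, j) coordinate plane."""
    G = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c; G[j, j] = c
    G[i, j] = -s; G[j, i] = s
    return G

def apply_rotations(x, rotations):
    """Apply a product of Givens rotations to x in O(len(rotations)) time,
    without ever forming a dense matrix."""
    x = x.copy()
    for i, j, theta in rotations:        # applied left to right: G_k ... G_1 x
        c, s = np.cos(theta), np.sin(theta)
        xi, xj = x[i], x[j]
        x[i] = c * xi - s * xj
        x[j] = s * xi + c * xj
    return x

rotations = [(0, 1, 0.3), (1, 3, -0.7), (2, 3, 1.1)]
x = np.arange(4, dtype=float)
y_fast = apply_rotations(x, rotations)
# dense equivalent: multiply the rotation matrices in the same order
Q = givens(4, 2, 3, 1.1) @ givens(4, 1, 3, -0.7) @ givens(4, 0, 1, 0.3)
y_dense = Q @ x
```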
-
Gain with no Pain: Efficient Kernel-PCA by Nystrom Sampling
Nicholas Sterge, Bharath Sriperumbudur, Lorenzo Rosasco, Alessandro Rudi
arXiv preprint
Abstract: In this paper, we propose and study a Nyström based approach to efficient large scale kernel principal component analysis (PCA). The latter is a natural nonlinear extension of classical PCA based on considering a nonlinear feature map or the corresponding kernel. Like other kernel approaches, kernel PCA enjoys good mathematical and statistical properties but, numerically, it scales poorly with the sample size. Our analysis shows that Nyström sampling greatly improves computational efficiency without incurring any loss of statistical accuracy. While similar effects have been observed in supervised learning, this is the first such result for PCA. Our theoretical findings, which are also illustrated by numerical results, are based on a combination of analytic and concentration of measure techniques. Our study is more broadly motivated by the question of understanding the interplay between statistical and computational requirements for learning.
@INPROCEEDINGS{sterge2019,
title = {Gain with no Pain: Efficient Kernel-PCA by Nystrom Sampling},
journal = {arXiv preprint},
year = {2019},
url = {https://arxiv.org/abs/1907.05226},
author = {Nicholas Sterge and Bharath Sriperumbudur and Lorenzo Rosasco and Alessandro Rudi}}
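A minimal sketch of the idea, under simple assumptions (Gaussian kernel, uniform landmark sampling): run ordinary PCA on Nyström features instead of eigendecomposing the full n x n kernel matrix. Details such as the centering step and the sampling scheme are simplified relative to a careful implementation, and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nystrom_kpca(X, m, k, sigma=1.0, seed=0):
    """Approximate kernel PCA via PCA on Nystrom features
    phi(x) = K_mm^{-1/2} k_m(x), built from m uniformly sampled landmarks."""
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=m, replace=False)]
    K_nm = gaussian_kernel(X, landmarks, sigma)            # n x m cross-kernel
    K_mm = gaussian_kernel(landmarks, landmarks, sigma)    # m x m landmark kernel
    vals, vecs = np.linalg.eigh(K_mm + 1e-8 * np.eye(m))   # regularized for stability
    features = K_nm @ vecs @ np.diag(vals ** -0.5) @ vecs.T  # n x m Nystrom features
    features = features - features.mean(0)                 # (approximate) centering
    _, _, Vt = np.linalg.svd(features, full_matrices=False)
    return features @ Vt[:k].T                             # top-k nonlinear components

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Z = nystrom_kpca(X, m=30, k=2)
```

The cost is dominated by the n x m kernel block and an m x m eigendecomposition, instead of the n x n eigenproblem of exact kernel PCA.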
-
Multi-Scale Vector Quantization with Reconstruction Trees
Enrico Cecini, Ernesto De Vito, Lorenzo Rosasco
arXiv preprint
Abstract: We propose and study a multi-scale approach to vector quantization. We develop an algorithm, dubbed reconstruction trees, inspired by decision trees. Here the objective is parsimonious reconstruction of unsupervised data, rather than classification. Contrasted to more standard vector quantization methods, such as K-means, the proposed approach leverages a family of given partitions, to quickly explore the data in a coarse to fine--multi-scale--fashion. Our main technical contribution is an analysis of the expected distortion achieved by the proposed algorithm, when the data are assumed to be sampled from a fixed unknown distribution. In this context, we derive both asymptotic and finite sample results under suitable regularity assumptions on the distribution. As a special case, we consider the setting where the data generating distribution is supported on a compact Riemannian sub-manifold. Tools from differential geometry and concentration of measure are useful in our analysis.
@INPROCEEDINGS{Cecini2019,
title = {Multi-Scale Vector Quantization with Reconstruction Trees},
journal = {arXiv preprint},
year = {2019},
url = {https://arxiv.org/abs/1907.03875},
author = {Enrico Cecini and Ernesto De Vito and Lorenzo Rosasco}}
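A toy version of the coarse-to-fine idea (not the paper's algorithm, which works from a given family of partitions): recursively split cells and represent every cell by its mean, so the expected distortion can only decrease as the tree deepens. All names and the median-split rule are illustrative.

```python
import numpy as np

def build_tree(X, idx, depth, max_depth):
    """Recursively split the cell `idx` along its widest coordinate at the median."""
    node = {"mean": X[idx].mean(0), "idx": idx}
    if depth == max_depth or len(idx) <= 1:
        return node
    dim = int(np.argmax(X[idx].max(0) - X[idx].min(0)))
    med = np.median(X[idx, dim])
    left, right = idx[X[idx, dim] <= med], idx[X[idx, dim] > med]
    if len(left) and len(right):
        node["children"] = (build_tree(X, left, depth + 1, max_depth),
                            build_tree(X, right, depth + 1, max_depth))
    return node

def quantize(tree, X):
    """Replace every point by the mean of its leaf cell."""
    out = np.empty_like(X)
    stack = [tree]
    while stack:
        node = stack.pop()
        if "children" in node:
            stack.extend(node["children"])
        else:
            out[node["idx"]] = node["mean"]
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
idx = np.arange(len(X))
d_coarse = np.mean(np.sum((X - quantize(build_tree(X, idx, 0, 0), X)) ** 2, axis=1))
d_fine = np.mean(np.sum((X - quantize(build_tree(X, idx, 0, 4), X)) ** 2, axis=1))
```

Since each split replaces a cell mean by two sub-cell means, the empirical distortion is monotone in the depth, mirroring the coarse-to-fine behavior analyzed in the paper.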
-
Implicit Regularization of Accelerated Methods in Hilbert Spaces
Nicolò Pagliana, Lorenzo Rosasco
Advances in Neural Information Processing Systems 33 (NeurIPS 2019)
Abstract: We study learning properties of accelerated gradient descent methods for linear least-squares in Hilbert spaces. We analyze the implicit regularization properties of Nesterov acceleration and a variant of heavy-ball in terms of corresponding learning error bounds. Our results show that acceleration can provide faster bias decay than gradient descent, but also suffers from more unstable behavior. As a result, acceleration cannot in general be expected to improve learning accuracy with respect to gradient descent, but rather to achieve the same accuracy with reduced computations. Our theoretical results are validated by numerical simulations. Our analysis is based on studying suitable polynomials induced by the accelerated dynamics and combining spectral techniques with concentration inequalities.
@INPROCEEDINGS{pagliana2019,
title = {Implicit Regularization of Accelerated Methods in Hilbert Spaces},
journal = {Advances in Neural Information Processing Systems 33 (NeurIPS 2019)},
year = {2019},
url = {https://arxiv.org/abs/1905.13000},
author = {Nicolò Pagliana and Lorenzo Rosasco}}
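The bias-decay speedup is easy to see on a small quadratic. Below is a minimal sketch, assuming standard Nesterov momentum rather than the paper's exact setup: on an ill-conditioned least-squares problem, acceleration reaches a lower residual than plain gradient descent within the same iteration budget. The problem instance is illustrative.

```python
import numpy as np

def gradient_descent(A, b, step, n_iter):
    """Plain gradient descent for min_w 0.5*||A w - b||^2."""
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w -= step * A.T @ (A @ w - b)
    return w

def nesterov(A, b, step, n_iter):
    """Nesterov-accelerated gradient descent for the same objective."""
    w = w_prev = np.zeros(A.shape[1])
    for t in range(1, n_iter + 1):
        v = w + (t - 1) / (t + 2) * (w - w_prev)   # extrapolation (momentum) step
        w_prev, w = w, v - step * A.T @ (A @ v - b)
    return w

A = np.diag([1.0, 0.5, 0.1, 0.05, 0.01])   # ill-conditioned spectrum
b = A @ np.ones(5)                          # consistent system, w* = ones
step = 1.0                                  # 1/L with L = ||A||^2 = 1
res_gd = np.linalg.norm(A @ gradient_descent(A, b, step, 200) - b)
res_acc = np.linalg.norm(A @ nesterov(A, b, step, 200) - b)
```

Stopping both methods early acts as the implicit regularizer studied in the paper; the accelerated iterate reaches a given bias level in fewer passes.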
-
Reproducing kernel Hilbert spaces on manifolds: Sobolev and Diffusion spaces
Ernesto De Vito, Nicole Mücke, Lorenzo Rosasco
arXiv preprint
Abstract: We study reproducing kernel Hilbert spaces (RKHS) on a Riemannian manifold. In particular, we discuss under which condition Sobolev spaces are RKHS and characterize their reproducing kernels. Further, we introduce and discuss a class of smoother RKHS that we call diffusion spaces. We illustrate the general results with a number of detailed examples.
@INPROCEEDINGS{devito2019,
title = {Reproducing kernel Hilbert spaces on manifolds: Sobolev and Diffusion spaces},
journal = {arXiv preprint},
year = {2019},
url = {https://arxiv.org/abs/1905.10913},
author = {Ernesto De Vito and Nicole Mücke and Lorenzo Rosasco}}
-
Monte Carlo wavelets: a randomized approach to frame discretization
Zeljko Kereta, Stefano Vigogna, Valeriya Naumova, Lorenzo Rosasco, Ernesto De Vito
13th International Conference on Sampling Theory and Applications
Abstract: In this paper we propose and study a family of continuous wavelets on general domains, and a corresponding stochastic discretization that we call Monte Carlo wavelets. First, using tools from the theory of reproducing kernel Hilbert spaces and associated integral operators, we define a family of continuous wavelets using spectral calculus. Then, we propose a stochastic discretization based on Monte Carlo estimates of the integral operators. Using concentration of measure results, we establish the convergence of such a discretization and derive convergence rates under natural regularity assumptions.
@INPROCEEDINGS{Kereta2019,
title = {Monte Carlo wavelets: a randomized approach to frame discretization},
journal = {13th International Conference on Sampling Theory and Applications},
year = {2019},
url = {https://arxiv.org/abs/1903.06594},
author = {Zeljko Kereta and Stefano Vigogna and Valeriya Naumova and Lorenzo Rosasco and Ernesto De Vito}}
-
Gaussian process optimization with adaptive sketching: Scalable and no regret
Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco
Conference on Learning Theory (COLT 2019)
Abstract: Gaussian processes (GP) are a well studied Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions d and iterations t. Given a set of A alternatives to choose from, the overall runtime O(t^3A) is prohibitive. In this paper we introduce BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and remarkably no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB) with randomized matrix sketching based on leverage score sampling, and we prove that randomly sampling inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from variance starvation, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most Õ (d_eff) points, where d_eff is the effective dimension of the explored space, which is typically much smaller than both d and t. This greatly reduces the dimensionality of the problem, thus leading to a O(TAd_eff^2) runtime and O(Ad_eff) space complexity.
@INPROCEEDINGS{Calandriello2019,
title = {Gaussian process optimization with adaptive sketching: Scalable and no regret},
journal = {COLT 2019},
year = {2019},
url = {https://arxiv.org/abs/1903.05594},
author = {Daniele Calandriello and Luigi Carratino and Alessandro Lazaric and Michal Valko and Lorenzo Rosasco}}
-
Theory III: Dynamics and Generalization in Deep Networks
Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Bob Liang, Jack Hidary, Tomaso Poggio
arXiv preprint
Abstract: Classical generalization bounds for classification in the setting of separable data can be optimized by maximizing the margin of a deep network under the constraint of unit p-norm of the weight matrix at each layer. A possible approach for solving numerically this problem uses gradient algorithms on exponential-type loss functions, enforcing a unit constraint in the p-norm. In the limiting case of continuous gradient flow, we analyze the dynamical systems associated with three algorithms of this kind and their close relation for p=2 with existing weight normalization and batch normalization algorithms. An interesting observation is that unconstrained gradient descent has similar dynamics, with the same critical points, and thus also optimizes the generalization bounds. Our approach extends some of the results of Srebro from linear networks to deep networks and provides a new perspective on the implicit bias of gradient descent. This elusive complexity control is likely to be responsible for generalization despite overparametrization in deep networks.
@INPROCEEDINGS{Banburski2019,
title = {Theory III: Dynamics and Generalization in Deep Networks},
journal = {to appear in COLT 2019},
year = {2019},
url = {https://arxiv.org/abs/1903.04991},
author = {Andrzej Banburski and Qianli Liao and Brando Miranda and Lorenzo Rosasco and Bob Liang and Jack Hidary and Tomaso Poggio}}
-
Beating SGD Saturation with Tail-Averaging and Minibatching
Nicole Mücke, Gergely Neu, Lorenzo Rosasco
Advances in Neural Information Processing Systems 33
Abstract: While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching and averaging, and in particular tail averaging. Our results show how these different variants of SGD can be combined to achieve optimal learning errors, hence providing practical insights. In particular, we show for the first time in the literature that tail averaging allows faster convergence rates than uniform averaging in the nonparametric setting. Finally, we show that a combination of tail-averaging and minibatching allows more aggressive step-size choices than using any one of said components.
@INPROCEEDINGS{mucke2019,
title = {Beating SGD Saturation with Tail-Averaging and Minibatching},
journal = {arXiv preprint},
year = {2019},
url = {https://arxiv.org/abs/1902.08668},
author = {Nicole Mücke and Gergely Neu and Lorenzo Rosasco}}
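The combination of minibatching and tail averaging is simple to sketch. The following toy implementation (illustrative data and parameters, not the paper's nonparametric setting) averages only the last fraction of the SGD iterates instead of all of them:

```python
import numpy as np

def sgd_tail_averaged(X, y, step, n_passes=5, batch=10, tail_frac=0.5, seed=0):
    """Minibatch SGD for least squares, returning the average of only the
    last `tail_frac` fraction of the iterates (tail averaging)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    for _ in range(n_passes):                        # multiple passes over the data
        for _ in range(n // batch):
            idx = rng.integers(0, n, size=batch)     # sample a minibatch
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
            w = w - step * grad
            iterates.append(w.copy())
    tail = iterates[int(len(iterates) * (1.0 - tail_frac)):]
    return np.mean(tail, axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
y = X @ w_true                     # noiseless targets for a simple sanity check
w_hat = sgd_tail_averaged(X, y, step=0.1)
```

Uniform averaging would weight the early, far-from-optimum iterates equally with the late ones; discarding the head of the trajectory is what avoids the saturation effect discussed in the abstract.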
-
A computational model for grid maps in neural populations
Fabio Anselmi, Benedetta Franceschiello, Micah M Murray, Lorenzo Rosasco
arXiv preprint
Abstract: Grid cells in the entorhinal cortex, together with place, speed and border cells, are major contributors to the organization of spatial representations in the brain. In this contribution we introduce a novel theoretical and algorithmic framework able to explain the emergence of hexagonal grid-like response patterns from the statistics of the input stimuli. We show that this pattern is a result of minimal variance encoding of neurons. The novelty lies in the formulation of the encoding problem through the modern Frame Theory language, specifically that of equiangular Frames, providing new insights about the optimality of hexagonal grid receptive fields. The model proposed overcomes some crucial limitations of the current attractor and oscillatory models. It is based on the well-accepted and tested hypothesis of Hebbian learning, providing a simplified cortical-based framework that does not require the presence of theta velocity-driven oscillations (oscillatory model) or translational symmetries in the synaptic connections (attractor model). We moreover demonstrate that the proposed encoding mechanism naturally maps shifts, rotations and scaling of the stimuli onto the shape of grid cells' receptive fields, giving a straightforward explanation of the experimental evidence of grid cells remapping under transformations of environmental cues.
@INPROCEEDINGS{anselmi2019grid,
title = {A computational model for grid maps in neural populations},
journal = {arXiv preprint},
year = {2019},
url = {https://arxiv.org/abs/1902.06553},
author = {Fabio Anselmi and Benedetta Franceschiello and Micah M Murray and Lorenzo Rosasco}}
-
Are we done with object recognition? The iCub robot’s perspective
Giulia Pasquale, Carlo Ciliberto, Francesca Odone, Lorenzo Rosasco, Lorenzo Natale
Robotics and Autonomous Systems 112
Abstract: We report on an extensive study of the benefits and limitations of current deep learning approaches to object recognition in robot vision scenarios, introducing a novel dataset used for our investigation. To avoid the biases in currently available datasets, we consider a natural human–robot interaction setting to design a data-acquisition protocol for visual object recognition on the iCub humanoid robot. Analyzing the performance of off-the-shelf models trained off-line on large-scale image retrieval datasets, we show the necessity for knowledge transfer. We evaluate different ways in which this last step can be done, and identify the major bottlenecks affecting robotic scenarios. By studying both object categorization and identification problems, we highlight key differences between object recognition in robotics applications and in image retrieval tasks, for which the considered deep learning approaches have been originally designed. In a nutshell, our results confirm the remarkable improvements yielded by deep learning in this setting, while pointing to specific open challenges that need to be addressed for seamless deployment in robotics.
@article{PASQUALE2019260,
title = {Are we done with object recognition? The iCub robot’s perspective},
journal = {Robotics and Autonomous Systems},
volume = {112},
pages = {260 - 281},
year = {2019},
issn = {0921-8890},
doi = {https://doi.org/10.1016/j.robot.2018.11.001},
url = {http://www.sciencedirect.com/science/article/pii/S0921889018300332},
author = {Giulia Pasquale and Carlo Ciliberto and Francesca Odone and Lorenzo Rosasco and Lorenzo Natale},
keywords = {Humanoid robotics, Robot vision, Visual object recognition, Machine learning, Deep learning, Transfer learning, Image dataset, Dataset collection, Representation invariance, iCub},
abstract = {We report on an extensive study of the benefits and limitations of current deep learning approaches to object recognition in robot vision scenarios, introducing a novel dataset used for our investigation. To avoid the biases in currently available datasets, we consider a natural human–robot interaction setting to design a data-acquisition protocol for visual object recognition on the iCub humanoid robot. Analyzing the performance of off-the-shelf models trained off-line on large-scale image retrieval datasets, we show the necessity for knowledge transfer. We evaluate different ways in which this last step can be done, and identify the major bottlenecks affecting robotic scenarios. By studying both object categorization and identification problems, we highlight key differences between object recognition in robotics applications and in image retrieval tasks, for which the considered deep learning approaches have been originally designed. In a nutshell, our results confirm the remarkable improvements yielded by deep learning in this setting, while pointing to specific open challenges that need to be addressed for seamless deployment in robotics.},
}
-
Symmetry-adapted representation learning
Fabio Anselmi, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio
Pattern Recognition 86
Abstract: In this paper, we propose the use of data symmetries, in the sense of equivalences under signal transformations, as priors for learning symmetry-adapted data representations, i.e., representations that are equivariant to these transformations. We rely on a group-theoretic definition of equivariance and provide conditions for enforcing a learned representation, for example the weights in a neural network layer or the atoms in a dictionary, to have the structure of a group and specifically the group structure in the distribution of the input. By reducing the analysis of generic group symmetries to permutation symmetries, we devise a regularization scheme for representation learning algorithms, using an unlabeled training set. The proposed regularization is intended as a conceptual, theoretical and computational proof of concept for symmetry-adapted representation learning, where the learned data representations are equivariant or invariant to transformations, without explicit knowledge of the underlying symmetries in the data.
@article{ANSELMI2019201,
title = {Symmetry-adapted representation learning},
journal = {Pattern Recognition},
year = {2019},
url = {http://www.sciencedirect.com/science/article/pii/S0031320318302620},
abstract = {In this paper, we propose the use of data symmetries, in the sense of equivalences under signal transformations, as priors for learning symmetry-adapted data representations, i.e., representations that are equivariant to these transformations. We rely on a group-theoretic definition of equivariance and provide conditions for enforcing a learned representation, for example the weights in a neural network layer or the atoms in a dictionary, to have the structure of a group and specifically the group structure in the distribution of the input. By reducing the analysis of generic group symmetries to permutation symmetries, we devise a regularization scheme for representation learning algorithms, using an unlabeled training set. The proposed regularization is intended as a conceptual, theoretical and computational proof of concept for symmetry-adapted representation learning, where the learned data representations are equivariant or invariant to transformations, without explicit knowledge of the underlying symmetries in the data.},
issn = {0031-3203},
doi = {https://doi.org/10.1016/j.patcog.2018.07.025},
author = {Fabio Anselmi and Georgios Evangelopoulos and Lorenzo Rosasco and Tomaso Poggio}}
-
Thresholding gradient methods in Hilbert spaces: support identification and linear convergence
Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa
ESAIM: Control, Optimisation and Calculus of Variations 26, 28
Abstract: We study the l1 regularized least squares optimization problem in a separable Hilbert space. We show that the iterative soft-thresholding algorithm (ISTA) converges linearly, without making any assumption on the linear operator in play or on the problem. The result is obtained combining two key concepts: the notion of extended support, a finite set containing the support, and the notion of conditioning over finite dimensional sets. We prove that ISTA identifies the extended support of the solution after a finite number of iterations, and we derive linear convergence from the conditioning property, which is always satisfied for l1 regularized least squares problems. Our analysis extends to the entire class of thresholding gradient algorithms, for which we provide a conceptually new proof of strong convergence, as well as convergence rates.
@article{Garrigos17,
author = {Guillaume Garrigos and Lorenzo Rosasco and Silvia Villa},
title = {Thresholding gradient methods in Hilbert spaces: support identification and linear convergence},
journal = {ESAIM: Control, Optimisation and Calculus of Variations},
year = {2019},
url = {https://arxiv.org/abs/1712.00357}
}
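The algorithm under study is easy to state in code. Here is a minimal sketch of ISTA on a finite-dimensional toy problem (the paper's setting is a general separable Hilbert space); the data and parameters are illustrative.

```python
import numpy as np

def ista(A, b, lam, step, n_iter=1000):
    """Iterative soft-thresholding for min_x 0.5*||A x - b||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * A.T @ (A @ x - b)                         # gradient (forward) step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0) # soft-thresholding (backward) step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
step = 1.0 / np.linalg.norm(A, 2) ** 2     # step in (0, 2/||A||^2) ensures convergence
x_hat = ista(A, b, lam=0.1, step=step)
# after finitely many iterations the iterates settle on a fixed (extended) support
support = np.flatnonzero(np.abs(x_hat) > 1e-8)
```

The soft-thresholding step sets small coordinates exactly to zero, which is what makes the finite support identification described in the abstract possible.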