Nowadays, large and rapidly evolving data sets are commonly encountered in many computer science applications. Efficiently mining and exploiting these data sets generally results in the extraction of valuable information and therefore appears as an important challenge in various domains including network security, computer vision, internet search engines, bioinformatics, marketing systems, online advertisement, social networks, just to name a few.
The rapid development of these modern computer science applications sustains an ever-increasing demand for efficient machine learning algorithms that can cope with large-scale problems, characterized by a large number of samples and a large number of variables.
The research reported in the present thesis is devoted to the design of efficient machine learning algorithms for large-scale problems. Specifically, we adopt a geometric optimization viewpoint to address the problem of linear regression in nonlinear and high-dimensional matrix search spaces. Our purpose is to efficiently exploit the geometric structure of the search space in the design of scalable linear regression algorithms.
Our search space of main interest will be the set of low-rank matrices. Learning a low-rank matrix is a typical approach to cope with high-dimensional problems. The low-rank constraint is expected to force the learning algorithm to capture a limited number of dominant factors that mostly influence the sought solution. We consider both the learning of a fixed-rank symmetric positive semidefinite matrix and of a fixed-rank non-symmetric matrix.
A first contribution of the thesis is to show that many modern machine learning problems can be formulated as linear regression problems on the set of fixed-rank matrices. For example, the learning of a low-rank distance, low-rank matrix completion and the learning on data pairs are cast into the considered linear regression framework. For these problems, the low-rank constraint is either part of the original problem formulation or is a sound approximation that significantly reduces the original problem size, resulting in a dramatic decrease in the computational complexity of algorithms.
Our main contribution is the development of novel efficient algorithms for learning a linear regression model parameterized by a fixed-rank matrix. The resulting algorithms preserve the underlying geometric structure of the problem, scale to high-dimensional problems, enjoy local convergence properties and confer a geometric basis to recent contributions on learning fixed-rank matrices. We thereby show that the considered geometric optimization framework offers a solid and versatile framework for the design of rank-constrained machine learning algorithms.
The efficiency of the proposed algorithms is illustrated on several machine learning applications. Numerical experiments suggest that the proposed algorithms compete favorably with the state-of-the-art in terms of achieved performance and required computational time.